TOMOYO Linux Cross Reference
Linux/Documentation/networking/devmem.rst

Differences between /Documentation/networking/devmem.rst (Version linux-6.12-rc7) and /Documentation/networking/devmem.rst (Version linux-2.6.32.71)


.. SPDX-License-Identifier: GPL-2.0

=================
Device Memory TCP
=================


Intro
=====

Device memory TCP (devmem TCP) enables receivi
memory (dmabuf). The feature is currently impl


Opportunity
-----------

A large number of data transfers have device m
destination. Accelerators drastically increase
transfers. Some examples include:

- Distributed training, where ML accelerators,
  exchange data.

- Distributed raw block storage applications t
  remote SSDs. Much of this data does not requ

Typically the Device-to-Device data transfers
the following low-level operations: Device-to-
transfer, and Host-to-Device copy.

The flow involving host copies is suboptimal,
and can put significant strains on system reso
bandwidth and PCIe bandwidth.

Devmem TCP optimizes this use case by implemen
the user to receive incoming network packets d

Packet payloads go directly from the NIC to device memory.

Packet headers go to host memory and are proce
normally. The NIC must support header split to

Advantages:

- Alleviate host memory bandwidth pressure, co
  network-transfer + device-copy semantics.

- Alleviate PCIe bandwidth pressure, by limiti
  level of the PCIe tree, compared to the trad
  through the root complex.


More Info
---------

  slides, video
    https://netdevconf.org/0x17/sessions/talk/

  patchset
    [PATCH net-next v24 00/13] Device Memory T
    https://lore.kernel.org/netdev/20240831004


Interface
=========


Example
-------

tools/testing/selftests/net/ncdevmem.c:do_serv
the RX path of this API.


NIC Setup
---------

Header split, flow steering, & RSS are required for devmem TCP.

Header split is used to split incoming packets
memory, and a payload buffer in device memory.

Flow steering & RSS are used to ensure that on
an RX queue bound to devmem.

Enable header split & flow steering::

        # enable header split
        ethtool -G eth1 tcp-data-split on


        # enable flow steering
        ethtool -K eth1 ntuple on

Configure RSS to steer all traffic away from t
this example)::

        ethtool --set-rxfh-indir eth1 equal 15


The user must bind a dmabuf to any number of R
the netlink API::

        /* Bind dmabuf to NIC RX queue 15 */
        struct netdev_queue *queues;
        queues = malloc(sizeof(*queues) * 1);

        queues[0]._present.type = 1;
        queues[0]._present.idx = 1;
        queues[0].type = NETDEV_RX_QUEUE_TYPE_
        queues[0].idx = 15;

        *ys = ynl_sock_create(&ynl_netdev_fami

        req = netdev_bind_rx_req_alloc();
        netdev_bind_rx_req_set_ifindex(req, 1
        netdev_bind_rx_req_set_dmabuf_fd(req,
        __netdev_bind_rx_req_set_queues(req, q

        rsp = netdev_bind_rx(*ys, req);

        dmabuf_id = rsp->dmabuf_id;


The netlink API returns a dmabuf_id: a unique
that has been bound.

The user can unbind the dmabuf from the netdev
that established the binding. We do this so th
unbound even if the userspace process crashes.

Note that any reasonably well-behaved dmabuf f
devmem TCP, even if the dmabuf is not actually
this is udmabuf, which wraps user memory (non-
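
For experimentation without a real accelerator, a udmabuf can be created from
userspace. The following is a minimal sketch, not code from the devmem TCP
sources: the helper name ``example_udmabuf_create`` is made up for
illustration, and it assumes that the udmabuf driver is available as
``/dev/udmabuf`` and that ``size`` is a multiple of the page size.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Hypothetical helper: wrap `size` bytes of memfd-backed host memory in a
 * dmabuf. Returns the dmabuf fd on success, or -1 on any failure
 * (for example when /dev/udmabuf does not exist).
 */
int example_udmabuf_create(size_t size)
{
	struct udmabuf_create create = { 0 };
	int devfd, memfd, buf_fd = -1;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		return -1;

	memfd = memfd_create("example-udmabuf", MFD_ALLOW_SEALING);
	if (memfd < 0)
		goto out_dev;

	/* The udmabuf driver only accepts memfds sealed against shrinking. */
	if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
		goto out_mem;

	if (ftruncate(memfd, (off_t)size) < 0)
		goto out_mem;

	create.memfd = memfd;
	create.offset = 0;
	create.size = size;

	/* On success the ioctl returns a new fd that refers to the dmabuf. */
	buf_fd = ioctl(devfd, UDMABUF_CREATE, &create);
	if (buf_fd < 0)
		buf_fd = -1;

out_mem:
	close(memfd);
out_dev:
	close(devfd);
	return buf_fd;
}
```

The returned fd is the kind of value that would be handed to the bind request
as the dmabuf fd. The caller owns it and should ``close()`` it when the
binding is no longer needed.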


Socket Setup
------------

The socket must be flow steered to the dmabuf

        ethtool -N eth1 flow-type tcp4 ... que


Receiving data
--------------

The user application must signal to the kernel
devmem data by passing the MSG_SOCK_DEVMEM fla

        ret = recvmsg(fd, &msg, MSG_SOCK_DEVME

Applications that do not specify the MSG_SOCK_
on devmem data.

Devmem data is received directly into the dmab
Setup', and the kernel signals such to the use

                for (cm = CMSG_FIRSTHDR(&msg);
                        if (cm->cmsg_level !=
                                (cm->cmsg_type
                                 cm->cmsg_type
                                continue;

                        dmabuf_cmsg = (struct

                        if (cm->cmsg_type == S
                                /* Frag landed
                                 *
                                 * dmabuf_cmsg
                                 * frag landed
                                 *
                                 * dmabuf_cmsg
                                 * the dmabuf
                                 *
                                 * dmabuf_cmsg
                                 * frag.
                                 *
                                 * dmabuf_cmsg
                                 * refer to th
                                 */

                                struct dmabuf_
                                token.token_st
                                token.token_co
                                continue;
                        }

                        if (cm->cmsg_type == S
                                /* Frag landed
                                 *
                                 * dmabuf_cmsg
                                 * frag.
                                 */
                                continue;

                }

Applications may receive 2 cmsgs:

- SCM_DEVMEM_DMABUF: this indicates the fragme
  by dmabuf_id.

- SCM_DEVMEM_LINEAR: this indicates the fragme
  This typically happens when the NIC is unabl
  header boundary, such that part (or all) of
  memory.

Applications may receive no SO_DEVMEM_* cmsgs.
regular TCP data that landed on an RX queue no
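
The cmsg walk described above can be exercised without a devmem-capable NIC by
scanning a synthetic control buffer. The sketch below is illustrative only:
``struct example_dmabuf_cmsg`` is a local stand-in whose layout is assumed to
match the kernel's ``struct dmabuf_cmsg``, the numeric fallbacks for
``SCM_DEVMEM_LINEAR`` and ``SCM_DEVMEM_DMABUF`` are assumed values for
architectures using the generic socket option numbers, and every
``example_*`` name is hypothetical. Verify all of these against the uapi
headers of the kernel in use.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed fallbacks for headers that predate devmem TCP. */
#ifndef SCM_DEVMEM_LINEAR
#define SCM_DEVMEM_LINEAR 78
#endif
#ifndef SCM_DEVMEM_DMABUF
#define SCM_DEVMEM_DMABUF 79
#endif

/* Hypothetical local mirror of the kernel's struct dmabuf_cmsg. */
struct example_dmabuf_cmsg {
	uint64_t frag_offset;	/* offset of the frag inside the dmabuf */
	uint32_t frag_size;	/* length of the frag in bytes */
	uint32_t frag_token;	/* token to hand back once the frag is consumed */
	uint32_t dmabuf_id;	/* id of the dmabuf the frag landed in */
	uint32_t flags;
};

struct example_rx_summary {
	unsigned int dmabuf_frags, linear_frags;
	uint64_t dmabuf_bytes, linear_bytes;
};

/* Tally devmem frags found in msg; unrelated cmsgs are skipped. */
void example_summarize(struct msghdr *msg, struct example_rx_summary *sum)
{
	struct cmsghdr *cm;

	memset(sum, 0, sizeof(*sum));
	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		struct example_dmabuf_cmsg frag;

		if (cm->cmsg_level != SOL_SOCKET ||
		    (cm->cmsg_type != SCM_DEVMEM_DMABUF &&
		     cm->cmsg_type != SCM_DEVMEM_LINEAR))
			continue;
		if (cm->cmsg_len < CMSG_LEN(sizeof(frag)))
			continue;	/* too short to hold a frag record */

		/* Copy out instead of casting to avoid alignment assumptions. */
		memcpy(&frag, CMSG_DATA(cm), sizeof(frag));
		if (cm->cmsg_type == SCM_DEVMEM_DMABUF) {
			sum->dmabuf_frags++;
			sum->dmabuf_bytes += frag.frag_size;
		} else {
			sum->linear_frags++;
			sum->linear_bytes += frag.frag_size;
		}
	}
}

/* Build a control buffer holding one frag of each kind and summarize it. */
int example_demo(struct example_rx_summary *sum)
{
	union {
		char buf[2 * CMSG_SPACE(sizeof(struct example_dmabuf_cmsg))];
		struct cmsghdr align;
	} ctrl;
	struct example_dmabuf_cmsg frag = {
		.frag_offset = 4096, .frag_size = 1448,
		.frag_token = 1, .dmabuf_id = 1,
	};
	struct msghdr msg = { 0 };
	struct cmsghdr *cm;

	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_DEVMEM_DMABUF;
	cm->cmsg_len = CMSG_LEN(sizeof(frag));
	memcpy(CMSG_DATA(cm), &frag, sizeof(frag));

	cm = CMSG_NXTHDR(&msg, cm);
	if (!cm)
		return -1;
	frag.frag_size = 52;	/* pretend 52 payload bytes stayed in host memory */
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_DEVMEM_LINEAR;
	cm->cmsg_len = CMSG_LEN(sizeof(frag));
	memcpy(CMSG_DATA(cm), &frag, sizeof(frag));

	example_summarize(&msg, sum);
	return 0;
}
```

In a real receiver, ``example_summarize()`` would be called on the ``msghdr``
filled in by ``recvmsg()``. A non-zero count of linear frags is a hint that
header split or flow steering is not behaving as intended.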


Freeing frags
-------------

Frags received via SCM_DEVMEM_DMABUF are pinne
processes the frag. The user must return the f
SO_DEVMEM_DONTNEED::

        ret = setsockopt(client_fd, SOL_SOCKET
                         sizeof(token));

The user must ensure the tokens are returned t
Failure to do so will exhaust the limited dmab
and will lead to packet drops.
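
One way to keep token return cheap is to merge consecutive frag tokens into
ranges before handing them back. The sketch below is one possible approach,
not the method used by the kernel selftest. ``struct example_dmabuf_token`` is
a local stand-in assumed to match the kernel's ``struct dmabuf_token``, the
fallback value for ``SO_DEVMEM_DONTNEED`` is an assumption for architectures
using the generic socket option numbers, and the ``example_*`` helpers are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Assumed fallback for headers that predate devmem TCP. */
#ifndef SO_DEVMEM_DONTNEED
#define SO_DEVMEM_DONTNEED 80
#endif

/* Hypothetical local mirror of the kernel's struct dmabuf_token. */
struct example_dmabuf_token {
	uint32_t token_start;	/* first token in the range */
	uint32_t token_count;	/* number of consecutive tokens */
};

/* Merge runs of consecutive tokens into ranges.
 * `out` must have room for n entries. Returns the number of ranges written.
 */
size_t example_coalesce(const uint32_t *tokens, size_t n,
			struct example_dmabuf_token *out)
{
	size_t i, used = 0;

	for (i = 0; i < n; i++) {
		if (used > 0 &&
		    out[used - 1].token_start + out[used - 1].token_count ==
		    tokens[i]) {
			/* Token directly follows the current range: extend it. */
			out[used - 1].token_count++;
			continue;
		}
		out[used].token_start = tokens[i];
		out[used].token_count = 1;
		used++;
	}
	return used;
}

/* Return each range to the kernel, one setsockopt() call per range.
 * Returns 0 if every call succeeded, or -1 on the first failure.
 */
int example_return_tokens(int fd, const struct example_dmabuf_token *ranges,
			  size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
			       &ranges[i], sizeof(ranges[i])) < 0)
			return -1;
	}
	return 0;
}
```

With frag tokens 1, 2, 3 and 7 collected from one ``recvmsg()`` call,
``example_coalesce()`` produces two ranges, so two ``setsockopt()`` calls
replace four.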


Implementation & Caveats
========================

Unreadable skbs
---------------

Devmem payloads are inaccessible to the kernel
results in a few quirks for payloads of devmem

- Loopback is not functional. Loopback relies
  not possible with devmem skbs.

- Software checksum calculation fails.

- tcpdump and BPF can't access devmem packet payloads.


Testing
=======

More realistic example code can be found in th
``tools/testing/selftests/net/ncdevmem.c``

ncdevmem is a devmem TCP netcat. It works very
receives data directly into a udmabuf.

To run ncdevmem, you need to run it on a serve
you need to run netcat on a peer to provide th

ncdevmem has a validation mode as well that ex
incoming data and validates it as such. For ex
ncdevmem on the server by::

        ncdevmem -s <server IP> -c <client IP>
                 -p 5201 -v 7

On client side, use regular netcat to send TX
on the server::

        yes $(echo -e \\x01\\x02\\x03\\x04\\x0
                tr \\n \\0 | head -c 5G | nc <
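
To make the idea of validating a repeating pattern concrete, here is a small
sketch of a streaming checker. It is not the ncdevmem implementation. It
assumes, based only on the ``-v 7`` argument and the client command above,
that the stream consists of byte values counting up from 1 and wrapping to 0
once they reach the period. ``example_validate`` and its parameters are
hypothetical, and ``ncdevmem.c`` remains the authority on the actual check.

```c
#include <stddef.h>

/* Check buf against a repeating byte pattern of length `period`.
 *
 * *seed holds the next expected byte and is updated as bytes are consumed,
 * so the function can be called once per received chunk.
 *
 * Returns the index of the first mismatching byte, or len if all bytes
 * match. A period of 0 is rejected by reporting a mismatch at index 0.
 */
size_t example_validate(const unsigned char *buf, size_t len,
			unsigned int period, unsigned char *seed)
{
	size_t i;

	if (period == 0)
		return 0;

	for (i = 0; i < len; i++) {
		if (buf[i] != *seed)
			return i;
		/* Advance the expected value, wrapping to 0 at the period. */
		*seed = (unsigned char)((*seed + 1u) % period);
	}
	return len;
}
```

A caller would start with a seed of 1 and feed each received chunk through the
function, treating any return value smaller than the chunk length as a
validation failure at that offset.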
                                                      
