Linux/Documentation/filesystems/nfs/localio.rst

Differences between /Documentation/filesystems/nfs/localio.rst (Version linux-6.12-rc7) and /Documentation/filesystems/nfs/localio.rst (Version linux-6.11.7)


  1 ===========                                       
  2 NFS LOCALIO                                       
  3 ===========                                       
  4                                                   
  5 Overview                                          
  6 ========                                          
  7                                                   
  8 The LOCALIO auxiliary RPC protocol allows the     
  9 server to reliably handshake to determine if t    
 10 host. Select "NFS client and server support fo    
 11 protocol" in menuconfig to enable CONFIG_NFS_L    
 12 config (both CONFIG_NFS_FS and CONFIG_NFSD mus    
 13                                                   
 14 Once an NFS client and server handshake as "lo    
 15 bypass the network RPC protocol for read, writ    
 16 Due to this XDR and RPC bypass, these operatio    
 17                                                   
 18 The LOCALIO auxiliary protocol's implementatio    
 19 connection as NFS traffic, follows the pattern    
 20 ACL protocol extension.                           
 21                                                   
 22 The LOCALIO auxiliary protocol is needed to al    
 23 clients local to their servers. In a private i    
 24 preceded use of this LOCALIO protocol, a fragi    
 25 address based match against all local network     
 26 But unlike the LOCALIO protocol, the sockaddr-    
 27 handle use of iptables or containers.             
 28                                                   
 29 The robust handshake between local client and     
 30 beginning, the ultimate use case this locality    
 31 client is able to open files and issue reads,     
 32 directly to the server without having to go ov    
 33 requirement is to perform these loopback NFS o    
 34 as possible; this is particularly useful for c
 35 (e.g. Kubernetes) where it is possible to run
 36 server.                                           
 37                                                   
 38 The performance advantage realized from LOCALI    
 39 using XDR and RPC for reads, writes and commit    
 40                                                   
 41 fio for 20 secs with directio, qd of 8, 16 lib    
 42   - With LOCALIO:                                 
 43     4K read:    IOPS=979k,  BW=3825MiB/s (4011    
 44     4K write:   IOPS=165k,  BW=646MiB/s  (678M    
 45     128K read:  IOPS=402k,  BW=49.1GiB/s (52.7    
 46     128K write: IOPS=11.5k, BW=1433MiB/s (1503    
 47                                                   
 48   - Without LOCALIO:                              
 49     4K read:    IOPS=79.2k, BW=309MiB/s  (324M    
 50     4K write:   IOPS=59.8k, BW=234MiB/s  (245M    
 51     128K read:  IOPS=33.9k, BW=4234MiB/s (4440    
 52     128K write: IOPS=11.5k, BW=1434MiB/s (1504    
 53                                                   
 54 fio for 20 secs with directio, qd of 8, 1 liba    
 55   - With LOCALIO:                                 
 56     4K read:    IOPS=230k,  BW=898MiB/s  (941M    
 57     4K write:   IOPS=22.6k, BW=88.3MiB/s (92.6    
 58     128K read:  IOPS=38.8k, BW=4855MiB/s (5091    
 59     128K write: IOPS=11.4k, BW=1428MiB/s (1497    
 60                                                   
 61   - Without LOCALIO:                              
 62     4K read:    IOPS=77.1k, BW=301MiB/s  (316M    
 63     4K write:   IOPS=32.8k, BW=128MiB/s  (135M    
 64     128K read:  IOPS=24.4k, BW=3050MiB/s (3198    
 65     128K write: IOPS=11.4k, BW=1430MiB/s (1500    
 66                                                   
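The relative gain can be read directly off the IOPS columns above. The following is a minimal sketch that computes the with/without ratio for each workload; it uses only the IOPS figures as printed (the bandwidth columns are cut off in this view), and the keys 16 and 1 simply echo the parallelism stated in each fio header line. The names `IOPS_K`, `speedup` and `report` are illustrative, not part of any tool.

```python
# IOPS figures copied from the two fio runs above, in thousands (k).
# Key: (parallelism from the fio header line, workload)
# Value: (IOPS with LOCALIO, IOPS without LOCALIO)
IOPS_K = {
    (16, "4K read"): (979.0, 79.2),
    (16, "4K write"): (165.0, 59.8),
    (16, "128K read"): (402.0, 33.9),
    (16, "128K write"): (11.5, 11.5),
    (1, "4K read"): (230.0, 77.1),
    (1, "4K write"): (22.6, 32.8),
    (1, "128K read"): (38.8, 24.4),
    (1, "128K write"): (11.4, 11.4),
}


def speedup(with_localio, without_localio):
    """Ratio of LOCALIO IOPS to non-LOCALIO IOPS (above 1.0 means faster)."""
    return with_localio / without_localio


def report():
    """Return (parallelism, workload, ratio) rows, ratio rounded to 2 places."""
    rows = []
    for (parallelism, workload), (w, wo) in IOPS_K.items():
        rows.append((parallelism, workload, round(speedup(w, wo), 2)))
    return rows


if __name__ == "__main__":
    for parallelism, workload, ratio in report():
        print(f"{parallelism:>2}  {workload:<10}  {ratio:.2f}x")
```

Going by these figures, the gain is concentrated in reads, 128K writes are unchanged, and the single-submitter 4K write is the one case where the ratio falls below 1.0.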
 67 FAQ                                               
 68 ===                                               
 69                                                   
 70 1. What are the use cases for LOCALIO?            
 71                                                   
 72    a. Workloads where the NFS client and serve    
 73       realize improved IO performance. In part    
 74       running containerised workloads for jobs    
 75       running on the same host as the knfsd se    
 76       storage.                                    
 77                                                   
 78 2. What are the requirements for LOCALIO?         
 79                                                   
 80    a. Bypass use of the network RPC protocol a    
 81       includes bypassing XDR and RPC for open,    
 82       operations.                                 
 83    b. Allow client and server to autonomously     
 84       running local to each other without maki    
 85       the local network topology.                 
 86    c. Support the use of containers by being c    
 87       namespaces (e.g. network, user, mount).     
 88    d. Support all versions of NFS. NFSv3 is of    
 89       because it has wide enterprise usage and    
 90       of it for the data path.                    
 91                                                   
 92 3. Why doesn't LOCALIO just compare IP addre
 93    deciding if the NFS client and server are c    
 94    host?                                          
 95                                                   
 96    Since one of the main use cases is containe    
 97    assume that IP addresses will be shared bet    
 98    server. This sets up a requirement for a ha    
 99    needs to go over the same connection as the    
100    identify that the client and the server rea    
101    same host. The handshake uses a secret that    
102    and can be verified by both parties by comp    
103    in shared kernel memory if they are truly c    
104                                                   
105 4. Does LOCALIO improve pNFS flexfiles?           
106                                                   
107    Yes, LOCALIO complements pNFS flexfiles by     
108    advantage of NFS client and server locality    
109    client IO as closely to the server where th    
110    benefits from the data path optimization LO    
111                                                   
112 5. Why not develop a new pNFS layout to enable    
113                                                   
114    A new pNFS layout could be developed, but d    
115    onus on the server to somehow discover that    
116    when deciding to hand out the layout.          
117    There is value in a simpler approach (as pr    
118    allows the NFS client to negotiate and leve    
119    requiring more elaborate modeling and disco    
120    more centralized manner.                       
121                                                   
122 6. Why is having the client perform a server-s    
123    using RPC, beneficial?  Is the benefit pNFS    
124                                                   
125    Avoiding the use of XDR and RPC for file op    
126    performance regardless of whether pNFS is u    
127    dealing with small files it's best to avoid
128    whenever possible, otherwise it could reduc    
129    benefits of avoiding the wire for doing the    
130    Given LOCALIO's requirements, the current ap
131    client perform a server-side file open, wit    
132    If in the future requirements change then w    
133                                                   
134 7. Why is LOCALIO only supported with UNIX Aut    
135                                                   
136    Strong authentication is usually tied to th    
137    works by establishing a context that is cac    
138    that acts as the key for discovering the au    
139    can then be passed to rpc.mountd to complet    
140    process. On the other hand, in the case of     
141    that was passed over the wire is used direc    
142    upcall to rpc.mountd. This simplifies the a    
143    so makes AUTH_UNIX easier to support.          
144                                                   
145 8. How do export options that translate RPC us    
146    operations (e.g. root_squash, all_squash)?
147                                                   
148    Export options that translate user IDs are     
149    which is called by nfsd_setuser_and_check_p    
150    __fh_verify().  So they get handled exactly    
151    as they do for non-LOCALIO.                    
152                                                   
153 9. How does LOCALIO make certain that object l    
154    properly given NFSD and NFS operate in diff    
155                                                   
156    See the detailed "NFS Client and Server Int    
157                                                   
158 RPC                                               
159 ===                                               
160                                                   
161 The LOCALIO auxiliary RPC protocol consists of    
162 RPC method that allows the Linux NFS client to    
163 NFS server can see the nonce (single-use UUID)    
164 made available in nfs_common. This protocol is    
165 standard, nor does it need to be, considering i
166 auxiliary RPC protocol that amounts to an impl    
167                                                   
168 The UUID_IS_LOCAL method encodes the client ge    
169 the fixed UUID_SIZE (16 bytes). The fixed size    
170 XDR methods are used instead of the less effic    
171 methods.                                          
172                                                   
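To illustrate why the fixed size matters, here is a minimal userspace sketch of the two XDR opaque encodings defined in RFC 4506. It is not the kernel's XDR code; the function names are made up for this example, and only UUID_SIZE comes from the text above.

```python
import struct
import uuid

UUID_SIZE = 16  # from the protocol description above


def xdr_pad(data):
    """Pad to a multiple of 4 bytes, as XDR (RFC 4506) requires."""
    return data + b"\x00" * (-len(data) % 4)


def encode_opaque_fixed(data, size=UUID_SIZE):
    """Fixed-length opaque: the bytes themselves, no length prefix."""
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return xdr_pad(data)


def encode_opaque_variable(data):
    """Variable-length opaque: 4-byte big-endian length, then the bytes."""
    return struct.pack(">I", len(data)) + xdr_pad(data)


def decode_opaque_fixed(buf, size=UUID_SIZE):
    """Inverse of encode_opaque_fixed(); the receiver already knows the size."""
    padded = size + (-size % 4)
    if len(buf) < padded:
        raise ValueError("buffer too short")
    return buf[:size]


if __name__ == "__main__":
    nonce = uuid.uuid4().bytes
    print(len(encode_opaque_fixed(nonce)))     # 16 bytes on the wire
    print(len(encode_opaque_variable(nonce)))  # 20 bytes on the wire
```

Because 16 is already a multiple of 4, the fixed form needs neither padding nor a length word, and the decoder has no length field to validate.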
173 The RPC program number for the NFS_LOCALIO_PRO    
174 by IANA, see https://www.iana.org/assignments/    
175 Linux Kernel Organization       400122  nfsloc    
176                                                   
177 The LOCALIO protocol spec in rpcgen syntax is:    
178                                                   
179   /* raw RFC 9562 UUID */                         
180   #define UUID_SIZE 16                            
181   typedef u8 uuid_t<UUID_SIZE>;                   
182                                                   
183   program NFS_LOCALIO_PROGRAM {                   
184       version LOCALIO_V1 {                        
185           void                                    
186               NULL(void) = 0;                     
187                                                   
188           void                                    
189               UUID_IS_LOCAL(uuid_t) = 1;          
190       } = 1;                                      
191   } = 400122;                                     
192                                                   
193 LOCALIO uses the same transport connection as     
194 LOCALIO is not registered with rpcbind.           
195                                                   
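For readers who want to see what such a call looks like on the wire, the sketch below assembles an ONC RPC version 2 call message (RFC 5531) for UUID_IS_LOCAL using the program, version and procedure numbers from the spec above. It is an illustration only: the helper names are invented, the AUTH_SYS credential values are placeholders, and TCP record marking is omitted.

```python
import struct

# Numbers taken from the rpcgen spec above.
NFS_LOCALIO_PROGRAM = 400122
LOCALIO_V1 = 1
PROC_UUID_IS_LOCAL = 1

# Constants from RFC 5531.
RPC_CALL = 0
RPC_VERSION = 2
AUTH_NONE = 0
AUTH_SYS = 1


def xdr_opaque_var(data):
    """Variable-length XDR opaque: length word, bytes, zero padding."""
    return struct.pack(">I", len(data)) + data + b"\x00" * (-len(data) % 4)


def auth_sys_body(stamp, machine, uid, gid, gids=()):
    """AUTH_SYS credential body: stamp, machine name, uid, gid, gid list."""
    return (
        struct.pack(">I", stamp)
        + xdr_opaque_var(machine.encode("ascii"))
        + struct.pack(">III", uid, gid, len(gids))
        + b"".join(struct.pack(">I", g) for g in gids)
    )


def build_uuid_is_local_call(xid, uuid_bytes, cred_body):
    """Call header, AUTH_SYS credential, empty verifier, 16-byte argument."""
    if len(uuid_bytes) != 16:
        raise ValueError("UUID must be exactly 16 bytes")
    header = struct.pack(
        ">6I", xid, RPC_CALL, RPC_VERSION,
        NFS_LOCALIO_PROGRAM, LOCALIO_V1, PROC_UUID_IS_LOCAL,
    )
    cred = struct.pack(">I", AUTH_SYS) + xdr_opaque_var(cred_body)
    verf = struct.pack(">I", AUTH_NONE) + xdr_opaque_var(b"")
    return header + cred + verf + uuid_bytes


if __name__ == "__main__":
    body = auth_sys_body(stamp=0, machine="host", uid=0, gid=0)
    msg = build_uuid_is_local_call(0x1234, bytes(range(16)), body)
    print(len(msg), msg.hex())
```

The argument is appended as a fixed-size opaque, so it carries no length prefix.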
196 NFS Common and Client/Server Handshake            
197 ======================================            
198                                                   
199 fs/nfs_common/nfslocalio.c provides interfaces    
200 to generate a nonce (single-use UUID) and asso    
201 nfs_uuid_t struct, register it with nfs_common    
202 verification by the NFS server and if matched     
203 members in the nfs_uuid_t struct. The NFS clie    
204 transfer the nfs_uuid_t from its nfs_uuids to     
205 clients_list from the nfs_common's uuids_list.    
206 fs/nfs/localio.c:nfs_local_probe()                
207                                                   
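The handshake described above can be modelled in a few lines of userspace code. This is a toy sketch, not the kernel implementation: `Host.uuids` stands in for the list kept by nfs_common, the class and method names are invented, and the exact bookkeeping of which side moves the record between lists is simplified. The one property it does preserve is that the RPC carries only the nonce and returns nothing, so the client learns the outcome from shared memory.

```python
import uuid


class Host:
    """One running kernel; `uuids` models nfs_common's shared list."""

    def __init__(self, name):
        self.name = name
        self.uuids = {}  # nonce bytes -> client record


class Server:
    def __init__(self, host):
        self.host = host

    def uuid_is_local(self, nonce):
        """Handle UUID_IS_LOCAL: returns nothing, like the void RPC method."""
        record = self.host.uuids.get(nonce)
        if record is not None:
            # Only a server sharing kernel memory with the client gets here.
            record["server"] = self


class Client:
    def __init__(self, host):
        self.host = host
        self.record = None

    def probe(self, server):
        """Generate a single-use nonce, register it, and ask the server."""
        nonce = uuid.uuid4().bytes
        record = {"nonce": nonce, "server": None}
        self.host.uuids[nonce] = record
        server.uuid_is_local(nonce)       # only the nonce crosses the wire
        self.host.uuids.pop(nonce, None)  # single use: always unregister
        self.record = record if record["server"] is server else None
        return self.record is not None


if __name__ == "__main__":
    a, b = Host("a"), Host("b")
    print(Client(a).probe(Server(a)))  # same host
    print(Client(a).probe(Server(b)))  # different host
```

A server on another host cannot find the nonce, whatever IP addresses the two ends happen to share, which is the point of the handshake.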
208 nfs_common's nfs_uuids list is the basis for L    
209 it has members that point to nfsd memory for d    
210 (e.g. 'net' is the server's network namespace,    
211 access nn->nfsd_serv with proper rcu read acce    
212 and server synchronization that enables advanc    
213 objects to span from the host kernel's nfsd to    
214 instances that are connected to nfs client's r    
215 host.                                             
216                                                   
217 NFS Client and Server Interlock                   
218 ===============================                   
219                                                   
220 LOCALIO provides the nfs_uuid_t object and ass    
221 allow proper network namespace (net-ns) and NF    
222                                                   
223     We don't want to keep a long-term counted     
224     net-ns in the client because that prevents    
225     completely shutting down.                     
226                                                   
227     So we avoid taking a reference at all and     
228     reference to the server (detailed below) b    
229     the net-ns active. This involves allowing     
230     code to iterate all active clients and cle    
231     (which are needed to find the per-cpu-refc    
232                                                   
233     Details:                                      
234                                                   
235      - Embed nfs_uuid_t in nfs_client. nfs_uui    
236        that can be used to find the client. It    
237        uuid_t to nfs_client so it is bigger th    
238        uuid_t is only used during the initial     
239        LOCALIO handshake to determine if they     
240        If that is really a problem we can find    
241                                                   
242      - When the nfs server confirms that the u    
243        the nfs_uuid_t onto a per-net-ns list i    
244                                                   
245      - When each server's net-ns is shutting d    
246        handler, all these nfs_uuid_t have thei    
247        a synchronize_rcu() call between pre_e
248        handlers so any caller that sees nfs_uu    
249        safely manage the per-cpu-refcount for     
250                                                   
251      - The client's nfs_uuid_t is passed to nf    
252        can safely dereference ->net in a priva    
253        to allow safe access to the associated     
254                                                   
255 So LOCALIO required the introduction and use o    
256 interlock nfsd_destroy_serv() and nfsd_open_lo    
257 nn->nfsd_serv is not destroyed while in use by    
258 warrants a more detailed explanation:             
259                                                   
260     nfsd_open_local_fh() uses nfsd_serv_try_ge    
261     nfsd_file handle and then the caller (NFS     
262     reference for the nfsd_file and associated    
263     nfs_file_put_local() once it has completed    
264                                                   
265     This interlock relies heavily on n
266     afforded the ability to safely deal with t    
267     NFSD's net-ns (and nfsd_net by association    
268     by nfsd_destroy_serv() via nfsd_shutdown_n    
269     possible given the nfs_uuid_t ->net pointe    
270     above.                                        
271                                                   
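The shape of this interlock can be sketched with a toy reference counter. The model below is single-threaded and has no RCU, so it only shows the ordering rules: a reference can be taken only while the server is live, the ->net pointer is cleared at shutdown, and teardown is complete only once every outstanding reference has been dropped. All names (`ServRef`, `open_local`, `put_local`, `shutdown_net`) are hypothetical stand-ins, not kernel symbols.

```python
class ServRef:
    """Toy stand-in for the reference that pins nn->nfsd_serv."""

    def __init__(self):
        self.count = 0
        self.dying = False

    def try_get(self):
        """Take a reference unless shutdown has already begun."""
        if self.dying:
            return False
        self.count += 1
        return True

    def put(self):
        if self.count <= 0:
            raise RuntimeError("put without matching get")
        self.count -= 1

    def kill(self):
        """Refuse new references from now on."""
        self.dying = True

    def drained(self):
        """True once shutdown has begun and no references remain."""
        return self.dying and self.count == 0


class NfsUuid:
    """Models the client's nfs_uuid_t: ->net is cleared at net-ns shutdown."""

    def __init__(self, net):
        self.net = net


def open_local(nfs_uuid):
    """Client-side open: succeeds only if the server can still be pinned."""
    net = nfs_uuid.net
    if net is None or not net["serv_ref"].try_get():
        return None
    return {"net": net}


def put_local(handle):
    """Drop the reference taken by open_local() once IO has completed."""
    handle["net"]["serv_ref"].put()


def shutdown_net(net, uuids):
    """Clear every ->net pointer, then stop handing out references."""
    for u in uuids:
        u.net = None
    net["serv_ref"].kill()
    return net["serv_ref"].drained()
```

In this model a late open after shutdown simply fails, rather than dereferencing a server that no longer exists.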
272 All told, this elaborate interlock of the NFS     
273 verified to fix an easy-to-hit crash that woul
274 instance running in a container, with a LOCALI    
275 shutdown. Upon restart of the container and as    
276 would go on to crash due to NULL pointer deref    
277 to the LOCALIO client's attempting to nfsd_ope    
278 nn->nfsd_serv, without having a proper referen    
279                                                   
280 NFS Client issues IO instead of Server            
281 ======================================            
282                                                   
283 Because LOCALIO is focused on protocol bypass     
284 performance, alternatives to the traditional N    
285 with XDR) must be provided to access the backi    
286                                                   
287 See fs/nfs/localio.c:nfs_local_open_fh() and      
288 fs/nfsd/localio.c:nfsd_open_local_fh() for the    
289 focused use of select nfs server objects to al    
290 server to open a file pointer without needing     
291                                                   
292 The client's fs/nfs/localio.c:nfs_local_open_f    
293 server's fs/nfsd/localio.c:nfsd_open_local_fh(    
294 both the associated nfsd network namespace and    
295 RCU. If nfsd_open_local_fh() finds that the cl    
296 nfsd objects (be it struct net or nn->nfsd_ser    
297 to nfs_local_open_fh() and the client will try    
298 LOCALIO resources needed by calling nfs_local_    
299 recovery is needed if/when an nfsd instance ru    
300 to reboot while a LOCALIO client is connected     
301                                                   
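The recovery path described above might look roughly like this from the client's side. It is a sketch under stated assumptions: `ServerGone` is a hypothetical stand-in for whatever error the server-side open returns, the two callables are placeholders for the server-side open and the re-probe, and returning None to mean "use the ordinary RPC path for this request" is an assumption of this example rather than something the text above states.

```python
class ServerGone(Exception):
    """Hypothetical error: the client no longer sees valid nfsd objects."""


class LocalioClient:
    def __init__(self, server_open, probe):
        self._server_open = server_open  # callable(fh) -> file, may raise ServerGone
        self._probe = probe              # callable() -> bool, redoes the handshake
        self.localio_enabled = True
        self.reprobes = 0

    def open_fh(self, fh):
        """Try the local open; on failure, re-probe and report no local file."""
        if not self.localio_enabled:
            return None
        try:
            return self._server_open(fh)
        except ServerGone:
            # Server-side objects went away (e.g. container restart):
            # re-establish LOCALIO state before the next attempt.
            self.reprobes += 1
            self.localio_enabled = self._probe()
            return None


if __name__ == "__main__":
    state = {"up": False}

    def server_open(fh):
        if not state["up"]:
            raise ServerGone()
        return ("file", fh)

    def probe():
        state["up"] = True
        return True

    client = LocalioClient(server_open, probe)
    print(client.open_fh(b"fh"))  # first attempt hits the stale state
    print(client.open_fh(b"fh"))  # succeeds after the re-probe
```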
302 Once the client has an open nfsd_file pointer     
303 writes and commits directly to the underlying     
304 done by the nfs server). As such, for these op    
305 is issuing IO to the underlying local filesyst    
306 the NFS server. See: fs/nfs/localio.c:nfs_loca    
307 fs/nfs/localio.c:nfs_local_commit().              
308                                                   
309 Security                                          
310 ========                                          
311                                                   
312 LOCALIO is only supported when UNIX-style auth
313 AUTH_SYS) is used.                                
314                                                   
315 Care is taken to ensure the same NFS security     
316 (authentication, etc) regardless of whether LO    
317 access is used. The auth_domain established as    
318 NFS client access to the NFS server is also us    
319                                                   
320 Relative to containers, LOCALIO gives the clie    
321 namespace the server has. This is required to     
322 the server's per-namespace nfsd_net struct. Wi    
323 client is afforded this same level of access (    
324 protocol via SUNRPC). No other namespaces (use    
325 altered or purposely extended from the server     
326                                                   
327 Testing                                           
328 =======                                           
329                                                   
330 The LOCALIO auxiliary protocol and associated     
331 and commit access have proven stable against v    
332                                                   
333 - Client and server both on the same host.        
334                                                   
335 - All permutations of client and server suppor    
336   local and remote client and server.             
337                                                   
338 - Testing against NFS storage products that do    
339   protocol was also performed.                    
340                                                   
341 - Client on host, server within a container (f    
342   The container testing was in terms of podman    
343   includes successful container stop/restart s    
344                                                   
345 - Formalizing these test scenarios in terms of    
346   infrastructure is ongoing. Initial regular
347   terms of ktest running xfstests against a LO    
348   mount configuration, and includes lockdep an    
349   https://evilpiepirate.org/~testdashboard/ci?    
350   https://github.com/koverstreet/ktest            
351                                                   
352 - Various kdevops testing (in terms of "Chuck'    
353   performed to regularly verify the LOCALIO ch    
354   regressions to non-LOCALIO NFS use cases.       
355                                                   
356 - All of Hammerspace's various sanity tests pa    
357   (this includes numerous pNFS and flexfiles t    
                                                      
