===========
NFS LOCALIO
===========

Overview
========

The LOCALIO auxiliary RPC protocol allows the
server to reliably handshake to determine if t
host. Select "NFS client and server support fo
protocol" in menuconfig to enable CONFIG_NFS_L
config (both CONFIG_NFS_FS and CONFIG_NFSD mus

Once an NFS client and server handshake as "lo
bypass the network RPC protocol for read, writ
Due to this XDR and RPC bypass, these operatio

The LOCALIO auxiliary protocol's implementatio
connection as NFS traffic, follows the pattern
ACL protocol extension.

The LOCALIO auxiliary protocol is needed to al
clients local to their servers. In a private i
preceded use of this LOCALIO protocol, a fragi
address based match against all local network
But unlike the LOCALIO protocol, the sockaddr-
handle use of iptables or containers.

The robust handshake between local client and
beginning, the ultimate use case this locality
client is able to open files and issue reads,
directly to the server without having to go ov
requirement is to perform these loopback NFS o
as possible, this is particularly useful for c
(e.g. kubernetes) where it is possible to run
server.
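
The handshake outlined above rests on one idea that this document spells out
in its FAQ and RPC sections: the client generates a single-use UUID (nonce),
and the server checks whether that nonce is visible in kernel memory that both
sides share. The following is a minimal userspace sketch of that idea only. It
is not the kernel implementation; ``struct nonce_table``, ``nonce_register()``,
``nonce_is_local()`` and ``MAX_PENDING`` are hypothetical names introduced for
illustration, and a plain array stands in for the shared kernel memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_SIZE   16  /* fixed nonce size used by the LOCALIO protocol */
#define MAX_PENDING 8   /* hypothetical capacity, chosen for this sketch only */

/* Hypothetical stand-in for memory that client and server share on one host. */
struct nonce_table {
	unsigned char uuid[MAX_PENDING][UUID_SIZE];
	bool in_use[MAX_PENDING];
};

/* Client side: publish a nonce before sending it to the server over RPC. */
static bool nonce_register(struct nonce_table *t, const unsigned char *uuid)
{
	assert(t != NULL && uuid != NULL);
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!t->in_use[i]) {
			memcpy(t->uuid[i], uuid, UUID_SIZE);
			t->in_use[i] = true;
			return true;
		}
	}
	return false; /* no free slot */
}

/*
 * Server side: a nonce arrived over the wire. It indicates locality only if
 * it is present in the table this server can see. A match is consumed so
 * that the same nonce cannot be accepted twice.
 */
static bool nonce_is_local(struct nonce_table *t, const unsigned char *uuid)
{
	assert(t != NULL && uuid != NULL);
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (t->in_use[i] && memcmp(t->uuid[i], uuid, UUID_SIZE) == 0) {
			t->in_use[i] = false;
			return true;
		}
	}
	return false;
}
```

In this sketch, a server consulting a different table (modelling a different
host) never finds the client's nonce, so the pair is not reported as local,
no matter what IP addresses are involved.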

The performance advantage realized from LOCALI
using XDR and RPC for reads, writes and commit

fio for 20 secs with directio, qd of 8, 16 lib
  - With LOCALIO:
    4K read:    IOPS=979k, BW=3825MiB/s (4011
    4K write:   IOPS=165k, BW=646MiB/s (678M
    128K read:  IOPS=402k, BW=49.1GiB/s (52.7
    128K write: IOPS=11.5k, BW=1433MiB/s (1503

  - Without LOCALIO:
    4K read:    IOPS=79.2k, BW=309MiB/s (324M
    4K write:   IOPS=59.8k, BW=234MiB/s (245M
    128K read:  IOPS=33.9k, BW=4234MiB/s (4440
    128K write: IOPS=11.5k, BW=1434MiB/s (1504

fio for 20 secs with directio, qd of 8, 1 liba
  - With LOCALIO:
    4K read:    IOPS=230k, BW=898MiB/s (941M
    4K write:   IOPS=22.6k, BW=88.3MiB/s (92.6
    128K read:  IOPS=38.8k, BW=4855MiB/s (5091
    128K write: IOPS=11.4k, BW=1428MiB/s (1497

  - Without LOCALIO:
    4K read:    IOPS=77.1k, BW=301MiB/s (316M
    4K write:   IOPS=32.8k, BW=128MiB/s (135M
    128K read:  IOPS=24.4k, BW=3050MiB/s (3198
    128K write: IOPS=11.4k, BW=1430MiB/s (1500

FAQ
===

1. What are the use cases for LOCALIO?

   a. Workloads where the NFS client and serve
      realize improved IO performance. In part
      running containerised workloads for jobs
      running on the same host as the knfsd se
      storage.

2. What are the requirements for LOCALIO?

   a. Bypass use of the network RPC protocol a
      includes bypassing XDR and RPC for open,
      operations.
   b. Allow client and server to autonomously
      running local to each other without maki
      the local network topology.
   c. Support the use of containers by being c
      namespaces (e.g. network, user, mount).
   d. Support all versions of NFS. NFSv3 is of
      because it has wide enterprise usage and
      of it for the data path.

3. Why doesn’t LOCALIO just compare IP addre
   deciding if the NFS client and server are c
   host?

   Since one of the main use cases is containe
   assume that IP addresses will be shared bet
   server.
   This sets up a requirement for a ha
   needs to go over the same connection as the
   identify that the client and the server rea
   same host. The handshake uses a secret that
   and can be verified by both parties by comp
   in shared kernel memory if they are truly c

4. Does LOCALIO improve pNFS flexfiles?

   Yes, LOCALIO complements pNFS flexfiles by
   advantage of NFS client and server locality
   client IO as closely to the server where th
   benefits from the data path optimization LO

5. Why not develop a new pNFS layout to enable

   A new pNFS layout could be developed, but d
   onus on the server to somehow discover that
   when deciding to hand out the layout.
   There is value in a simpler approach (as pr
   allows the NFS client to negotiate and leve
   requiring more elaborate modeling and disco
   more centralized manner.

6. Why is having the client perform a server-s
   using RPC, beneficial? Is the benefit pNFS

   Avoiding the use of XDR and RPC for file op
   performance regardless of whether pNFS is u
   dealing with small files it's best to avoid
   whenever possible, otherwise it could reduc
   benefits of avoiding the wire for doing the
   Given LOCALIO's requirements the current ap
   client perform a server-side file open, wit
   If in the future requirements change then w

7. Why is LOCALIO only supported with UNIX Aut

   Strong authentication is usually tied to th
   works by establishing a context that is cac
   that acts as the key for discovering the au
   can then be passed to rpc.mountd to complet
   process. On the other hand, in the case of
   that was passed over the wire is used direc
   upcall to rpc.mountd. This simplifies the a
   so makes AUTH_UNIX easier to support.

8. How do export options that translate RPC us
   operations (e.g. root_squash, all_squash)?

   Export options that translate user IDs are
   which is called by nfsd_setuser_and_check_p
   __fh_verify(). So they get handled exactly
   as they do for non-LOCALIO.

9. How does LOCALIO make certain that object l
   properly given NFSD and NFS operate in diff

   See the detailed "NFS Client and Server Int

RPC
===

The LOCALIO auxiliary RPC protocol consists of
RPC method that allows the Linux NFS client to
NFS server can see the nonce (single-use UUID)
made available in nfs_common. This protocol is
standard, nor does it need to be considering i
auxiliary RPC protocol that amounts to an impl

The UUID_IS_LOCAL method encodes the client ge
the fixed UUID_SIZE (16 bytes). The fixed size
XDR methods are used instead of the less effic
methods.

The RPC program number for the NFS_LOCALIO_PRO
by IANA, see https://www.iana.org/assignments/
    Linux Kernel Organization 400122 nfsloc

The LOCALIO protocol spec in rpcgen syntax is:

  /* raw RFC 9562 UUID */
  #define UUID_SIZE 16
  typedef u8 uuid_t<UUID_SIZE>;

  program NFS_LOCALIO_PROGRAM {
      version LOCALIO_V1 {
          void
              NULL(void) = 0;

          void
              UUID_IS_LOCAL(uuid_t) = 1;
      } = 1;
  } = 400122;

LOCALIO uses the same transport connection as
LOCALIO is not registered with rpcbind.

NFS Common and Client/Server Handshake
======================================

fs/nfs_common/nfslocalio.c provides interfaces
to generate a nonce (single-use UUID) and asso
nfs_uuid_t struct, register it with nfs_common
verification by the NFS server and if matched
members in the nfs_uuid_t struct. The NFS clie
transfer the nfs_uuid_t from its nfs_uuids to
clients_list from the nfs_common's uuids_list.
fs/nfs/localio.c:nfs_local_probe()

nfs_common's nfs_uuids list is the basis for L
it has members that point to nfsd memory for d
(e.g.
'net' is the server's network namespace,
access nn->nfsd_serv with proper rcu read acce
and server synchronization that enables advanc
objects to span from the host kernel's nfsd to
instances that are connected to nfs client's r
host.

NFS Client and Server Interlock
===============================

LOCALIO provides the nfs_uuid_t object and ass
allow proper network namespace (net-ns) and NF

  We don't want to keep a long-term counted
  net-ns in the client because that prevents
  completely shutting down.

  So we avoid taking a reference at all and
  reference to the server (detailed below) b
  the net-ns active. This involves allowing
  code to iterate all active clients and cle
  (which are needed to find the per-cpu-refc

  Details:

   - Embed nfs_uuid_t in nfs_client. nfs_uui
     that can be used to find the client. It
     uuid_t to nfs_client so it is bigger th
     uuid_t is only used during the initial
     LOCALIO handshake to determine if they
     If that is really a problem we can find

   - When the nfs server confirms that the u
     the nfs_uuid_t onto a per-net-ns list i

   - When each server's net-ns is shutting d
     handler, all these nfs_uuid_t have thei
     a synchronize_rcu() call between pre_e
     handlers so any caller that sees nfs_uu
     safely manage the per-cpu-refcount for

   - The client's nfs_uuid_t is passed to nf
     can safely dereference ->net in a priva
     to allow safe access to the associated

So LOCALIO required the introduction and use o
interlock nfsd_destroy_serv() and nfsd_open_lo
nn->nfsd_serv is not destroyed while in use by
warrants a more detailed explanation:

  nfsd_open_local_fh() uses nfsd_serv_try_ge
  nfsd_file handle and then the caller (NFS
  reference for the nfsd_file and associated
  nfs_file_put_local() once it has completed

  This interlock working relies heavily on n
  afforded the
  ability to safely deal with t
  NFSD's net-ns (and nfsd_net by association
  by nfsd_destroy_serv() via nfsd_shutdown_n
  possible given the nfs_uuid_t ->net pointe
  above.

All told, this elaborate interlock of the NFS
verified to fix an easy to hit crash that woul
instance running in a container, with a LOCALI
shutdown. Upon restart of the container and as
would go on to crash due to NULL pointer deref
to the LOCALIO client's attempting to nfsd_ope
nn->nfsd_serv, without having a proper referen

NFS Client issues IO instead of Server
======================================

Because LOCALIO is focused on protocol bypass
performance, alternatives to the traditional N
with XDR) must be provided to access the backi

See fs/nfs/localio.c:nfs_local_open_fh() and
fs/nfsd/localio.c:nfsd_open_local_fh() for the
focused use of select nfs server objects to al
server to open a file pointer without needing

The client's fs/nfs/localio.c:nfs_local_open_f
server's fs/nfsd/localio.c:nfsd_open_local_fh(
both the associated nfsd network namespace and
RCU. If nfsd_open_local_fh() finds that the cl
nfsd objects (be it struct net or nn->nfsd_ser
to nfs_local_open_fh() and the client will try
LOCALIO resources needed by calling nfs_local_
recovery is needed if/when an nfsd instance ru
to reboot while a LOCALIO client is connected

Once the client has an open nfsd_file pointer
writes and commits directly to the underlying
done by the nfs server). As such, for these op
is issuing IO to the underlying local filesyst
the NFS server. See: fs/nfs/localio.c:nfs_loca
fs/nfs/localio.c:nfs_local_commit().

Security
========

LOCALIO is only supported when UNIX-style auth
AUTH_SYS) is used.

Care is taken to ensure the same NFS security
(authentication, etc) regardless of whether LO
access is used.
The auth_domain established as
NFS client access to the NFS server is also us

Relative to containers, LOCALIO gives the clie
namespace the server has. This is required to
the server's per-namespace nfsd_net struct. Wi
client is afforded this same level of access (
protocol via SUNRPC). No other namespaces (use
altered or purposely extended from the server

Testing
=======

The LOCALIO auxiliary protocol and associated
and commit access have proven stable against v

- Client and server both on the same host.

- All permutations of client and server suppor
  local and remote client and server.

- Testing against NFS storage products that do
  protocol was also performed.

- Client on host, server within a container (f
  The container testing was in terms of podman
  includes successful container stop/restart s

- Formalizing these test scenarios in terms of
  infrastructure is on-going. Initial regular
  terms of ktest running xfstests against a LO
  mount configuration, and includes lockdep an
  https://evilpiepirate.org/~testdashboard/ci?
  https://github.com/koverstreet/ktest

- Various kdevops testing (in terms of "Chuck'
  performed to regularly verify the LOCALIO ch
  regressions to non-LOCALIO NFS use cases.

- All of Hammerspace's various sanity tests pa
  (this includes numerous pNFS and flexfiles t