===========
NFS LOCALIO
===========

Overview
========

The LOCALIO auxiliary RPC protocol allows the Linux NFS client and
server to reliably handshake to determine if they are on the same
host. Select "NFS client and server support for LOCALIO auxiliary
protocol" in menuconfig to enable CONFIG_NFS_LOCALIO in the kernel
config (both CONFIG_NFS_FS and CONFIG_NFSD must also be enabled).

Once an NFS client and server handshake as "local", the client will
bypass the network RPC protocol for read, write and commit operations.
Because XDR encoding and RPC transport are bypassed, these operations
complete faster.

The LOCALIO auxiliary protocol's implementation, which uses the same
connection as NFS traffic, follows the pattern established by the NFS
ACL protocol extension.

The LOCALIO auxiliary protocol is needed to allow robust discovery of
clients local to their servers. A private implementation that preceded
this LOCALIO protocol attempted a fragile match of the sockaddr network
address against all local network interfaces. But unlike the LOCALIO
protocol, the sockaddr-based matching didn't handle use of iptables or
containers.

The robust handshake between local client and server is just the
beginning. The ultimate use case this locality makes possible is that
the client can open files and issue reads, writes and commits directly
to the server without having to go over the network. The requirement
is to perform these loopback NFS operations as efficiently as possible;
this is particularly useful for container use cases (e.g. Kubernetes)
where it is possible to run an IO job local to the server.

The performance advantage realized from LOCALIO's ability to bypass
XDR and RPC for reads, writes and commits can be extreme, e.g.::

  fio for 20 secs with directio, qd of 8, 16 libaio threads:
    - With LOCALIO:
      4K read:    IOPS=979k,  BW=3825MiB/s (4011MB/s)(74.7GiB/20002msec)
      4K write:   IOPS=165k,  BW=646MiB/s  (678MB/s)(12.6GiB/20002msec)
      128K read:  IOPS=402k,  BW=49.1GiB/s (52.7GB/s)(982GiB/20002msec)
      128K write: IOPS=11.5k, BW=1433MiB/s (1503MB/s)(28.0GiB/20004msec)

    - Without LOCALIO:
      4K read:    IOPS=79.2k, BW=309MiB/s  (324MB/s)(6188MiB/20003msec)
      4K write:   IOPS=59.8k, BW=234MiB/s  (245MB/s)(4671MiB/20002msec)
      128K read:  IOPS=33.9k, BW=4234MiB/s (4440MB/s)(82.7GiB/20004msec)
      128K write: IOPS=11.5k, BW=1434MiB/s (1504MB/s)(28.0GiB/20011msec)

  fio for 20 secs with directio, qd of 8, 1 libaio thread:
    - With LOCALIO:
      4K read:    IOPS=230k,  BW=898MiB/s  (941MB/s)(17.5GiB/20001msec)
      4K write:   IOPS=22.6k, BW=88.3MiB/s (92.6MB/s)(1766MiB/20001msec)
      128K read:  IOPS=38.8k, BW=4855MiB/s (5091MB/s)(94.8GiB/20001msec)
      128K write: IOPS=11.4k, BW=1428MiB/s (1497MB/s)(27.9GiB/20001msec)

    - Without LOCALIO:
      4K read:    IOPS=77.1k, BW=301MiB/s  (316MB/s)(6022MiB/20001msec)
      4K write:   IOPS=32.8k, BW=128MiB/s  (135MB/s)(2566MiB/20001msec)
      128K read:  IOPS=24.4k, BW=3050MiB/s (3198MB/s)(59.6GiB/20001msec)
      128K write: IOPS=11.4k, BW=1430MiB/s (1500MB/s)(27.9GiB/20001msec)
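
The headline gains in the 16-thread run can be checked with simple arithmetic: the speedup for each operation is the IOPS with LOCALIO divided by the IOPS without it. The snippet below is only a hypothetical reading aid (it is neither kernel nor fio code); the struct and helper names are invented, and the figures are copied from the results above.

```c
/* IOPS figures (in thousands) copied from the 16-thread fio run above. */
struct fio_result {
    const char *op;
    double localio_kiops;   /* "With LOCALIO"    */
    double network_kiops;   /* "Without LOCALIO" */
};

static const struct fio_result fio_16_threads[] = {
    { "4K read",    979.0, 79.2 },
    { "4K write",   165.0, 59.8 },
    { "128K read",  402.0, 33.9 },
    { "128K write",  11.5, 11.5 },
};

/* Hypothetical helper: speedup = IOPS with LOCALIO / IOPS without. */
static double localio_speedup(const struct fio_result *r)
{
    return r->localio_kiops / r->network_kiops;
}
```

This gives roughly 12.4x for 4K reads, 2.8x for 4K writes, 11.9x for 128K reads and 1.0x (no change) for 128K writes.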

FAQ
===

1. What are the use cases for LOCALIO?

   a. Workloads where the NFS client and server are on the same host
      realize improved IO performance. In particular, it is common when
      running containerised workloads for jobs to find themselves
      running on the same host as the knfsd server being used for
      storage.

2. What are the requirements for LOCALIO?

   a. Bypass use of the network RPC protocol as much as possible. This
      includes bypassing XDR and RPC for open, read, write and commit
      operations.
   b. Allow client and server to autonomously discover if they are
      running local to each other without making any assumptions about
      the local network topology.
   c. Support the use of containers by being compatible with relevant
      namespaces (e.g. network, user, mount).
   d. Support all versions of NFS. NFSv3 is of particular importance
      because it has wide enterprise usage and pNFS flexfiles makes use
      of it for the data path.

3. Why doesn’t LOCALIO just compare IP addresses or hostnames when
   deciding if the NFS client and server are co-located on the same
   host?

   Since one of the main use cases is containerised workloads, we cannot
   assume that IP addresses will be shared between the client and
   server. This sets up a requirement for a handshake protocol that
   needs to go over the same connection as the NFS traffic in order to
   identify that the client and the server really are running on the
   same host. The handshake uses a secret that is sent over the wire
   and that, if the two are truly co-located, both parties can verify
   by comparing it with a value stored in shared kernel memory.

4. Does LOCALIO improve pNFS flexfiles?

   Yes, LOCALIO complements pNFS flexfiles by allowing it to take
   advantage of NFS client and server locality. Policy that initiates
   client IO as close as possible to the server where the data is
   stored naturally benefits from the data path optimization LOCALIO
   provides.

5. Why not develop a new pNFS layout to enable LOCALIO?

   A new pNFS layout could be developed, but doing so would put the
   onus on the server to somehow discover that the client is co-located
   when deciding to hand out the layout.
   There is value in a simpler approach (as provided by LOCALIO) that
   allows the NFS client to negotiate and leverage locality without
   requiring more elaborate modeling and discovery of such locality in a
   more centralized manner.

6. Why is having the client perform a server-side file OPEN, without
   using RPC, beneficial?  Is the benefit pNFS-specific?

   Avoiding the use of XDR and RPC for file opens is beneficial to
   performance regardless of whether pNFS is used. Especially when
   dealing with small files, it's best to avoid going over the wire
   whenever possible; otherwise it could reduce or even negate the
   benefits of avoiding the wire for doing the small file I/O itself.
   Given LOCALIO's requirements, the current approach of having the
   client perform a server-side file open, without using RPC, is ideal.
   If requirements change in the future, we can adapt accordingly.

7. Why is LOCALIO only supported with UNIX Authentication (AUTH_UNIX)?

   Strong authentication is usually tied to the connection itself. It
   works by establishing a context that is cached by the server, and
   that acts as the key for discovering the authorisation token, which
   can then be passed to rpc.mountd to complete the authentication
   process. On the other hand, in the case of AUTH_UNIX, the credential
   that was passed over the wire is used directly as the key in the
   upcall to rpc.mountd. This simplifies the authentication process, and
   so makes AUTH_UNIX easier to support.

8. How do export options that translate RPC user IDs behave for LOCALIO
   operations (e.g. root_squash, all_squash)?

   Export options that translate user IDs are managed by nfsd_setuser(),
   which is called by nfsd_setuser_and_check_port(), which is called by
   __fh_verify(). So they are handled exactly the same way for LOCALIO
   as they are for non-LOCALIO.

9. How does LOCALIO make certain that object lifetimes are managed
   properly given NFSD and NFS operate in different contexts?

   See the detailed "NFS Client and Server Interlock" section below.
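
To make FAQ 8 more concrete, the following is a simplified, hypothetical model of what the root_squash and all_squash export options do to a caller's IDs; it is not nfsd_setuser() itself. The struct and function names are invented, supplementary groups are ignored, and the anonymous IDs are assumed here to be 65534 (a common "nobody" value). Because LOCALIO opens go through the same __fh_verify() path, the same mapping would apply whether the request arrived via RPC or via LOCALIO.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for an export's squash options. */
struct export_opts {
    bool root_squash;
    bool all_squash;
    uid_t anon_uid;   /* assumed 65534 ("nobody") in the usage below */
    gid_t anon_gid;
};

struct cred_ids {
    uid_t uid;
    gid_t gid;
};

/*
 * Hypothetical model of the ID translation: all_squash maps every
 * caller to the anonymous IDs; root_squash maps only uid/gid 0.
 */
static struct cred_ids squash_ids(const struct export_opts *exp,
                                  struct cred_ids in)
{
    struct cred_ids out = in;

    if (exp->all_squash) {
        out.uid = exp->anon_uid;
        out.gid = exp->anon_gid;
    } else if (exp->root_squash) {
        if (in.uid == 0)
            out.uid = exp->anon_uid;
        if (in.gid == 0)
            out.gid = exp->anon_gid;
    }
    return out;
}
```

With root_squash, a root caller (0, 0) becomes (65534, 65534) while (1000, 1000) is left alone; with all_squash, every caller becomes (65534, 65534).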

RPC
===

The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common. This protocol isn't part of an IETF
standard, nor does it need to be, considering it is a Linux-to-Linux
auxiliary RPC protocol that amounts to an implementation detail.

The UUID_IS_LOCAL method encodes the client-generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes). The fixed-size opaque encode and decode
XDR methods are used instead of the less efficient variable-sized
methods.
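
The size difference can be illustrated with a small userspace sketch of the two XDR encodings defined in RFC 4506. The helper names are hypothetical and are not the kernel's SUNRPC XDR API. A fixed-length opaque is just the bytes padded to a multiple of 4, so a 16-byte UUID costs exactly 16 bytes on the wire; a variable-length opaque adds a 4-byte big-endian length word (20 bytes in total) that the decoder must also read and validate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/* XDR pads every item to a multiple of 4 bytes (RFC 4506). */
static size_t xdr_pad(size_t len)
{
    return (4 - (len & 3)) & 3;
}

/* Fixed-length opaque: the bytes plus zero padding, no length word. */
static size_t xdr_encode_opaque_fixed(uint8_t *buf, const void *data,
                                      size_t len)
{
    size_t pad = xdr_pad(len);

    memcpy(buf, data, len);
    memset(buf + len, 0, pad);
    return len + pad;
}

/* Variable-length opaque: a 4-byte big-endian length, then the bytes. */
static size_t xdr_encode_opaque_var(uint8_t *buf, const void *data,
                                    size_t len)
{
    uint32_t n = (uint32_t)len;

    buf[0] = (uint8_t)(n >> 24);
    buf[1] = (uint8_t)(n >> 16);
    buf[2] = (uint8_t)(n >> 8);
    buf[3] = (uint8_t)n;
    return 4 + xdr_encode_opaque_fixed(buf + 4, data, len);
}
```

Because UUID_SIZE is already a multiple of 4, the fixed-size form needs no padding at all, and the decoder can copy exactly 16 bytes without a length check.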

The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ )::

  Linux Kernel Organization       400122  nfslocalio

The LOCALIO protocol spec in rpcgen syntax is::

  /* raw RFC 9562 UUID */
  #define UUID_SIZE 16
  typedef opaque uuid_t[UUID_SIZE];

  program NFS_LOCALIO_PROGRAM {
      version LOCALIO_V1 {
          void
              NULL(void) = 0;

          void
              UUID_IS_LOCAL(uuid_t) = 1;
      } = 1;
  } = 400122;

LOCALIO uses the same transport connection as NFS traffic. As such,
LOCALIO is not registered with rpcbind.

NFS Common and Client/Server Handshake
======================================

fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client
to generate a nonce (single-use UUID) and associated short-lived
nfs_uuid_t struct, and to register it with nfs_common for subsequent
lookup and verification by the NFS server. If matched, the NFS server
populates members in the nfs_uuid_t struct. The NFS client then uses
nfs_common to transfer the nfs_uuid_t from nfs_common's uuids_list (its
nfs_uuids) to the nn->nfsd_serv clients_list. See:
fs/nfs/localio.c:nfs_local_probe()
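
The essence of the handshake can be modeled in a few lines of userspace C. This is a hypothetical, single-threaded sketch, not the nfs_common code: a fixed-size array stands in for nfs_common's uuids_list, a byte comparison stands in for the server's lookup, nonce generation is left to the caller, and the transfer to the nn->nfsd_serv clients_list is reduced to simply retiring the nonce. No locking, RCU or namespace handling is modeled. What it shows is that only a server sharing the client's kernel memory can find the nonce the client registered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_SIZE 16
#define MAX_PENDING 8

/* Hypothetical stand-in for an nfs_uuid_t awaiting verification. */
struct model_uuid {
    unsigned char uuid[UUID_SIZE];
    bool in_use;
    bool is_local;   /* set by the "server" on a match */
};

/* Stand-in for nfs_common's uuids_list: memory both sides can see. */
static struct model_uuid pending[MAX_PENDING];

/* Client side: register a freshly generated nonce before the RPC. */
static struct model_uuid *model_register(const unsigned char *nonce)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            memcpy(pending[i].uuid, nonce, UUID_SIZE);
            pending[i].in_use = true;
            pending[i].is_local = false;
            return &pending[i];
        }
    }
    return NULL;
}

/* Server side: handle UUID_IS_LOCAL with the UUID decoded off the wire. */
static bool model_uuid_is_local(const unsigned char *wire_uuid)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use &&
            memcmp(pending[i].uuid, wire_uuid, UUID_SIZE) == 0) {
            pending[i].is_local = true;
            return true;
        }
    }
    return false;
}

/* Client side: the nonce is single-use, so retire it after the probe. */
static bool model_finish(struct model_uuid *u)
{
    bool local = u->is_local;

    u->in_use = false;
    return local;
}
```

A remote server never sees the client's pending array, so any UUID it is asked about simply fails to match; a reused nonce also fails once it has been retired.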

nfs_common's nfs_uuids list is the basis for LOCALIO enablement; as such,
it has members that point to nfsd memory for direct use by the client
(e.g. 'net' is the server's network namespace, through which the client
can access nn->nfsd_serv with proper RCU read access). It is this client
and server synchronization that enables advanced usage and lifetime of
objects to span from the host kernel's nfsd to per-container knfsd
instances that are connected to NFS clients running on the same local
host.

NFS Client and Server Interlock
===============================

LOCALIO provides the nfs_uuid_t object and associated interfaces to
allow proper network namespace (net-ns) and NFSD object refcounting:

    We don't want to keep a long-term counted reference on each NFSD's
    net-ns in the client because that prevents a server container from
    completely shutting down.

    So we avoid taking a reference at all and rely on the per-cpu
    reference to the server (detailed below) being sufficient to keep
    the net-ns active. This involves allowing the NFSD's net-ns exit
    code to iterate all active clients and clear their ->net pointers
    (which are needed to find the per-cpu-refcount for the nfsd_serv).

    Details:

     - Embed nfs_uuid_t in nfs_client. nfs_uuid_t provides a list_head
       that can be used to find the client. It does add the 16-byte
       uuid_t to nfs_client, so it is bigger than needed (given that
       uuid_t is only used during the initial NFS client and server
       LOCALIO handshake to determine if they are local to each other).
       If that is really a problem we can find a fix.

     - When the nfs server confirms that the uuid_t is local, it moves
       the nfs_uuid_t onto a per-net-ns list in NFSD's nfsd_net.

     - When each server's net-ns is shutting down, all these nfs_uuid_t
       have their ->net cleared in a "pre_exit" handler. There is a
       synchronize_rcu() call between pre_exit() handlers and exit()
       handlers so any caller that sees nfs_uuid_t ->net as not NULL can
       safely manage the per-cpu-refcount for nfsd_serv.

     - The client's nfs_uuid_t is passed to nfsd_open_local_fh() so it
       can safely dereference ->net in a private rcu_read_lock() section
       to allow safe access to the associated nfsd_net and nfsd_serv.
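
The ->net clearing described in the details above can be sketched as follows. This hypothetical model uses an ordinary singly linked list and plain pointers; the real code relies on RCU (rcu_read_lock() readers plus an RCU grace period between pre_exit() and exit()), which this single-threaded sketch does not attempt to reproduce. It only shows the ownership rule: the server's namespace keeps a list of the client records that point at it and clears those pointers before it goes away, so clients never need a counted reference on the namespace.

```c
#include <stddef.h>

struct model_net;

/* Hypothetical stand-in for the nfs_uuid_t embedded in nfs_client. */
struct model_client_uuid {
    struct model_net *net;               /* server net-ns, or NULL */
    struct model_client_uuid *next;      /* link on the server's list */
};

/* Hypothetical stand-in for a server net-ns and its nfsd_net list. */
struct model_net {
    struct model_client_uuid *clients;
};

/* Server confirmed the UUID is local: link the client onto its list. */
static void model_mark_local(struct model_net *net,
                             struct model_client_uuid *u)
{
    u->net = net;
    u->next = net->clients;
    net->clients = u;
}

/*
 * "pre_exit": clear every client's ->net. In the real code an RCU
 * grace period follows before the exit() handlers free anything.
 */
static void model_net_pre_exit(struct model_net *net)
{
    struct model_client_uuid *u = net->clients;

    while (u) {
        struct model_client_uuid *next = u->next;

        u->net = NULL;
        u->next = NULL;
        u = next;
    }
    net->clients = NULL;
}

/* Client: only a non-NULL ->net may be used to reach the server. */
static int model_can_use_server(const struct model_client_uuid *u)
{
    return u->net != NULL;
}
```

After model_net_pre_exit() runs, every client that was linked onto the namespace sees a NULL ->net and must not touch the server's objects.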

So LOCALIO required the introduction and use of NFSD's percpu_ref to
interlock nfsd_destroy_serv() and nfsd_open_local_fh(), to ensure each
nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh(). This
warrants a more detailed explanation:

    nfsd_open_local_fh() uses nfsd_serv_try_get() before opening its
    nfsd_file handle and then the caller (NFS client) must drop the
    reference for the nfsd_file and associated nn->nfsd_serv using
    nfs_file_put_local() once it has completed its IO.

    This interlock relies heavily on nfsd_open_local_fh() being
    afforded the ability to safely deal with the possibility that the
    NFSD's net-ns (and nfsd_net by association) may have been destroyed
    by nfsd_destroy_serv() via nfsd_shutdown_net() -- which is only
    possible given the nfs_uuid_t ->net pointer management detailed
    above.
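
The reference discipline can be illustrated with a simplified model. It is a hypothetical sketch, not NFSD's percpu_ref: a single C11 atomic counter with a "dead" bit stands in for the per-cpu counter, and the model_* functions only mimic the roles of nfsd_serv_try_get(), nfs_file_put_local() and the shutdown path named above. Like percpu_ref_tryget_live(), the try-get fails once shutdown has started, so an open that races with shutdown either obtains a reference that keeps the server alive until its matching put, or fails cleanly.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_DEAD 0x80000000u

/*
 * Hypothetical model of a server reference: the low bits count
 * holders, the top bit marks that shutdown has started.
 */
struct model_serv_ref {
    atomic_uint state;
};

static void model_ref_init(struct model_serv_ref *r)
{
    atomic_init(&r->state, 1);          /* the server's own reference */
}

/* "Live" try-get: fails once shutdown has begun. */
static bool model_serv_try_get(struct model_serv_ref *r)
{
    unsigned int old = atomic_load(&r->state);

    do {
        if (old & MODEL_DEAD)
            return false;
    } while (!atomic_compare_exchange_weak(&r->state, &old, old + 1));
    return true;
}

/* Drop one reference; returns true when the last one is gone. */
static bool model_serv_put(struct model_serv_ref *r)
{
    unsigned int old = atomic_fetch_sub(&r->state, 1);

    return (old & ~MODEL_DEAD) == 1;
}

/* Shutdown: refuse new users, then drop the server's own reference. */
static bool model_serv_kill(struct model_serv_ref *r)
{
    atomic_fetch_or(&r->state, MODEL_DEAD);
    return model_serv_put(r);
}
```

A typical sequence is model_ref_init(), a successful model_serv_try_get() by the client, model_serv_kill() at shutdown (which returns false because the client still holds a reference), a failing try-get for any later open, and finally the client's model_serv_put(), which reports that the last reference is gone and the server may be torn down.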

All told, this elaborate interlock of the NFS client and server has been
verified to fix an easy-to-hit crash that would occur if an NFSD
instance running in a container, with a LOCALIO client mounted, is
shut down. Upon restart of the container and associated NFSD, the client
would go on to crash due to a NULL pointer dereference caused by the
LOCALIO client attempting to call nfsd_open_local_fh(), using
nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.

NFS Client issues IO instead of Server
======================================

Because LOCALIO is focused on protocol bypass to achieve improved IO
performance, alternatives to the traditional NFS wire protocol (SUNRPC
with XDR) must be provided to access the backing filesystem.

See fs/nfs/localio.c:nfs_local_open_fh() and
fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
focused use of select nfs server objects to allow a client local to a
server to open a file pointer without needing to go over the network.

The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
both the associated nfsd network namespace and nn->nfsd_serv in terms of
RCU. If nfsd_open_local_fh() finds that the client no longer sees valid
nfsd objects (be it struct net or nn->nfsd_serv), it returns -ENXIO
to nfs_local_open_fh() and the client will try to reestablish the
LOCALIO resources needed by calling nfs_local_probe() again. This
recovery is needed if/when an nfsd instance running in a container were
to reboot while a LOCALIO client is connected to it.
301                                                   301 
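The open and recovery flow described above can be modelled in a few
lines of userspace C. This is a sketch under stated assumptions, not the
kernel code: struct local_server, struct local_client and the three
helpers are hypothetical stand-ins for the nfsd struct net and
nn->nfsd_serv, nfsd_open_local_fh(), nfs_local_probe() and
nfs_local_open_fh(). It also assumes that the I/O which hit -ENXIO is
sent over the regular NFS wire protocol, while later I/O uses the
re-probed LOCALIO state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the server-side objects the client relies on. */
struct local_server {
	bool net_valid;  /* stands in for the nfsd struct net */
	bool serv_valid; /* stands in for nn->nfsd_serv */
};

/* Hypothetical client-side LOCALIO state. */
struct local_client {
	struct local_server *server; /* NULL if LOCALIO is not established */
	int probes;                  /* how many times the probe has run */
};

/* Models nfsd_open_local_fh(): -ENXIO if the server objects are gone. */
static int server_open_local_fh(const struct local_server *srv)
{
	if (srv == NULL || !srv->net_valid || !srv->serv_valid)
		return -ENXIO;
	return 0;
}

/* Models nfs_local_probe(): re-establish LOCALIO with the live server. */
static void client_probe(struct local_client *clp, struct local_server *live)
{
	clp->probes++;
	clp->server = live;
}

/*
 * Models nfs_local_open_fh(): on -ENXIO, re-probe so that later I/O can
 * use LOCALIO again, and return the error so this I/O is assumed to go
 * over the wire instead.
 */
static int client_open_local_fh(struct local_client *clp,
				struct local_server *live)
{
	int err;

	assert(clp != NULL);
	err = server_open_local_fh(clp->server);
	if (err == -ENXIO)
		client_probe(clp, live);
	return err;
}
```

For example, clearing serv_valid on the old server (as when the
container's nfsd restarts) makes the next client_open_local_fh() return
-ENXIO and re-probe, after which opens against the new server succeed
again.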
Once the client has an open nfsd_file pointer, it will issue reads,
writes and commits directly to the underlying local filesystem (work
normally done by the NFS server). As such, for these operations, the NFS
client is issuing IO to the underlying local filesystem that it is
sharing with the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and
fs/nfs/localio.c:nfs_local_commit().

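As a loose userspace analogy (the kernel paths operate on an nfsd_file,
not a file descriptor), the helper below performs a write, a commit and
a read directly against a local file with pwrite(), fsync() and
pread(). This is the kind of direct access LOCALIO substitutes for
WRITE, COMMIT and READ sent over the wire. The helper name
local_write_read_commit() and the /tmp scratch file are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Userspace analogy only: write len bytes, flush them to stable storage,
 * then read them back into 'out'. Returns 0 on success, -1 on failure.
 */
static int local_write_read_commit(const char *data, char *out, size_t len)
{
	char path[] = "/tmp/localio-demoXXXXXX"; /* hypothetical scratch file */
	int ret = -1;
	int fd;

	assert(data != NULL && out != NULL);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path); /* the open fd keeps the file usable until close() */
	memset(out, 0, len);

	/* "WRITE": store len bytes at offset 0 */
	if (pwrite(fd, data, len, 0) != (ssize_t)len)
		goto out;
	/* "COMMIT": flush the written data to stable storage */
	if (fsync(fd) != 0)
		goto out;
	/* "READ": read the same bytes back from offset 0 */
	if (pread(fd, out, len, 0) != (ssize_t)len)
		goto out;
	ret = 0;
out:
	close(fd);
	return ret;
}
```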
Security
========

LOCALIO is only supported when UNIX-style authentication (AUTH_UNIX, aka
AUTH_SYS) is used.

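The restriction amounts to a check on the RPC authentication flavor in
use. In the sketch below, the flavor numbers are those assigned by RFC
5531 (AUTH_SYS, also called AUTH_UNIX, is flavor 1), while the enum
constant names and localio_auth_ok() are hypothetical and do not
correspond to a specific kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* RPC authentication flavor numbers as assigned in RFC 5531. */
enum rpc_auth_flavor {
	AUTH_NONE_FLAVOR  = 0,
	AUTH_SYS_FLAVOR   = 1, /* aka AUTH_UNIX */
	AUTH_SHORT_FLAVOR = 2,
	AUTH_DH_FLAVOR    = 3,
	RPCSEC_GSS_FLAVOR = 6,
};

/* Hypothetical gate: only AUTH_SYS mounts may take the LOCALIO path. */
static bool localio_auth_ok(enum rpc_auth_flavor flavor)
{
	return flavor == AUTH_SYS_FLAVOR;
}
```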
Care is taken to ensure the same NFS security mechanisms are used
(authentication, etc.) regardless of whether LOCALIO or regular NFS
access is used. The auth_domain established as part of the traditional
NFS client access to the NFS server is also used for LOCALIO.

Relative to containers, LOCALIO gives the client access to the network
namespace the server has. This is required to allow the client to access
the server's per-namespace nfsd_net struct. With traditional NFS, the
client is afforded this same level of access (albeit in terms of the NFS
protocol via SUNRPC). No other namespaces (user, mount, etc.) have been
altered or purposely extended from the server to the client.

Testing
=======

The LOCALIO auxiliary protocol and associated NFS LOCALIO read, write
and commit access have proven stable against various test scenarios:

- Client and server both on the same host.

- All permutations of client and server support enablement for both
  local and remote client and server.

- Testing against NFS storage products that don't support the LOCALIO
  protocol was also performed.

- Client on host, server within a container (for both v3 and v4.2).
  The container testing was in terms of podman-managed containers and
  includes a successful container stop/restart scenario.

- Formalizing these test scenarios in terms of existing test
  infrastructure is ongoing. Initial regular coverage is provided in
  terms of ktest running xfstests against a LOCALIO-enabled NFS loopback
  mount configuration, and includes lockdep and KASAN coverage, see:
  https://evilpiepirate.org/~testdashboard/ci?user=snitzer&branch=snitm-nfs-next
  https://github.com/koverstreet/ktest

- Various kdevops testing (in terms of "Chuck's BuildBot") has been
  performed to regularly verify the LOCALIO changes haven't caused any
  regressions to non-LOCALIO NFS use cases.

- All of Hammerspace's various sanity tests pass with LOCALIO enabled
  (this includes numerous pNFS and flexfiles tests).