.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Per-inode context.
   - Inode context helper functions.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations.  For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server.  The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Per-Inode Context
=================

The network filesystem helper library needs a place to store a bit of state for
its use on each netfs inode it is helping to manage.  To this end, a context
structure is defined::

        struct netfs_inode {
                struct inode inode;
                const struct netfs_request_ops *ops;
                struct fscache_cookie *cache;
        };

A network filesystem that wants to use netfs lib must place one of these in its
inode wrapper struct instead of the VFS ``struct inode``.  This can be done in
a way similar to the following::

        struct my_inode {
                struct netfs_inode netfs; /* Netfslib context and vfs inode */
                ...
        };

This allows netfslib to find its state by using ``container_of()`` from the
inode pointer, thereby allowing the netfslib helper functions to be pointed to
directly by the VFS/VM operation tables.
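
To illustrate why the context is embedded rather than pointed to, here is a
minimal user-space sketch.  It assumes heavily trimmed stand-ins for
``struct inode`` and ``struct netfs_inode`` and a local definition of
``container_of()``; ``my_inode``, ``MY_I()``, ``to_netfs_inode()`` and the
``i_ino``/``my_state`` fields are hypothetical names for illustration only::

        #include <assert.h>
        #include <stddef.h>

        /* Local stand-in for the kernel's container_of() macro. */
        #define container_of(ptr, type, member) \
                ((type *)((char *)(ptr) - offsetof(type, member)))

        /* Hypothetical, heavily trimmed stand-ins for the kernel types. */
        struct inode {
                unsigned long i_ino;
        };

        struct netfs_inode {
                struct inode inode;     /* Embedded, not pointed to */
                const void *ops;        /* Stands in for the ops table */
        };

        /* A hypothetical filesystem's inode wrapper. */
        struct my_inode {
                struct netfs_inode netfs; /* Netfslib context and vfs inode */
                int my_state;
        };

        /* VFS inode -> netfs context, the step netfslib relies on. */
        static struct netfs_inode *to_netfs_inode(struct inode *inode)
        {
                assert(inode != NULL);
                return container_of(inode, struct netfs_inode, inode);
        }

        /* VFS inode -> the filesystem's own wrapper. */
        static struct my_inode *MY_I(struct inode *inode)
        {
                return container_of(to_netfs_inode(inode), struct my_inode,
                                    netfs);
        }

Both steps are plain pointer arithmetic, so a helper given only the VFS inode
can find the netfs context, and the filesystem can likewise recover its own
wrapper, without any lookup.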

The structure contains the following fields:

 * ``inode``

   The VFS inode structure.

 * ``ops``

   The set of operations provided by the network filesystem to netfslib.

 * ``cache``

   Local caching cookie, or NULL if no caching is enabled.  This field does not
   exist if fscache is disabled.


Inode Context Helper Functions
------------------------------

To help deal with the per-inode context, a number of helper functions are
provided.  Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::

        void netfs_inode_init(struct netfs_inode *ctx,
                              const struct netfs_request_ops *ops);

then a function to cast from the VFS inode structure to the netfs context::

        struct netfs_inode *netfs_inode(struct inode *inode);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

        struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx);
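
A user-space mock of how these three helpers might fit together is sketched
below.  It follows the prototypes above but is not the kernel's
implementation: it assumes trimmed-down structures, a local
``container_of()``, and a ``CONFIG_FSCACHE`` macro that removes the ``cache``
field when fscache is disabled, as described above.  ``my_netfs_ops`` is a
hypothetical operations table::

        #include <assert.h>
        #include <stddef.h>

        #define container_of(ptr, type, member) \
                ((type *)((char *)(ptr) - offsetof(type, member)))

        /* Hypothetical trimmed-down stand-ins for the kernel types. */
        struct inode { unsigned long i_ino; };
        struct fscache_cookie { int unused; };
        struct netfs_request_ops { int unused; };

        struct netfs_inode {
                struct inode inode;
                const struct netfs_request_ops *ops;
        #ifdef CONFIG_FSCACHE
                struct fscache_cookie *cache;   /* Absent without fscache */
        #endif
        };

        /* Record the ops table and start with no cache cookie. */
        static void netfs_inode_init(struct netfs_inode *ctx,
                                     const struct netfs_request_ops *ops)
        {
                assert(ctx != NULL && ops != NULL);
                ctx->ops = ops;
        #ifdef CONFIG_FSCACHE
                ctx->cache = NULL;
        #endif
        }

        /* Cast from the embedded VFS inode back to the netfs context. */
        static struct netfs_inode *netfs_inode(struct inode *inode)
        {
                return container_of(inode, struct netfs_inode, inode);
        }

        /* The cache cookie, or NULL if fscache is compiled out. */
        static struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx)
        {
        #ifdef CONFIG_FSCACHE
                return ctx->cache;
        #else
                (void)ctx;
                return NULL;
        #endif
        }

        /* Hypothetical (empty) operations table of a network filesystem. */
        static const struct netfs_request_ops my_netfs_ops;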


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->read_folio(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of the parts of the buffer that have no data on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations.  This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

        void netfs_readahead(struct readahead_control *ractl);
        int netfs_read_folio(struct file *file,
                             struct folio *folio);
        int netfs_write_begin(struct netfs_inode *ctx,
                              struct file *file,
                              struct address_space *mapping,
                              loff_t pos,
                              unsigned int len,
                              struct folio **_folio,
                              void **_fsdata);

Each corresponds to a VM address space operation.  These operations use the
state in the per-inode context.

For ->readahead() and ->read_folio(), the network filesystem can just point
directly at the corresponding read helper, whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs, the ->free_request() op will be called to clean up the
allocated netfs_io_request struct.  If some parts of the request are in
progress when an error occurs, the request will get partially completed if
sufficient data is read.

Additionally, there is::

        void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
                                     ssize_t transferred_or_error,
                                     bool was_async);

which should be called to complete a read subrequest.  This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (i.e. whether the follow-on processing can be
done in the current context, given this may involve sleeping).
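
The transferred-or-error convention can be illustrated with a small user-space
harness.  It is a sketch, not the kernel function: ``struct mock_subreq``,
``mock_subreq_terminated()`` and ``simulate_server_read()`` are hypothetical
names, the ``error`` and ``was_async`` fields exist only so that the outcome
can be inspected, and the real netfs_subreq_terminated() additionally drives
the follow-on processing that ``was_async`` governs::

        #include <assert.h>
        #include <errno.h>
        #include <stdbool.h>
        #include <stddef.h>
        #include <sys/types.h>

        /* Hypothetical trimmed-down subrequest. */
        struct mock_subreq {
                long long start;        /* File position of this slice */
                size_t len;             /* Length of this slice */
                size_t transferred;     /* Progress so far */
                int error;              /* Negative error code, or 0 */
                bool was_async;         /* Recorded for inspection only */
        };

        /* Account for bytes transferred, or record a negative error. */
        static void mock_subreq_terminated(struct mock_subreq *subreq,
                                           ssize_t transferred_or_error,
                                           bool was_async)
        {
                subreq->was_async = was_async;
                if (transferred_or_error < 0) {
                        subreq->error = (int)transferred_or_error;
                        return;
                }
                subreq->transferred += (size_t)transferred_or_error;
                assert(subreq->transferred <= subreq->len);
        }

        /*
         * Pretend the server holds server_size bytes and serve whatever
         * exists of [start + transferred, start + len).  Completion happens
         * synchronously in the caller's context, so was_async is false.
         */
        static void simulate_server_read(struct mock_subreq *subreq,
                                         long long server_size, bool fail)
        {
                long long pos = subreq->start + (long long)subreq->transferred;
                size_t want = subreq->len - subreq->transferred;
                size_t got = 0;

                if (fail) {
                        mock_subreq_terminated(subreq, -EIO, false);
                        return;
                }
                if (pos < server_size) {
                        got = (size_t)(server_size - pos);
                        if (got > want)
                                got = want;
                }
                mock_subreq_terminated(subreq, (ssize_t)got, false);
        }

A short read leaves ->transferred below ->len; the helpers then decide whether
to reissue the remainder, as described under "Read Helper Procedure".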


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read.  The first is a structure that manages a read request as a whole::

        struct netfs_io_request {
                struct inode            *inode;
                struct address_space    *mapping;
                struct netfs_cache_resources cache_resources;
                void                    *netfs_priv;
                loff_t                  start;
                size_t                  len;
                loff_t                  i_size;
                const struct netfs_request_ops *netfs_ops;
                unsigned int            debug_id;
                ...
        };

The above fields are the ones the netfs can use.  They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from.  The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data.  The value for this can be passed in
   to the helper functions or set during the request.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length.  These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table.  The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

        struct netfs_io_subrequest {
                struct netfs_io_request *rreq;
                loff_t                  start;
                size_t                  len;
                size_t                  transferred;
                unsigned long           flags;
                unsigned short          debug_index;
                ...
        };

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another.  The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far for this slice.  The network
   filesystem or cache should start the operation this far into the slice.  If
   a short read occurs, the helpers will call again, having updated this to
   reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read.  There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (i.e. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.
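
How ``transferred`` and ``NETFS_SREQ_CLEAR_TAIL`` feed into the retry rules
listed under "Read Helper Procedure" can be sketched as a classification of a
terminated slice.  This is a simplified user-space model, not netfslib's code:
the ``mock_*`` names, the made-up flag value and the ``prev_transferred``
field used to detect progress are all assumptions, and the cache's
hole-skipping is ignored::

        #include <assert.h>
        #include <stddef.h>

        /* Made-up flag value for this sketch only. */
        #define MOCK_SREQ_CLEAR_TAIL    0x1UL

        enum mock_source { MOCK_FROM_CACHE, MOCK_FROM_SERVER };

        enum mock_action {
                MOCK_DONE,              /* Slice fully read */
                MOCK_CLEAR_TAIL,        /* Clear [transferred, len), finish */
                MOCK_REISSUE,           /* Same source again from transferred */
                MOCK_TRY_SERVER,        /* Cache failed: read from server */
                MOCK_FAIL,              /* Give up */
        };

        struct mock_subrequest {
                size_t len;
                size_t transferred;
                size_t prev_transferred; /* transferred before last attempt */
                unsigned long flags;
                enum mock_source source;
                int error;              /* Negative error code, or 0 */
        };

        /* Simplified version of the assessment rules in the procedure. */
        static enum mock_action
        assess_subrequest(const struct mock_subrequest *s)
        {
                assert(s->transferred <= s->len);
                if (s->error < 0)
                        return s->source == MOCK_FROM_CACHE ?
                                MOCK_TRY_SERVER : MOCK_FAIL;
                if (s->transferred == s->len)
                        return MOCK_DONE;
                if (s->flags & MOCK_SREQ_CLEAR_TAIL)
                        return MOCK_CLEAR_TAIL;
                /* Short read: reissue only if the last attempt made progress. */
                if (s->transferred > s->prev_transferred)
                        return MOCK_REISSUE;
                return MOCK_FAIL;
        }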


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

        struct netfs_request_ops {
                void (*init_request)(struct netfs_io_request *rreq, struct file *file);
                void (*free_request)(struct netfs_io_request *rreq);
                void (*expand_readahead)(struct netfs_io_request *rreq);
                bool (*clamp_length)(struct netfs_io_subrequest *subreq);
                void (*issue_read)(struct netfs_io_subrequest *subreq);
                bool (*is_still_valid)(struct netfs_io_request *rreq);
                int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
                                         struct folio **foliop, void **_fsdata);
                void (*done)(struct netfs_io_request *rreq);
        };

The operations are as follows:

 * ``init_request()``

   [Optional] This is called to initialise the request structure.  It is given
   the file for reference.

 * ``free_request()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up any state it has attached there.

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request.  The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made.  If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure.  Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest.  The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true if the subrequest may proceed and false if it
   should be abandoned.

 * ``issue_read()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred.  The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned.  It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid.  It should return true if it is still valid and false
   if not.  If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It may unlock and discard the folio it was given and set the caller's folio
   pointer to NULL.  It should return 0 if everything is now fine (``*foliop``
   left set) or if the op should be retried (``*foliop`` cleared), and any
   other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).
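
The two sizing hooks can be sketched in user space as below.  The sketch
assumes a hypothetical server with a 256KiB block size (``MY_GRANULE``) and a
64KiB maximum read size (``MY_RSIZE``); the ``mock_*`` structures are trimmed
stand-ins carrying only ``start`` and ``len``, and the ``my_*`` functions are
made-up names.  Rounding the request outwards only ever grows it, so ->len
increases by at least as much as ->start is reduced::

        #include <assert.h>
        #include <stdbool.h>
        #include <stddef.h>

        #define MY_GRANULE      (256 * 1024)    /* Hypothetical block size */
        #define MY_RSIZE        (64 * 1024)     /* Hypothetical max read size */

        /* Trimmed stand-ins carrying only ->start and ->len. */
        struct mock_request {
                long long start;
                size_t len;
        };

        struct mock_subrequest {
                long long start;
                size_t len;
        };

        /*
         * expand_readahead()-style hook: round the request out to MY_GRANULE
         * in both directions.  It never shrinks the request.  Assumes that
         * start is non-negative.
         */
        static void my_expand_readahead(struct mock_request *rreq)
        {
                long long old_start = rreq->start;
                long long old_end = rreq->start + (long long)rreq->len;
                long long new_start = old_start - old_start % MY_GRANULE;
                long long new_end = (old_end + MY_GRANULE - 1) / MY_GRANULE *
                                    MY_GRANULE;

                rreq->start = new_start;
                rreq->len = (size_t)(new_end - new_start);
                assert(rreq->start <= old_start);
                assert(rreq->start + (long long)rreq->len >= old_end);
        }

        /*
         * clamp_length()-style hook: limit each subrequest to MY_RSIZE.
         * Returning false would abandon the subrequest; this sketch never
         * needs to.
         */
        static bool my_clamp_length(struct mock_subrequest *subreq)
        {
                if (subreq->len > MY_RSIZE)
                        subreq->len = MY_RSIZE;
                return true;
        }

For example, a 10KiB readahead at offset 300KiB is rounded out to the single
256KiB granule that starts at offset 256KiB.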



Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request.  This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source.  This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server, or just
     clear the slice, as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.
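
The slicing loop can be sketched in user space for the case where no local
cache is present, so that the helpers generate maximal slices and the network
filesystem clamps each one to its rsize.  ``struct slice`` and
``slice_request()`` are hypothetical, and the sketch only records the slices,
whereas the helpers create and issue a subrequest for each::

        #include <assert.h>
        #include <stddef.h>

        /* Hypothetical record of one slice of the request. */
        struct slice {
                long long start;
                size_t len;
        };

        /*
         * Cut [start, start + len) into slices with no local cache present:
         * each slice is maximal, clamped to rsize, and begins where the
         * previous one ended.  Returns the slice count, or -1 if more than
         * max slices would be needed.
         */
        static int slice_request(long long start, size_t len, size_t rsize,
                                 struct slice *out, int max)
        {
                size_t done = 0;
                int n = 0;

                assert(rsize > 0);
                while (done < len) {
                        size_t this_len = len - done;   /* Maximal slice */

                        if (this_len > rsize)           /* Clamp to rsize */
                                this_len = rsize;
                        if (n == max)
                                return -1;
                        out[n].start = start + (long long)done;
                        out[n].len = this_len;
                        n++;
                        done += this_len;       /* Next slice starts here */
                }
                return n;
        }

For example, a 150000-byte read starting at offset 4096 with a 64KiB rsize
yields three slices, the last of which is 18928 bytes long.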


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

To begin a cache operation on an fscache object, the following function is
called::

        int fscache_begin_read_operation(struct netfs_io_request *rreq,
                                         struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.  This
fills in the cache resources mentioned below.

The netfs_io_request object contains a place for the cache to hang its
state::

        struct netfs_cache_resources {
                const struct netfs_cache_ops    *ops;
                void                            *cache_priv;
                void                            *cache_priv2;
        };

This contains an operations table pointer and two private pointers.  The
operation table looks like the following::

        struct netfs_cache_ops {
                void (*end_operation)(struct netfs_cache_resources *cres);

                void (*expand_readahead)(struct netfs_cache_resources *cres,
                                         loff_t *_start, size_t *_len, loff_t i_size);

                enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
                                                     loff_t i_size);

                int (*read)(struct netfs_cache_resources *cres,
                            loff_t start_pos,
                            struct iov_iter *iter,
                            bool seek_data,
                            netfs_io_terminated_t term_func,
                            void *term_func_priv);

                int (*prepare_write)(struct netfs_cache_resources *cres,
                                     loff_t *_start, size_t *_len, loff_t i_size,
                                     bool no_space_allocated_yet);

                int (*write)(struct netfs_cache_resources *cres,
                             loff_t start_pos,
                             struct iov_iter *iter,
                             netfs_io_terminated_t term_func,
                             void *term_func_priv);

                int (*query_occupancy)(struct netfs_cache_resources *cres,
                                       loff_t start, size_t len, size_t granularity,
                                       loff_t *_data_start, size_t *_data_len);
        };

With a termination handler function pointer::

        typedef void (*netfs_io_terminated_t)(void *priv,
                                              ssize_t transferred_or_error,
                                              bool was_async);
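
To show how the termination handler decouples completion from the return
value, here is a user-space sketch.  ``struct my_completion``,
``my_term_func()`` and ``mock_cache_read()`` are hypothetical; the mock reads
from a plain memory buffer instead of an ``iov_iter`` and always completes
synchronously::

        #include <assert.h>
        #include <errno.h>
        #include <stdbool.h>
        #include <stddef.h>
        #include <string.h>
        #include <sys/types.h>

        typedef void (*netfs_io_terminated_t)(void *priv,
                                              ssize_t transferred_or_error,
                                              bool was_async);

        /* Hypothetical completion record passed as term_func_priv. */
        struct my_completion {
                ssize_t result;
                bool was_async;
                bool done;
        };

        static void my_term_func(void *priv, ssize_t transferred_or_error,
                                 bool was_async)
        {
                struct my_completion *c = priv;

                c->result = transferred_or_error;
                c->was_async = was_async;
                c->done = true;
        }

        /*
         * Hypothetical cache read from an in-memory backing buffer.  The
         * outcome is reported through term_func rather than the return value,
         * with was_async = false as it completes in the caller's context.
         */
        static int mock_cache_read(const char *backing, size_t backing_size,
                                   long long start_pos, char *buf, size_t len,
                                   netfs_io_terminated_t term_func,
                                   void *term_func_priv)
        {
                size_t n;

                assert(term_func != NULL);
                if (start_pos < 0 || (size_t)start_pos >= backing_size) {
                        term_func(term_func_priv, -ENODATA, false);
                        return 0;
                }
                n = backing_size - (size_t)start_pos;
                if (n > len)
                        n = len;
                memcpy(buf, backing + start_pos, n);
                term_func(term_func_priv, (ssize_t)n, false);
                return 0;
        }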

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction.  This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.
   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache, or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache.  The start file offset is given
   along with an iterator to read to, which gives the length also.  It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place.  This
   involves checking to see whether the cache has sufficient space to honour
   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O.  ``i_size`` holds the size of the object
   and is provided for reference.  ``no_space_allocated_yet`` is set to true if
   the caller is certain that no data has been written to that region, for
   example if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache.  The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache.  The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned.  The function passes back the start and length of the data,
   if any, available within that region.  Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.
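
The following user-space sketch suggests how a cache's ``prepare_read()``
might choose among the source values listed above.  It assumes a hypothetical
cache (``struct mock_cache``) holding at most one cached extent, whose bounds
a real cache might obtain from something like ``query_occupancy()``, and it
ignores granularity.  ``mock_prepare_read()`` and ``struct mock_subrequest``
are made-up names, and the enum is a local copy for this sketch::

        #include <assert.h>
        #include <stdbool.h>
        #include <stddef.h>

        /* Local copy of the source values listed above. */
        enum netfs_io_source {
                NETFS_FILL_WITH_ZEROES,
                NETFS_DOWNLOAD_FROM_SERVER,
                NETFS_READ_FROM_CACHE,
                NETFS_INVALID_READ,
        };

        struct mock_subrequest {
                long long start;
                size_t len;
        };

        /* Hypothetical cache state: one cached extent [data_start, data_end). */
        struct mock_cache {
                long long data_start;
                long long data_end;     /* == data_start if nothing cached */
        };

        /*
         * prepare_read()-style decision.  It only ever shrinks ->len, so the
         * next slice starts where this one stops.
         */
        static enum netfs_io_source
        mock_prepare_read(const struct mock_cache *cache,
                          struct mock_subrequest *subreq, long long i_size)
        {
                long long start, end;
                bool cached;

                assert(cache != NULL && subreq != NULL);
                start = subreq->start;
                end = start + (long long)subreq->len;
                cached = cache->data_end > cache->data_start;

                if (subreq->len == 0)
                        return NETFS_INVALID_READ;
                if (start >= i_size)
                        return NETFS_FILL_WITH_ZEROES;  /* Beyond EOF */

                if (cached && start >= cache->data_start &&
                    start < cache->data_end) {
                        if (end > cache->data_end)
                                subreq->len = (size_t)(cache->data_end - start);
                        return NETFS_READ_FROM_CACHE;
                }
                /* Fetch from the server, stopping where cached data begins. */
                if (cached && cache->data_start > start &&
                    cache->data_start < end)
                        subreq->len = (size_t)(cache->data_start - start);
                return NETFS_DOWNLOAD_FROM_SERVER;
        }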

Note that these methods are passed a pointer to the cache resource structure
rather than the read request structure, as they may also be used in situations
where there is no read request structure, such as writing dirty data to the
cache.


API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
.. kernel-doc:: fs/netfs/io.c