.. SPDX-License-Identifier: GPL-2.0

=========================================
Overview of the Linux Virtual File System
=========================================

Original author: Richard Gooch <rgooch@atnf.csi

- Copyright (C) 1999 Richard Gooch
- Copyright (C) 2005 Pekka Enberg


Introduction
============

The Virtual File System (also known as the Vir
the software layer in the kernel that provides
to userspace programs.  It also provides an ab
kernel which allows different filesystem imple

VFS system calls open(2), stat(2), read(2), wr
are called from a process context.  Filesystem
the document Documentation/filesystems/locking


Directory Entry Cache (dcache)
------------------------------

The VFS implements the open(2), stat(2), chmod
calls.  The pathname argument that is passed t
to search through the directory entry cache (a
cache or dcache).  This provides a very fast l
translate a pathname (filename) into a specifi
in RAM and are never saved to disc: they exist

The dentry cache is meant to be a view into yo
most computers cannot fit all dentries in the
bits of the cache are missing.  In order to re
dentry, the VFS may have to resort to creating
and then loading the inode.  This is done by l
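
As a rough picture of the look-up described above, here is a small userspace model of a dentry cache. It is a sketch under stated assumptions, not kernel code: every ``toy_*`` name and the fake filesystem callback are hypothetical. Entries are hashed by (parent, name); a hit is answered from memory, while a miss falls back to a filesystem-supplied lookup whose answer, including "no such name", is then cached.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a dentry cache; none of these names are kernel APIs. */
#define TOY_HASH_SIZE 64
#define TOY_NAME_MAX  32

struct toy_dentry {
	const struct toy_dentry *parent;  /* NULL for the root */
	char name[TOY_NAME_MAX];
	unsigned long inode_no;           /* 0 = "negative": no such file */
	struct toy_dentry *hash_next;     /* bucket chain */
};

/* Filesystem-supplied slow path: returns an inode number, or 0. */
typedef unsigned long (*toy_fs_lookup_fn)(const struct toy_dentry *parent,
					  const char *name);

static struct toy_dentry *toy_hash[TOY_HASH_SIZE];
static int toy_slow_lookups;          /* how often the cache missed */

static size_t toy_bucket(const struct toy_dentry *parent, const char *name)
{
	uintptr_t h = (uintptr_t)parent;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return (size_t)(h % TOY_HASH_SIZE);
}

struct toy_dentry *toy_d_lookup(const struct toy_dentry *parent,
				const char *name, toy_fs_lookup_fn fs_lookup)
{
	size_t b = toy_bucket(parent, name);
	struct toy_dentry *d;

	assert(strlen(name) < TOY_NAME_MAX);

	/* Fast path: the answer is already in RAM. */
	for (d = toy_hash[b]; d; d = d->hash_next)
		if (d->parent == parent && strcmp(d->name, name) == 0)
			return d;

	/* Slow path: ask the filesystem, then cache the result. */
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->parent = parent;
	strcpy(d->name, name);
	d->inode_no = fs_lookup(parent, name);
	toy_slow_lookups++;
	d->hash_next = toy_hash[b];
	toy_hash[b] = d;
	return d;
}

/* A hypothetical filesystem that knows two names under any directory. */
unsigned long toy_fake_fs_lookup(const struct toy_dentry *parent,
				 const char *name)
{
	(void)parent;
	if (strcmp(name, "etc") == 0)
		return 2;
	if (strcmp(name, "home") == 0)
		return 3;
	return 0;
}
```

Caching the miss as well means a repeated failed look-up also avoids the slow path; the inode-number field stands in for the inode pointer a real dentry carries.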


The Inode Object
----------------

An individual dentry usually has a pointer to
filesystem objects such as regular files, dire
beasts.  They live either on the disc (for blo
in the memory (for pseudo filesystems).  Inode
are copied into the memory when required and c
written back to disc.  A single inode can be p
dentries (hard links, for example, do this).

Looking up an inode requires that the VFS call
the parent directory inode.  This method is in
filesystem implementation that the inode lives
required dentry (and hence the inode), we can
like open(2) the file, or stat(2) it to peek a
stat(2) operation is fairly simple: once the V
peeks at the inode data and passes some of it
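
The switch described here (the VFS asks the parent directory's ``lookup()`` method to find the inode, and ``stat(2)`` then just copies inode fields back) can be modelled in userspace as below. All ``toy_*``/``toyfs_*`` names are hypothetical and the signatures are simplified; the kernel's method works on dentries rather than plain names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>   /* mode_t, S_IFDIR, S_IFREG, S_ISREG */

struct toy_inode;

struct toy_inode_operations {
	/* Installed by the filesystem that owns the directory. */
	struct toy_inode *(*lookup)(struct toy_inode *dir, const char *name);
};

struct toy_inode {
	unsigned long ino;
	mode_t mode;
	long long size;
	const struct toy_inode_operations *i_op;
};

struct toy_stat {
	unsigned long ino;
	mode_t mode;
	long long size;
};

/* VFS side: switch into whichever lookup() the parent directory provides. */
struct toy_inode *toy_vfs_lookup(struct toy_inode *dir, const char *name)
{
	if (!dir->i_op || !dir->i_op->lookup)
		return NULL;    /* not something we can search */
	return dir->i_op->lookup(dir, name);
}

/* stat(2)-like: peek at the inode and pass some fields back. */
void toy_vfs_stat(const struct toy_inode *inode, struct toy_stat *st)
{
	assert(inode != NULL && st != NULL);
	st->ino = inode->ino;
	st->mode = inode->mode;
	st->size = inode->size;
}

/* Hypothetical in-memory filesystem: one regular file, "motd". */
static struct toy_inode toyfs_motd = { 12, S_IFREG | 0644, 4096, NULL };

static struct toy_inode *toyfs_lookup(struct toy_inode *dir, const char *name)
{
	(void)dir;
	return strcmp(name, "motd") == 0 ? &toyfs_motd : NULL;
}

static const struct toy_inode_operations toyfs_dir_iops = {
	.lookup = toyfs_lookup,
};

struct toy_inode toyfs_root = { 1, S_IFDIR | 0755, 0, &toyfs_dir_iops };
```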


The File Object
---------------

Opening a file requires another operation: all
structure (this is the kernel-side implementat
The freshly allocated file structure is initia
the dentry and a set of file operation member
taken from the inode data.  The open() file me
specific filesystem implementation can do its
this is another switch performed by the VFS.
placed into the file descriptor table for the

Reading, writing and closing files (and other
is done by using the userspace file descriptor
file structure, and then calling the required
do whatever is required.  For as long as the f
dentry in use, which in turn means that the VF
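
The sequence above (allocate a file structure, initialise it from the inode, call the filesystem's ``open()``, install it in the descriptor table, then route later calls through it) is sketched below as a userspace model. The ``toy_*``/``toyfs_*`` names are hypothetical, the descriptor is simply an index into a fixed table, and the inode pointer stands in for the dentry reference a real file keeps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

#define TOY_MAX_FDS 16

struct toy_file;

struct toy_file_operations {
	int (*open)(struct toy_file *f);                 /* optional */
	ssize_t (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_inode {
	const char *data;                            /* toy file contents */
	const struct toy_file_operations *i_fop;     /* default file ops */
};

struct toy_file {
	struct toy_inode *inode;   /* stands in for the dentry reference */
	const struct toy_file_operations *f_op;
	size_t pos;
	int in_use;
};

/* Per-process descriptor table: the fd is just an index. */
static struct toy_file toy_fd_table[TOY_MAX_FDS];

int toy_open(struct toy_inode *inode)
{
	for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
		struct toy_file *f = &toy_fd_table[fd];

		if (f->in_use)
			continue;
		f->inode = inode;          /* initialise from the inode ... */
		f->f_op = inode->i_fop;
		f->pos = 0;
		/* ... then let the filesystem do its own work. */
		if (f->f_op->open && f->f_op->open(f) != 0)
			return -1;
		f->in_use = 1;             /* install in the table */
		return fd;
	}
	return -1;                         /* table full */
}

ssize_t toy_read(int fd, char *buf, size_t len)
{
	if (fd < 0 || fd >= TOY_MAX_FDS || !toy_fd_table[fd].in_use)
		return -1;
	struct toy_file *f = &toy_fd_table[fd];
	return f->f_op->read ? f->f_op->read(f, buf, len) : -1;
}

void toy_close(int fd)
{
	if (fd >= 0 && fd < TOY_MAX_FDS)
		toy_fd_table[fd].in_use = 0;   /* slot becomes free again */
}

/* Hypothetical filesystem method: read from an in-memory string. */
static ssize_t toyfs_read(struct toy_file *f, char *buf, size_t len)
{
	size_t avail = strlen(f->inode->data) - f->pos;
	size_t n = len < avail ? len : avail;

	memcpy(buf, f->inode->data + f->pos, n);
	f->pos += n;
	return (ssize_t)n;
}

static const struct toy_file_operations toyfs_fops = { .read = toyfs_read };
struct toy_inode toyfs_hello = { "hello", &toyfs_fops };
```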


Registering and Mounting a Filesystem
=====================================

To register and unregister a filesystem, use t
functions:

.. code-block:: c

        #include <linux/fs.h>

        extern int register_filesystem(struct
        extern int unregister_filesystem(struc

The passed struct file_system_type describes y
request is made to mount a filesystem onto a d
namespace, the VFS will call the appropriate m
specific filesystem.  A new vfsmount referring t
->mount() will be attached to the mountpoint,
resolution reaches the mountpoint it will jump
vfsmount.

You can see all filesystems that are registere
file /proc/filesystems.
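
Registration boils down to keeping a list of filesystem types keyed by name, which is the list ``/proc/filesystems`` reports. The userspace sketch below is not the kernel's implementation: the ``toy_*`` names and the ``TOY_FS_REQUIRES_DEV`` flag are hypothetical, and it assumes that registering a duplicate name fails with ``-EBUSY`` and unregistering an unknown type fails with ``-EINVAL``. The listing mimics the ``/proc/filesystems`` layout, where ``nodev`` marks types that need no block device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_FS_REQUIRES_DEV 1   /* hypothetical flag value */

struct toy_fs_type {
	const char *name;
	int fs_flags;
	struct toy_fs_type *next;   /* owned by the registry; start as NULL */
};

static struct toy_fs_type *toy_file_systems;   /* head of the list */

/* Return the link that points at @name, or the tail link if absent. */
static struct toy_fs_type **toy_find(const char *name)
{
	struct toy_fs_type **p;

	for (p = &toy_file_systems; *p; p = &(*p)->next)
		if (strcmp((*p)->name, name) == 0)
			break;
	return p;
}

int toy_register_filesystem(struct toy_fs_type *fs)
{
	struct toy_fs_type **p;

	assert(fs->next == NULL);
	p = toy_find(fs->name);
	if (*p)
		return -EBUSY;      /* a type with this name already exists */
	*p = fs;                    /* append at the tail */
	return 0;
}

int toy_unregister_filesystem(struct toy_fs_type *fs)
{
	for (struct toy_fs_type **p = &toy_file_systems; *p; p = &(*p)->next) {
		if (*p == fs) {
			*p = fs->next;
			fs->next = NULL;
			return 0;
		}
	}
	return -EINVAL;             /* was never registered */
}

/* Render the list like /proc/filesystems: "[nodev]\tname\n" per type. */
void toy_list_filesystems(char *buf, size_t size)
{
	size_t used = 0;

	assert(size > 0);
	buf[0] = '\0';
	for (struct toy_fs_type *t = toy_file_systems; t; t = t->next) {
		int n = snprintf(buf + used, size - used, "%s\t%s\n",
				 (t->fs_flags & TOY_FS_REQUIRES_DEV) ? "" : "nodev",
				 t->name);
		if (n < 0 || (size_t)n >= size - used) {
			buf[used] = '\0';   /* drop a partially written line */
			return;
		}
		used += (size_t)n;
	}
}
```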


struct file_system_type
-----------------------

This describes the filesystem.  The following
members are defined:

.. code-block:: c

        struct file_system_type {
                const char *name;
                int fs_flags;
                int (*init_fs_context)(struct
                const struct fs_parameter_spec
                struct dentry *(*mount) (struc
                        const char *, void *);
                void (*kill_sb) (struct super_
                struct module *owner;
                struct file_system_type * next
                struct hlist_head fs_supers;

                struct lock_class_key s_lock_k
                struct lock_class_key s_umount
                struct lock_class_key s_vfs_re
                struct lock_class_key s_writer

                struct lock_class_key i_lock_k
                struct lock_class_key i_mutex_
                struct lock_class_key invalida
                struct lock_class_key i_mutex_
        };

``name``
        the name of the filesystem type, such
        "msdos" and so on

``fs_flags``
        various flags (e.g. FS_REQUIRES_DEV, F

``init_fs_context``
        Initializes 'struct fs_context' ->ops
        filesystem-specific data.

``parameters``
        Pointer to the array of filesystem par
        'struct fs_parameter_spec'.
        More info in Documentation/filesystems

``mount``
        the method to call when a new instance
        be mounted

``kill_sb``
        the method to call when an instance of
        shut down


``owner``
        for internal VFS use: you should initi
        in most cases.

``next``
        for internal VFS use: you should initi

``fs_supers``
        for internal VFS use: hlist of filesys

  s_lock_key, s_umount_key, s_vfs_rename_key,
  i_lock_key, i_mutex_key, invalidate_lock_key

The mount() method has the following arguments

``struct file_system_type *fs_type``
        describes the filesystem, partly initi
        filesystem code

``int flags``
        mount flags

``const char *dev_name``
        the device name we are mounting.

``void *data``
        arbitrary mount options, usually comes
        "Mount Options" section)

The mount() method must return the root dentry
caller.  An active reference to its superblock
superblock must be locked.  On failure it shou
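
Returning either a dentry or an error from one function relies on the kernel's ``ERR_PTR()`` convention: a small negative errno is stored as a pointer value in the top 4095 values of the address space, where no real object lives. The sketch below re-creates the idea in userspace with hypothetical ``toy_*`` helpers (it is not ``<linux/err.h>``; the 4095 limit mirrors the kernel's ``MAX_ERRNO``) and a hypothetical ``toy_mount()``.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the ERR_PTR() idea; not <linux/err.h>. */
#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(long error)     /* error in [-4095, -1] */
{
	return (void *)(intptr_t)error;
}

static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool toy_is_err(const void *ptr)
{
	/* The last TOY_MAX_ERRNO addresses never hold a real object. */
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_dentry {
	const char *name;
};

static struct toy_dentry toy_root_dentry = { "/" };

/* Hypothetical ->mount(): root dentry on success, encoded errno on failure. */
struct toy_dentry *toy_mount(const char *dev_name)
{
	if (dev_name == NULL || dev_name[0] == '\0')
		return toy_err_ptr(-EINVAL);
	return &toy_root_dentry;
}
```

In kernel code the corresponding helpers are ``ERR_PTR()``, ``IS_ERR()`` and ``PTR_ERR()``; callers check the returned pointer with ``IS_ERR()`` before dereferencing it.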

The arguments match those of mount(2) and thei
on filesystem type.  E.g. for block filesystem
as block device name, that device is opened an
suitable filesystem image the method creates a
super_block accordingly, returning its root de

->mount() may choose to return a subtree of ex
doesn't have to create a new one.  The main re
point of view is a reference to dentry at the
attached; creation of new superblock is a comm

The most interesting member of the superblock
method fills in is the "s_op" field.  This is
super_operations" which describes the next lev
implementation.

Usually, a filesystem uses one of the generic
and provides a fill_super() callback instead.

``mount_bdev``
        mount a filesystem residing on a block

``mount_nodev``
        mount a filesystem that is not backed

``mount_single``
        mount a filesystem which shares the in

A fill_super() callback implementation has the

``struct super_block *sb``
        the superblock structure.  The callbac
        properly.

``void *data``
        arbitrary mount options, usually comes
        "Mount Options" section)

``int silent``
        whether or not to be silent on error
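
The division of work between a generic helper and a filesystem's ``fill_super()`` callback can be modelled in userspace as follows. This is a sketch under assumptions: ``toy_mount_nodev()``, ``TOY_MS_SILENT`` and ``toyfs_fill_super()`` are hypothetical, and the ``blocksize=N`` option syntax is invented for the example. The helper owns allocation and cleanup of the superblock; the callback only parses options, initialises fields and stays quiet when ``silent`` is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_MS_SILENT 0x1   /* hypothetical "don't complain" mount flag */

struct toy_super_block {
	unsigned long s_blocksize;
	int s_root_ready;           /* stands in for sb->s_root */
};

typedef int (*toy_fill_super_fn)(struct toy_super_block *sb, void *data,
				 int silent);

/* Generic helper: allocate, let the filesystem fill in, clean up on error. */
int toy_mount_nodev(int flags, void *data, toy_fill_super_fn fill_super,
		    struct toy_super_block **out)
{
	struct toy_super_block *sb = calloc(1, sizeof(*sb));
	int err;

	if (!sb)
		return -ENOMEM;
	err = fill_super(sb, data, (flags & TOY_MS_SILENT) ? 1 : 0);
	if (err) {
		free(sb);           /* the helper, not the callback, cleans up */
		return err;
	}
	*out = sb;
	return 0;
}

/* Hypothetical fill_super(): parse "blocksize=N", default 1024. */
int toyfs_fill_super(struct toy_super_block *sb, void *data, int silent)
{
	const char *opts = data;
	unsigned long bs = 1024;

	if (opts && sscanf(opts, "blocksize=%lu", &bs) != 1) {
		if (!silent)
			fprintf(stderr, "toyfs: bad mount options '%s'\n", opts);
		return -EINVAL;
	}
	sb->s_blocksize = bs;
	sb->s_root_ready = 1;
	return 0;
}
```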


The Superblock Object
=====================

A superblock object represents a mounted files


struct super_operations
-----------------------

This describes how the VFS can manipulate the
filesystem.  The following members are defined

.. code-block:: c

        struct super_operations {
                struct inode *(*alloc_inode)(s
                void (*destroy_inode)(struct i
                void (*free_inode)(struct inod

                void (*dirty_inode) (struct in
                int (*write_inode) (struct ino
                int (*drop_inode) (struct inod
                void (*evict_inode) (struct in
                void (*put_super) (struct supe
                int (*sync_fs)(struct super_bl
                int (*freeze_super) (struct su
                                        enum f
                int (*freeze_fs) (struct super
                int (*thaw_super) (struct supe
                                        enum f
                int (*unfreeze_fs) (struct sup
                int (*statfs) (struct dentry *
                int (*remount_fs) (struct supe
                void (*umount_begin) (struct s

                int (*show_options)(struct seq
                int (*show_devname)(struct seq
                int (*show_path)(struct seq_fi
                int (*show_stats)(struct seq_f

                ssize_t (*quota_read)(struct s
                ssize_t (*quota_write)(struct
                struct dquot **(*get_dquots)(s

                long (*nr_cached_objects)(stru
                                        struct
                long (*free_cached_objects)(st
                                        struct
        };

All methods are called without any locks being
noted.  This means that most methods can block
only called from a process context (i.e. not f
or bottom half).

``alloc_inode``
        this method is called by alloc_inode()
        struct inode and initialize it.  If th
        defined, a simple 'struct inode' is al
        alloc_inode will be used to allocate a
        contains a 'struct inode' embedded wit

``destroy_inode``
        this method is called by destroy_inode
        allocated for struct inode.  It is onl
        ->alloc_inode was defined and simply u
        ->alloc_inode.

``free_inode``
        this method is called from RCU callbac
        in ->destroy_inode to free 'struct ino
        better to release memory in this metho

``dirty_inode``
        this method is called by the VFS when
        This is specifically for the inode its
        not its data.  If the update needs to
        then I_DIRTY_DATASYNC will be set in t
        I_DIRTY_TIME will be set in the flags
        and struct inode has times updated sin
        call.

``write_inode``
        this method is called when the VFS nee
        disc.  The second parameter indicates
        be synchronous or not, not all filesys

``drop_inode``
        called when the last access to the ino
        inode->i_lock spinlock held.

        This method should be either NULL (nor
        semantics) or "generic_delete_inode" (
        not want to cache inodes - causing "de
        called regardless of the value of i_nl

        The "generic_delete_inode()" behavior
        practice of using "force_delete" in th
        does not have the races that the "forc

``evict_inode``
        called when the VFS wants to evict an
        *not* evict the pagecache or inode-ass
        the method has to use truncate_inode_p
        of those. Caller makes sure async writ
        the inode while (or after) ->evict_ino

``put_super``
        called when the VFS wishes to free the
        (i.e. unmount).  This is called with t

``sync_fs``
        called when VFS is writing out all dir
        superblock.  The second parameter indi
        should wait until the write out has be

``freeze_super``
        Called instead of ->freeze_fs callback
        Main difference is that ->freeze_super
        down_write(&sb->s_umount). If filesyst
        ->freeze_fs to be called too, then it
        explicitly from this callback. Optiona

``freeze_fs``
        called when VFS is locking a filesyste
        consistent state.  This method is curr
        Volume Manager (LVM) and ioctl(FIFREEZ

``thaw_super``
        called when VFS is unlocking a filesys
        again after ->freeze_super. Optional.

``unfreeze_fs``
        called when VFS is unlocking a filesys
        again after ->freeze_fs. Optional.

``statfs``
        called when the VFS needs to get files

``remount_fs``
        called when the filesystem is remounte
        the kernel lock held

``umount_begin``
        called when the VFS is unmounting a fi

``show_options``
        called by the VFS to show mount option
        and /proc/<pid>/mountinfo.
        (see "Mount Options" section)

``show_devname``
        Optional. Called by the VFS to show de
        /proc/<pid>/{mounts,mountinfo,mountsta
        '(struct mount).mnt_devname' will be u

``show_path``
        Optional. Called by the VFS (for /proc
        the mount root dentry path relative to

``show_stats``
        Optional. Called by the VFS (for /proc
        filesystem-specific mount statistics.

``quota_read``
        called by the VFS to read from filesys

``quota_write``
        called by the VFS to write to filesyst

``get_dquots``
        called by quota to get 'struct dquot'
        Optional.

``nr_cached_objects``
        called by the sb cache shrinking funct
        return the number of freeable cached o
        Optional.

``free_cached_objects``
        called by the sb cache shrinking funct
        scan the number of objects indicated t
        Optional, but any filesystem implement
        also implement ->nr_cached_objects for
        correctly.

        We can't do anything with any errors t
        encountered, hence the void return typ
        called if the VM is trying to reclaim
        hence this method does not need to han

        Implementations must include condition
        any scanning loop that is done.  This
        determine appropriate scan batch sizes
        about whether implementations will cau
        large scan batch sizes.
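
Two of the hooks above cooperate over an inode's lifetime: ``alloc_inode`` typically allocates a larger, filesystem-private structure with the generic inode embedded in it, and ``drop_inode`` decides what happens when the last reference is put. The userspace model below sketches both. ``toy_container_of``, ``toy_iput`` and the ``myfs_*`` names are hypothetical, and the default used when ``drop_inode`` is NULL (evict only once the link count reaches zero, otherwise keep the inode cached) is a simplification assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace spelling of the container_of() idiom. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_inode;

struct toy_super_operations {
	struct toy_inode *(*alloc_inode)(void);
	void (*destroy_inode)(struct toy_inode *inode);
	int (*drop_inode)(struct toy_inode *inode);  /* 1 = evict, 0 = cache */
};

struct toy_inode {
	int i_count;                                 /* active references */
	unsigned int i_nlink;                        /* hard links */
	const struct toy_super_operations *s_op;     /* stands in for i_sb->s_op */
};

/* Filesystem-private inode with the generic one embedded inside it. */
struct myfs_inode_info {
	unsigned long first_block;                   /* hypothetical private field */
	struct toy_inode vfs_inode;
};

static int myfs_live_inodes;                         /* demo bookkeeping */

static struct toy_inode *myfs_alloc_inode(void)
{
	struct myfs_inode_info *mi = calloc(1, sizeof(*mi));

	if (!mi)
		return NULL;
	myfs_live_inodes++;
	return &mi->vfs_inode;
}

static void myfs_destroy_inode(struct toy_inode *inode)
{
	/* Recover the outer allocation from the embedded member. */
	free(toy_container_of(inode, struct myfs_inode_info, vfs_inode));
	myfs_live_inodes--;
}

/* Policy for a filesystem that never wants to cache unused inodes. */
static int toy_delete_always(struct toy_inode *inode)
{
	(void)inode;
	return 1;
}

struct toy_inode *toy_new_inode(const struct toy_super_operations *op,
				unsigned int nlink)
{
	struct toy_inode *inode = op->alloc_inode();

	if (inode) {
		inode->i_count = 1;
		inode->i_nlink = nlink;
		inode->s_op = op;
	}
	return inode;
}

/* Put one reference; on the last one, ask ->drop_inode. Returns 1 if freed. */
int toy_iput(struct toy_inode *inode)
{
	const struct toy_super_operations *op = inode->s_op;
	int drop;

	if (--inode->i_count > 0)
		return 0;
	/* NULL drop_inode: assumed default, evict only unlinked inodes. */
	drop = op->drop_inode ? op->drop_inode(inode) : inode->i_nlink == 0;
	if (!drop)
		return 0;               /* stays cached for later reuse */
	op->destroy_inode(inode);
	return 1;
}

const struct toy_super_operations myfs_sops = {
	.alloc_inode = myfs_alloc_inode,
	.destroy_inode = myfs_destroy_inode,
	.drop_inode = NULL,
};
```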

Whoever sets up the inode is responsible for f
field.  This is a pointer to a "struct inode_o
the methods that can be performed on individua
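
Filling in ``i_op`` usually means picking a method table by file type when the inode is first set up. In the userspace sketch below the ``myfs_*`` tables are hypothetical and hold only a label, so that the choice is easy to check; real tables hold function pointers like those of ``struct inode_operations``. Setting ``i_fop`` alongside ``i_op`` is an extra step assumed here for completeness.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>   /* mode_t, S_IF* constants, S_IS* macros */

/* Stand-ins for method tables: a label replaces the function pointers. */
struct toy_inode_operations { const char *kind; };
struct toy_file_operations  { const char *kind; };

static const struct toy_inode_operations myfs_dir_iops     = { "dir" };
static const struct toy_inode_operations myfs_file_iops    = { "file" };
static const struct toy_inode_operations myfs_symlink_iops = { "symlink" };
static const struct toy_inode_operations myfs_special_iops = { "special" };
static const struct toy_file_operations  myfs_dir_fops     = { "dir" };
static const struct toy_file_operations  myfs_file_fops    = { "file" };

struct toy_inode {
	mode_t i_mode;
	const struct toy_inode_operations *i_op;
	const struct toy_file_operations *i_fop;
};

/* Whoever sets up the inode picks the tables that match its type. */
void myfs_set_ops(struct toy_inode *inode, mode_t mode)
{
	inode->i_mode = mode;
	if (S_ISDIR(mode)) {
		inode->i_op = &myfs_dir_iops;
		inode->i_fop = &myfs_dir_fops;
	} else if (S_ISREG(mode)) {
		inode->i_op = &myfs_file_iops;
		inode->i_fop = &myfs_file_fops;
	} else if (S_ISLNK(mode)) {
		inode->i_op = &myfs_symlink_iops;
		inode->i_fop = NULL;    /* no file methods in this sketch */
	} else {
		/* device nodes, FIFOs and sockets */
		inode->i_op = &myfs_special_iops;
		inode->i_fop = NULL;
	}
}
```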


struct xattr_handler
---------------------

On filesystems that support extended attribute
superblock field points to a NULL-terminated a
Extended attributes are name:value pairs.

``name``
        Indicates that the handler matches att
        name (such as "system.posix_acl_access
        be NULL.

``prefix``
        Indicates that the handler matches all
        specified name prefix (such as "user."
        NULL.

``list``
        Determine if attributes matching this
        listed for a particular dentry.  Used
        implementations like generic_listxattr

``get``
        Called by the VFS to get the value of
        attribute.  This method is called by t
        call.

``set``
        Called by the VFS to set the value of
        attribute.  When the new value is NULL
        particular extended attribute.  This m
        setxattr(2) and removexattr(2) system

When none of the xattr handlers of a filesyste
attribute name or when a filesystem doesn't su
the various ``*xattr(2)`` system calls return
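
Handler selection can be sketched as a walk over the NULL-terminated array: a handler with ``name`` set must match the whole attribute name, while one with only ``prefix`` set claims every attribute that starts with the prefix and receives the remainder. The userspace model below is simplified and hypothetical (``toy_*`` names, a single ``get`` hook, an invented ``user.color`` attribute), and returning ``-EOPNOTSUPP`` when no handler matches is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_xattr_handler {
	const char *name;     /* exact attribute name, or NULL */
	const char *prefix;   /* attribute name prefix, or NULL */
	int (*get)(const char *suffix, char *buf, size_t size);
};

/* Walk a NULL-terminated handler array; on a match, *suffix is the
 * part of @name after the handler's prefix ("" for exact matches). */
static const struct toy_xattr_handler *
toy_xattr_resolve(const struct toy_xattr_handler *const *handlers,
		  const char *name, const char **suffix)
{
	for (; *handlers; handlers++) {
		const struct toy_xattr_handler *h = *handlers;

		if (h->name) {
			if (strcmp(name, h->name) == 0) {
				*suffix = "";
				return h;
			}
		} else if (h->prefix) {
			size_t n = strlen(h->prefix);

			if (strncmp(name, h->prefix, n) == 0) {
				*suffix = name + n;
				return h;
			}
		}
	}
	return NULL;
}

int toy_getxattr(const struct toy_xattr_handler *const *handlers,
		 const char *name, char *buf, size_t size)
{
	const char *suffix = NULL;
	const struct toy_xattr_handler *h =
		toy_xattr_resolve(handlers, name, &suffix);

	if (!h || !h->get)
		return -EOPNOTSUPP;   /* assumed error for "no handler" */
	return h->get(suffix, buf, size);
}

/* Hypothetical "user." handler that stores exactly one attribute. */
static int toy_user_get(const char *suffix, char *buf, size_t size)
{
	static const char value[] = "blue";
	const size_t len = sizeof(value) - 1;

	if (strcmp(suffix, "color") != 0)
		return -ENODATA;
	if (size < len)
		return -ERANGE;
	memcpy(buf, value, len);
	return (int)len;
}

static const struct toy_xattr_handler toy_acl_handler = {
	.name = "system.posix_acl_access",   /* exact match, no get() here */
};
static const struct toy_xattr_handler toy_user_handler = {
	.prefix = "user.",
	.get = toy_user_get,
};
const struct toy_xattr_handler *const toy_handlers[] = {
	&toy_acl_handler, &toy_user_handler, NULL,
};
```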


The Inode Object
================

An inode object represents an object within th


struct inode_operations
-----------------------

This describes how the VFS can manipulate an i
As of kernel 2.6.22, the following members are

.. code-block:: c

        struct inode_operations {
                int (*create) (struct mnt_idma
                struct dentry * (*lookup) (str
                int (*link) (struct dentry *,s
                int (*unlink) (struct inode *,
                int (*symlink) (struct mnt_idm
                int (*mkdir) (struct mnt_idmap
                int (*rmdir) (struct inode *,s
                int (*mknod) (struct mnt_idmap
                int (*rename) (struct mnt_idma
                               struct inode *,
                int (*readlink) (struct dentry
                const char *(*get_link) (struc
                                         struc
                int (*permission) (struct mnt_
                struct posix_acl * (*get_inode
                int (*setattr) (struct mnt_idm
                int (*getattr) (struct mnt_idm
                ssize_t (*listxattr) (struct d
                void (*update_time)(struct ino
                int (*atomic_open)(struct inod
                                   unsigned op
                int (*tmpfile) (struct mnt_idm
                struct posix_acl * (*get_acl)(
                int (*set_acl)(struct mnt_idma
                int (*fileattr_set)(struct mnt
                                    struct den
                int (*fileattr_get)(struct den
                struct offset_ctx *(*get_offse
        };

Again, all methods are called without any lock
otherwise noted.
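
Many of the methods below are needed only for particular features, so the caller has to cope with NULL entries before switching into the filesystem. The userspace sketch below shows that check. The ``toy_*``, ``rwfs_*`` and ``rofs_*`` names are hypothetical, the signatures are simplified, and ``-EPERM`` for a missing method is this sketch's choice rather than a documented rule.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_inode;

/* Simplified method table: a NULL entry means "not supported". */
struct toy_inode_operations {
	int (*mkdir)(struct toy_inode *dir, const char *name);
	int (*unlink)(struct toy_inode *dir, const char *name);
};

struct toy_inode {
	const struct toy_inode_operations *i_op;
	int nr_entries;             /* toy state: children in this directory */
};

/* VFS-side wrappers: check for the method, then dispatch. */
int toy_vfs_mkdir(struct toy_inode *dir, const char *name)
{
	if (!dir->i_op || !dir->i_op->mkdir)
		return -EPERM;      /* this sketch's error for a missing method */
	return dir->i_op->mkdir(dir, name);
}

int toy_vfs_unlink(struct toy_inode *dir, const char *name)
{
	if (!dir->i_op || !dir->i_op->unlink)
		return -EPERM;
	return dir->i_op->unlink(dir, name);
}

/* A hypothetical writable filesystem implements both methods ... */
static int rwfs_mkdir(struct toy_inode *dir, const char *name)
{
	(void)name;
	dir->nr_entries++;
	return 0;
}

static int rwfs_unlink(struct toy_inode *dir, const char *name)
{
	(void)name;
	if (dir->nr_entries == 0)
		return -ENOENT;
	dir->nr_entries--;
	return 0;
}

const struct toy_inode_operations rwfs_dir_iops = {
	.mkdir = rwfs_mkdir,
	.unlink = rwfs_unlink,
};

/* ... while a read-only one simply leaves them NULL. */
const struct toy_inode_operations rofs_dir_iops = {
	.mkdir = NULL,
	.unlink = NULL,
};
```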

``create``
        called by the open(2) and creat(2) sys
        if you want to support regular files.
        not have an inode (i.e. it should be a
        you will probably call d_instantiate()
        newly created inode

``lookup``
        called when the VFS needs to look up a
        directory.  The name to look for is fo
        method must call d_add() to insert the
        dentry.  The "i_count" field in the in
        incremented.  If the named inode does
        should be inserted into the dentry (th
        dentry).  Returning an error code from
        done on a real error, otherwise creati
        calls like creat(2), mknod(2), mkdir(
        If you wish to overload the dentry met
        initialise the "d_op" field in the de
545         a struct "dentry_operations".  This me    
546         directory inode semaphore held            
547                                                   
``link``
        called by the link(2) system call.  On
        support hard links.  You will probably
        d_instantiate() just as you would in t

``unlink``
        called by the unlink(2) system call.
        to support deleting inodes

``symlink``
        called by the symlink(2) system call.
        to support symlinks.  You will probabl
        d_instantiate() just as you would in t

``mkdir``
        called by the mkdir(2) system call.  O
        to support creating subdirectories.  Y
        call d_instantiate() just as you would

``rmdir``
        called by the rmdir(2) system call.  O
        to support deleting subdirectories

``mknod``
        called by the mknod(2) system call to
        block) inode or a named pipe (FIFO) or
        you want to support creating these typ
        probably need to call d_instantiate()
        create() method

``rename``
        called by the rename(2) system call to
        the parent and name given by the secon

        The filesystem must return -EINVAL for
        unknown flags.  Currently the followin
        (1) RENAME_NOREPLACE: this flag indica
        the rename exists the rename should fa
        replacing the target.  The VFS already
        for local filesystems the RENAME_NOREP
        equivalent to plain rename.
        (2) RENAME_EXCHANGE: exchange source a
        exist; this is checked by the VFS.  Un
        and target may be of different type.
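
The flag rules above can be sketched in userspace as follows. Everything here is hypothetical: the ``toy_*`` directory stands in for real inodes and dentries, and the ``TOY_RENAME_*`` values are chosen to match ``RENAME_NOREPLACE``, ``RENAME_EXCHANGE`` and ``RENAME_WHITEOUT`` from the uapi ``<linux/fs.h>``. The existence and mutual-exclusion checks that the VFS normally performs are repeated only so that the example stands alone.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same bit values as RENAME_NOREPLACE, RENAME_EXCHANGE, RENAME_WHITEOUT. */
#define TOY_RENAME_NOREPLACE (1u << 0)
#define TOY_RENAME_EXCHANGE  (1u << 1)
#define TOY_RENAME_WHITEOUT  (1u << 2)

/* Hypothetical directory: a small table of (name, inode number) pairs. */
struct toy_dirent {
	char name[32];
	unsigned long ino;
};

struct toy_dir {
	struct toy_dirent ents[8];
	int n;
};

int toy_add(struct toy_dir *d, const char *name, unsigned long ino)
{
	if (d->n == 8 || strlen(name) >= sizeof(d->ents[0].name))
		return -ENOSPC;
	strcpy(d->ents[d->n].name, name);
	d->ents[d->n].ino = ino;
	d->n++;
	return 0;
}

struct toy_dirent *toy_find(struct toy_dir *d, const char *name)
{
	for (int i = 0; i < d->n; i++)
		if (strcmp(d->ents[i].name, name) == 0)
			return &d->ents[i];
	return NULL;
}

/* Hypothetical rename handler supporting NOREPLACE and EXCHANGE only. */
int toy_rename(struct toy_dir *d, const char *old, const char *new,
	       unsigned int flags)
{
	struct toy_dirent *src, *dst;

	/* Unknown or unimplemented flags (here RENAME_WHITEOUT) => -EINVAL. */
	if (flags & ~(TOY_RENAME_NOREPLACE | TOY_RENAME_EXCHANGE))
		return -EINVAL;
	/* NOREPLACE and EXCHANGE are mutually exclusive (VFS checks this). */
	if ((flags & TOY_RENAME_NOREPLACE) && (flags & TOY_RENAME_EXCHANGE))
		return -EINVAL;
	if (strlen(new) >= sizeof(d->ents[0].name))
		return -ENAMETOOLONG;

	src = toy_find(d, old);
	dst = toy_find(d, new);
	if (!src)
		return -ENOENT;

	if (flags & TOY_RENAME_EXCHANGE) {
		unsigned long tmp;

		if (!dst)
			return -ENOENT;		/* both names must exist */
		tmp = src->ino;			/* swap what the names refer to */
		src->ino = dst->ino;
		dst->ino = tmp;
		return 0;
	}

	if (dst) {
		if (flags & TOY_RENAME_NOREPLACE)
			return -EEXIST;		/* normally caught by the VFS */
		dst->ino = src->ino;		/* target now names the source inode */
		*src = d->ents[--d->n];		/* drop the old name */
	} else {
		strcpy(src->name, new);
	}
	return 0;
}
```

A filesystem that, like this sketch, does not implement ``RENAME_WHITEOUT`` lets it fall into the ``-EINVAL`` path rather than silently ignoring it.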

``get_link``
        called by the VFS to follow a symbolic
        points to.  Only required if you want
        This method returns the symlink body t
        resets the current position with nd_ju
        won't go away until the inode is gone,
        if it needs to be otherwise pinned, ar
        having get_link(..., ..., done) do set
        destructor, argument).  In that case d
        be called once VFS is done with the bo
        be called in RCU mode; that is indicat
        argument.  If request can't be handled
        have it return ERR_PTR(-ECHILD).

        If the filesystem stores the symlink t
        VFS may use it directly without callin
        ->get_link() must still be provided.
        freed until after an RCU grace period.
        post-iget() time requires a 'release'
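
The RCU-mode contract can be illustrated with a small userspace sketch. The ``toy_*`` error-pointer helpers are hypothetical stand-ins modelled on the kernel's ``ERR_PTR()``, ``IS_ERR()`` and ``PTR_ERR()``, and a nonzero ``rcu_walk`` argument plays the role of the NULL dentry: if the body is not already in memory the handler refuses with ``-ECHILD`` instead of blocking, and the caller is expected to retry in ref-walk mode.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error-pointer helpers in the style of ERR_PTR()/IS_ERR(). */
#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical inode: disk_body stands in for data that needs blocking I/O. */
struct toy_inode {
	char *cached_body;	/* symlink body already in memory, or NULL */
	const char *disk_body;
};

/* rcu_walk != 0 plays the role of the NULL dentry: we must not block. */
const char *toy_get_link(struct toy_inode *inode, int rcu_walk)
{
	size_t n;
	char *body;

	if (inode->cached_body)
		return inode->cached_body;	/* fine in either mode */
	if (rcu_walk)
		return toy_err_ptr(-ECHILD);	/* ask for a ref-walk retry */

	/* Ref-walk mode: "read" the body (may block) and cache it. */
	n = strlen(inode->disk_body) + 1;
	body = malloc(n);
	if (!body)
		return toy_err_ptr(-ENOMEM);
	memcpy(body, inode->disk_body, n);
	inode->cached_body = body;
	return body;
}
```

After one successful ref-walk call the body is cached, so later RCU-mode lookups succeed without blocking.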

``readlink``
        this is now just an override for use b
        cases when ->get_link uses nd_jump_lin
        fact a symlink.  Normally filesystems
        ->get_link for symlinks and readlink(2
        that.

``permission``
        called by the VFS to check for access
        filesystem.

        May be called in rcu-walk mode (mask &
        rcu-walk mode, the filesystem must che
        blocking or storing to the inode.

        If a situation is encountered that rcu
        return -ECHILD and it will be called again in
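
A hedged sketch of a permission check that honours the rcu-walk rule. All names are hypothetical: ``TOY_MAY_*`` stand in for the kernel's ``MAY_*`` mask bits, and ``acl_cached`` models state that would need a blocking lookup (and a store to the inode) if it were missing. When the check cannot be decided without blocking, it returns ``-ECHILD``.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mask bits modelled on MAY_EXEC, MAY_WRITE, MAY_READ and
 * MAY_NOT_BLOCK; the low three line up with the rwx permission bits. */
#define TOY_MAY_EXEC      0x01
#define TOY_MAY_WRITE     0x02
#define TOY_MAY_READ      0x04
#define TOY_MAY_NOT_BLOCK 0x80

struct toy_inode {
	unsigned int mode;	/* permission bits, e.g. 0640 */
	unsigned int uid;
	unsigned int gid;
	bool has_acl;		/* an ACL exists for this inode ... */
	bool acl_cached;	/* ... and is already in memory */
};

/* 0 = allowed, -EACCES = denied, -ECHILD = retry in ref-walk mode. */
int toy_permission(struct toy_inode *inode, unsigned int uid,
		   unsigned int gid, int mask)
{
	unsigned int want = mask & (TOY_MAY_READ | TOY_MAY_WRITE | TOY_MAY_EXEC);
	unsigned int bits;

	if (inode->has_acl && !inode->acl_cached) {
		/* Fetching the ACL could sleep and stores to the inode,
		 * neither of which is allowed in rcu-walk mode. */
		if (mask & TOY_MAY_NOT_BLOCK)
			return -ECHILD;
		inode->acl_cached = true;	/* pretend we read it (ref-walk) */
	}

	/* The sketch ignores ACL contents and checks the classic mode bits. */
	if (uid == inode->uid)
		bits = (inode->mode >> 6) & 7;
	else if (gid == inode->gid)
		bits = (inode->mode >> 3) & 7;
	else
		bits = inode->mode & 7;

	return (want & ~bits) ? -EACCES : 0;
}
```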

``setattr``
        called by the VFS to set attributes fo
        called by chmod(2) and related system

``getattr``
        called by the VFS to get attributes of
        called by stat(2) and related system c

``listxattr``
        called by the VFS to list all extended
        file.  This method is called by the li

``update_time``
        called by the VFS to update a specific
        an inode.  If this is not defined the
        itself and call mark_inode_dirty_sync.

``atomic_open``
        called on the last component of an ope
        method the filesystem can look up, pos
        file in one atomic operation.  If it w
        opening to the caller (e.g. if the fil
        symlink, device, or just something fil
        open for), it may signal this by retur
        dentry).  This method is only called i
        negative or needs lookup.  Cached posi
        handled by f_op->open().  If the file
        flag should be set in file->f_mode.  I
        method must only succeed if the file d
        FMODE_CREATED shall always be set on s
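
The ``O_CREAT``/``O_EXCL`` rules can be sketched in userspace as follows. ``toy_atomic_open``, the toy directory and ``TOY_FMODE_CREATED`` are hypothetical (the real method takes an inode, a dentry, a ``struct file`` and open flags); the point is only that the "created" bit is reported exactly when this call created the file, which under ``O_EXCL`` is the only way to succeed.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>

#define TOY_FMODE_CREATED 0x1u	/* hypothetical stand-in for FMODE_CREATED */

struct toy_dir {
	char names[8][32];
	int n;
};

static int toy_lookup(const struct toy_dir *d, const char *name)
{
	for (int i = 0; i < d->n; i++)
		if (strcmp(d->names[i], name) == 0)
			return i;
	return -1;
}

/* Look up, optionally create, and "open" name in one step. */
int toy_atomic_open(struct toy_dir *d, const char *name, int open_flags,
		    unsigned int *f_mode)
{
	int idx = toy_lookup(d, name);

	*f_mode &= ~TOY_FMODE_CREATED;
	if (idx >= 0) {
		/* Existing file: O_CREAT|O_EXCL must fail. */
		if ((open_flags & O_CREAT) && (open_flags & O_EXCL))
			return -EEXIST;
		return 0;			/* opened, not created */
	}
	if (!(open_flags & O_CREAT))
		return -ENOENT;
	if (d->n == 8 || strlen(name) >= sizeof(d->names[0]))
		return -ENOSPC;
	strcpy(d->names[d->n++], name);
	*f_mode |= TOY_FMODE_CREATED;		/* this call created it */
	return 0;
}
```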

``tmpfile``
        called at the end of O_TMPFILE open().
        atomically creating, opening and unlin
        directory.  On success needs to return
        open; this can be done by calling fini
        the end.

``fileattr_get``
        called on ioctl(FS_IOC_GETFLAGS) and i
        retrieve miscellaneous file flags and
        before the relevant SET operation to c
        (in this case with i_rwsem locked excl
        fall back to f_op->ioctl().

``fileattr_set``
        called on ioctl(FS_IOC_SETFLAGS) and i
        change miscellaneous file flags and at
        i_rwsem exclusive.  If unset, then fal

``get_offset_ctx``
        called to get the offset context for a
        filesystem must define this operation
        simple_offset_dir_operations.

The Address Space Object
========================

The address space object is used to group and
cache.  It can be used to keep track of the pa
else) and also track the mapping of sections o
address spaces.

There are a number of distinct yet related ser
address-space can provide.  These include comm
page lookup by address, and keeping track of p
Writeback.

The first can be used independently of the oth
either write dirty pages in order to clean the
in order to reuse them.  To do this it can cal
on dirty pages, and ->release_folio on clean f
flag set.  Clean pages without PagePrivate and
will be released without notice being given to

To achieve this functionality, pages need to b
lru_cache_add and mark_page_active needs to be
is used.

Pages are normally kept in a radix tree index
maintains information about the PG_Dirty and P
page, so that pages with either of these flags

The Dirty tag is primarily used by mpage_write
->writepages method.  It uses the tag to find
->writepage on.  If mpage_writepages is not us
provides its own ->writepages), the PAGECACHE
unused.  write_inode_now and sync_inode do use
__sync_single_inode) to check if ->writepages
writing out the whole address_space.

The Writeback tag is used by filemap*wait* and
filemap_fdatawait_range, to wait for all write

An address_space handler may attach extra info
typically using the 'private' field in the 'st
information is attached, the PG_Private flag s
cause various VM routines to make extra calls
handler to deal with that data.

An address space acts as an intermediate betwe
application.  Data is read into the address sp
time, and provided to the application either b
by memory-mapping the page.  Data is written i
the application, and then written back to stor
pages, however the address_space has finer con

The read process essentially only requires 're
process is more complicated and uses write_beg
dirty_folio to write data into the address_spa
writepages to write back data to storage.

Adding and removing pages to/from an address_s
inode's i_mutex.

When data is written to a page, the PG_Dirty f
typically remains set until writepage asks for
should clear PG_Dirty and set PG_Writeback.  I
at any point after PG_Dirty is clear.  Once it
PG_Writeback is cleared.

Writeback makes use of a writeback_control str
operations.  This gives the writepage and writ
information about the nature of and reason for
and the constraints under which it is being do
return information back to the caller about th
writepages request.


Handling errors during writeback
--------------------------------

Most applications that do buffered I/O will pe
synchronization call (fsync, fdatasync, msync
ensure that data written has made it to the ba
is an error during writeback, they expect that
a file sync request is made.  After an error h
request, subsequent requests on the same file
0, unless further writeback errors have occurr
synchronization.

Ideally, the kernel would report errors only o
which writes were done that subsequently faile
generic pagecache infrastructure does not trac
that have dirtied each individual page however
file descriptors should get back an error is n

Instead, the generic writeback error tracking
kernel settles for reporting errors to fsync o
that were open at the time that the error occu
multiple writers, all of them will get back an
fsync, even if all of the writes done through
descriptor succeeded (or even if there were no
descriptor at all).

Filesystems that wish to use this infrastructu
mapping_set_error to record the error in the a
occurs.  Then, after writing back data from th
file->fsync operation, they should call file_c
ensure that the struct file's error cursor has
point in the stream of errors emitted by the b
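
A simplified userspace model of this scheme, loosely following the kernel's ``errseq_t`` (which packs the errno, a "seen" bit and a counter into a single 32-bit value). All ``toy_*`` names are hypothetical: ``toy_mapping_set_error()`` plays the role of ``mapping_set_error()``, ``toy_file_open()`` samples a per-file cursor at open time, and ``toy_check_and_advance()`` mirrors ``file_check_and_advance_wb_err()`` in spirit. Counter wrap-around and other details of the real implementation are not modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Per-mapping error state (hypothetical, simplified errseq_t). */
struct toy_mapping {
	int err;		/* most recent writeback error, e.g. -EIO */
	unsigned int counter;	/* bumped every time an error is recorded */
	bool seen;		/* has any file reported the current error? */
};

/* Per-open-file cursor: the counter value this file last reported. */
struct toy_file {
	struct toy_mapping *mapping;
	unsigned int wb_err_cursor;
};

/* Writeback failed: record the error in the mapping. */
void toy_mapping_set_error(struct toy_mapping *m, int err)
{
	m->err = err;
	m->counter++;
	m->seen = false;
}

void toy_file_open(struct toy_file *f, struct toy_mapping *m)
{
	f->mapping = m;
	/* An error that nobody has reported yet is still delivered to
	 * new openers; an already-reported one is not. */
	f->wb_err_cursor = m->seen ? m->counter : 0;
}

/* Called from a toy ->fsync: report each error at most once per file. */
int toy_check_and_advance(struct toy_file *f)
{
	struct toy_mapping *m = f->mapping;

	if (f->wb_err_cursor == m->counter)
		return 0;
	f->wb_err_cursor = m->counter;
	m->seen = true;
	return m->err;
}
```

Every file that was open when the error was recorded sees it once on its next sync; repeating the sync on the same file returns 0.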


struct address_space_operations
-------------------------------

This describes how the VFS can manipulate mapp
cache in your filesystem.  The following membe

.. code-block:: c

        struct address_space_operations {
                int (*writepage)(struct page *
                int (*read_folio)(struct file
                int (*writepages)(struct addre
                bool (*dirty_folio)(struct add
                void (*readahead)(struct reada
                int (*write_begin)(struct file
                                   loff_t pos,
                                struct page **
                int (*write_end)(struct file *
                                 loff_t pos, u
                                 struct folio
                sector_t (*bmap)(struct addres
                void (*invalidate_folio) (stru
                bool (*release_folio)(struct f
                void (*free_folio)(struct foli
                ssize_t (*direct_IO)(struct ki
                int (*migrate_folio)(struct ma
                                struct folio *
                int (*launder_folio) (struct f

                bool (*is_partially_uptodate)

                void (*is_dirty_writeback)(str
                int (*error_remove_folio)(stru
                int (*swap_activate)(struct sw
                int (*swap_deactivate)(struct
                int (*swap_rw)(struct kiocb *i
        };

``writepage``
        called by the VM to write a dirty page
        may happen for data integrity reasons
        up memory (flush).  The difference can
        wbc->sync_mode.  The PG_Dirty flag has
        PageLocked is true.  writepage should
        PG_Writeback, and should make sure the
        synchronously or asynchronously when t
        completes.

        If wbc->sync_mode is WB_SYNC_NONE, ->w
        try too hard if there are problems, an
        other pages from the mapping if that i
        internal dependencies).  If it chooses
        should return AOP_WRITEPAGE_ACTIVATE s
        keep calling ->writepage on that page.

        See the file "Locking" for more detail

``read_folio``
        Called by the page cache to read a fol
        The 'file' argument supplies authentic
        filesystems, and is generally not used
        It may be NULL if the caller does not
        the kernel is performing a read for it
        of a userspace process with an open fi

        If the mapping does not support large
        contain a single page.  The folio will
        is called.  If the read completes succ
        be marked uptodate.  The filesystem sh
        once the read has completed, whether i
        The filesystem does not need to modify
        the page cache holds a reference count
        released until the folio is unlocked.

        Filesystems may implement ->read_folio
        In normal operation, folios are read t
        method.  Only if this fails, or if the
        the read to complete will the page cac
        Filesystems should not attempt to perf
        in the ->read_folio() operation.

        If the filesystem cannot perform the r
        unlock the folio, do whatever action i
        read will succeed in the future and re
        In this case, the caller should look u
        and call ->read_folio again.

        Callers may invoke the ->read_folio()
        read_mapping_folio() will take care of
        read to complete and handle cases such

``writepages``
        called by the VM to write out pages as
        address_space object.  If wbc->sync_mo
        the writeback_control will specify a r
        written out.  If it is WB_SYNC_NONE, t
        given and that many pages should be wr
        ->writepages is given, then mpage_writ
        This will choose pages from the addres
        DIRTY and will pass them to ->writepag
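
A userspace sketch of the writeback loop described above, under simplifying assumptions: pages are a plain array ordered by index, a ``dirty`` boolean stands in for the ``PAGECACHE_TAG_DIRTY`` lookup, and "writing" a page only moves it from dirty to under-writeback. ``toy_wbc`` is a hypothetical cut-down ``writeback_control`` modelling just ``sync_mode``, ``nr_to_write`` and a byte range.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_sync_mode { TOY_WB_SYNC_NONE, TOY_WB_SYNC_ALL };

/* Hypothetical cut-down writeback_control. */
struct toy_wbc {
	enum toy_sync_mode sync_mode;
	long nr_to_write;	/* page budget, honoured for WB_SYNC_NONE */
	long long range_start;	/* byte range, honoured for WB_SYNC_ALL */
	long long range_end;	/* inclusive */
};

struct toy_page {
	bool dirty;
	bool writeback;
};

#define TOY_PAGE_SIZE 4096LL

/* Start writeback on dirty pages; returns how many were submitted. */
long toy_writepages(struct toy_page *pages, size_t npages, struct toy_wbc *wbc)
{
	long submitted = 0;

	for (size_t i = 0; i < npages; i++) {
		long long pos = (long long)i * TOY_PAGE_SIZE;

		if (!pages[i].dirty)
			continue;	/* only dirty-tagged pages are visited */
		if (wbc->sync_mode == TOY_WB_SYNC_ALL &&
		    (pos + TOY_PAGE_SIZE - 1 < wbc->range_start ||
		     pos > wbc->range_end))
			continue;	/* outside the requested range */
		if (wbc->sync_mode == TOY_WB_SYNC_NONE && wbc->nr_to_write <= 0)
			break;		/* budget used up: stop early */

		pages[i].dirty = false;		/* clear dirty ... */
		pages[i].writeback = true;	/* ... and mark under writeback */
		wbc->nr_to_write--;
		submitted++;
	}
	return submitted;
}
```

Integrity writeback (``WB_SYNC_ALL``) must not skip dirty pages in its range, whereas ``WB_SYNC_NONE`` may stop once its budget is spent.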

``dirty_folio``
        called by the VM to mark a folio as di
        needed if an address space attaches pr
        that data needs to be updated when a f
        called, for example, when a memory map
        If defined, it should set the folio di
        PAGECACHE_TAG_DIRTY search mark in i_p

``readahead``
        Called by the VM to read pages associa
        object.  The pages are consecutive in
        locked.  The implementation should dec
        after starting I/O on each page.  Usua
        unlocked by the I/O completion handler
        divided into some sync pages followed
        rac->ra->async_size gives the number o
        filesystem should attempt to read all
        to stop once it reaches the async page
        stop attempting I/O, it can simply ret
        remove the remaining pages from the ad
        and decrement the page refcount.  Set
        completes successfully.
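
The sync/async split can be sketched in userspace as follows. The ``toy_rac`` structure is a hypothetical stand-in for ``struct readahead_control``, ``congested`` is an invented reason to give up on the speculative tail, and I/O "completes" instantly. The caller-side cleanup only unlocks and drops references; removing the pages from the page cache is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool locked;
	bool uptodate;
	int refcount;
};

/* Hypothetical cut-down readahead_control. */
struct toy_rac {
	struct toy_page *pages;
	size_t nr_pages;
	size_t async_size;	/* trailing pages that are only speculative */
	bool congested;		/* invented reason to skip the async tail */
};

/* Pretend I/O that completes immediately and successfully. */
static void toy_submit_read(struct toy_page *p)
{
	p->uptodate = true;	/* set uptodate on successful completion */
	p->locked = false;	/* normally done by the completion handler */
}

/* Returns how many pages the filesystem consumed. */
size_t toy_readahead(struct toy_rac *rac)
{
	size_t sync_count = rac->nr_pages - rac->async_size;
	size_t i;

	for (i = 0; i < rac->nr_pages; i++) {
		if (i >= sync_count && rac->congested)
			break;			/* giving up is allowed only here */
		toy_submit_read(&rac->pages[i]);
		rac->pages[i].refcount--;	/* drop the readahead reference */
	}
	return i;
}

/* Caller side: unlock and release pages the filesystem did not read. */
void toy_readahead_cleanup(struct toy_rac *rac, size_t consumed)
{
	for (size_t i = consumed; i < rac->nr_pages; i++) {
		rac->pages[i].locked = false;
		rac->pages[i].refcount--;
	}
}
```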

``write_begin``
        Called by the generic buffered write c
        to prepare to write len bytes at the g
        The address_space should check that th
        complete, by allocating space if neces
        internal housekeeping.  If the write w
        basic-blocks on storage, then those bl
        (if they haven't been read already) so
        can be written out properly.

        The filesystem must return the locked
        specified offset, in ``*foliop``, for

        It must be able to cope with short wri
        passed to write_begin is greater than
        into the folio).

        A void * may be returned in fsdata, wh
        write_end.

        Returns 0 on success; < 0 on failure (
        in which case write_end is not called.

``write_end``
        After a successful write_begin, and da
        called.  len is the original len passe
        copied is the amount that was able to

        The filesystem must take care of unloc
        decrementing its refcount, and updatin

        Returns < 0 on failure, otherwise the
        'copied') that were able to be copied
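
How ``write_begin`` and ``write_end`` pair up can be sketched in userspace. Everything is hypothetical: pages are tiny 16-byte buffers, unwritten pages are treated as zero-filled instead of being read from storage, and ``toy_buffered_write()`` only approximates what a generic buffered write loop does. The ``write_end`` side follows one common policy for short copies: if the page was never made uptodate, accept nothing and let the caller retry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16	/* tiny pages keep the example readable */
#define TOY_NPAGES 4

struct toy_page {
	char data[TOY_PAGE_SIZE];
	bool locked;
	bool uptodate;
	bool dirty;
};

struct toy_mapping {
	struct toy_page pages[TOY_NPAGES];
	long long i_size;
};

/* Hypothetical write_begin: lock and prepare the page covering pos. */
int toy_write_begin(struct toy_mapping *m, long long pos, size_t len,
		    struct toy_page **pagep)
{
	size_t idx = (size_t)(pos / TOY_PAGE_SIZE);
	size_t off = (size_t)(pos % TOY_PAGE_SIZE);
	struct toy_page *p;

	if (idx >= TOY_NPAGES)
		return -1;	/* failure: write_end will not be called */
	p = &m->pages[idx];
	p->locked = true;
	/* A partial write needs the rest of the page to be valid first; a
	 * real filesystem would read it, here unwritten pages are zeros. */
	if (!p->uptodate && (off != 0 || len < TOY_PAGE_SIZE)) {
		memset(p->data, 0, TOY_PAGE_SIZE);
		p->uptodate = true;
	}
	*pagep = p;
	return 0;
}

/* Hypothetical write_end: returns the number of bytes accepted. */
size_t toy_write_end(struct toy_mapping *m, long long pos, size_t len,
		     size_t copied, struct toy_page *p)
{
	if (!p->uptodate) {
		/* Short copy into a page we meant to overwrite completely:
		 * its tail is stale, so accept nothing. */
		if (copied < len) {
			p->locked = false;
			return 0;
		}
		p->uptodate = true;
	}
	if (copied) {
		p->dirty = true;
		if (pos + (long long)copied > m->i_size)
			m->i_size = pos + (long long)copied;
	}
	p->locked = false;	/* unlocking is the filesystem's job */
	return copied;
}

/* Caller side: split the write at page boundaries. */
long long toy_buffered_write(struct toy_mapping *m, long long pos,
			     const char *buf, size_t count)
{
	long long written = 0;

	while (count) {
		size_t off = (size_t)(pos % TOY_PAGE_SIZE);
		size_t bytes = TOY_PAGE_SIZE - off;
		struct toy_page *p;
		size_t done;

		if (bytes > count)
			bytes = count;
		if (toy_write_begin(m, pos, bytes, &p) < 0)
			break;
		memcpy(p->data + off, buf, bytes);
		done = toy_write_end(m, pos, bytes, bytes, p);
		if (done == 0)
			break;
		pos += (long long)done;
		buf += done;
		count -= done;
		written += (long long)done;
	}
	return written;
}
```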

``bmap``
        called by the VFS to map a logical blo
        physical block number.  This method is
        and for working with swap-files.  To b
        the file must have a stable mapping to
        system does not go through the filesys
        to find out where the blocks in the fi
        addresses directly.

``invalidate_folio``
        If a folio has private data, then inva
        called when part or all of the folio i
        address space.  This generally corresp
        truncation, punch hole or a complete i
        space (in the latter case 'offset' wil
        will be folio_size()).  Any private da
        should be updated to reflect this trun
        and length is folio_size(), then the p
        released, because the folio must be ab
        discarded.  This may be done by callin
        function, but in this case the release
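
The two cases (partial versus whole-folio invalidation) can be sketched in userspace. The private data here is a hypothetical per-block dirty map, similar in spirit to the per-block state some filesystems attach to folios; sizes and names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_FOLIO_SIZE 4096
#define TOY_BLOCK_SIZE 1024
#define TOY_BLOCKS (TOY_FOLIO_SIZE / TOY_BLOCK_SIZE)

/* Hypothetical per-folio private data: one dirty bit per block. */
struct toy_private {
	bool block_dirty[TOY_BLOCKS];
};

struct toy_folio {
	struct toy_private *private;	/* NULL when no private data */
};

void toy_invalidate_folio(struct toy_folio *folio, size_t offset, size_t length)
{
	struct toy_private *priv = folio->private;

	if (!priv)
		return;
	if (offset == 0 && length == TOY_FOLIO_SIZE) {
		/* Whole folio is going away: release the private data so
		 * the folio can be discarded. */
		free(priv);
		folio->private = NULL;
		return;
	}
	/* Partial invalidation: forget the state of blocks that lie
	 * entirely inside [offset, offset + length). */
	for (size_t b = 0; b < TOY_BLOCKS; b++) {
		size_t start = b * TOY_BLOCK_SIZE;
		size_t end = start + TOY_BLOCK_SIZE;

		if (start >= offset && end <= offset + length)
			priv->block_dirty[b] = false;
	}
}
```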

``release_folio``
        release_folio is called on folios with
        filesystem that the folio is about to
        should remove any private data from th
        private flag.  If release_folio() fail
        release_folio() is used in two distinc
        The first is when the VM wants to free
        active users.  If ->release_folio succ
        removed from the address_space and be

        The second case is when a request has
        some or all folios in an address_space
        through the fadvise(POSIX_FADV_DONTNEE
        filesystem explicitly requesting it as
        believe the cache may be out of date w
        invalidate_inode_pages2().  If the fil
        and needs to be certain that all folio
        its release_folio will need to ensure
        clear the uptodate flag if it cannot f

``free_folio``
        free_folio is called once the folio is
        page cache in order to allow the clean
        Since it may be called by the memory r
        assume that the original address_space
        it should not block.

``direct_IO``
        called by the generic read/write rout
        that is IO requests which bypass the
        data directly between the storage and
        space.

``migrate_folio``
        This is used to compact the physical
        wants to relocate a folio (maybe from
        signalling imminent failure) it will
        folio to this function.  migrate_foli
        data across and update any references

``launder_folio``
        Called before freeing a folio - it wr
        To prevent redirtying the folio, it i
        whole operation.

``is_partially_uptodate``
        Called by the VM when reading a file
        the underlying blocksize is smaller t
        If the required block is up to date t
        without needing I/O to bring the whol
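
A userspace sketch of the check, assuming (hypothetically) that the filesystem tracks an uptodate bit per block. The requested byte range is clamped to the folio and then mapped to the first and last block it touches; the read can be served only if every one of those blocks is uptodate.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_FOLIO_SIZE 4096
#define TOY_BLOCK_SIZE 512

/* Hypothetical folio with per-block uptodate tracking. */
struct toy_folio {
	bool uptodate;	/* the whole folio */
	bool block_uptodate[TOY_FOLIO_SIZE / TOY_BLOCK_SIZE];
};

bool toy_is_partially_uptodate(const struct toy_folio *f, size_t from,
			       size_t count)
{
	size_t first, last;

	if (f->uptodate)
		return true;
	if (count == 0 || from >= TOY_FOLIO_SIZE)
		return false;
	if (count > TOY_FOLIO_SIZE - from)
		count = TOY_FOLIO_SIZE - from;	/* clamp to the folio */

	first = from / TOY_BLOCK_SIZE;
	last = (from + count - 1) / TOY_BLOCK_SIZE;
	for (size_t b = first; b <= last; b++)
		if (!f->block_uptodate[b])
			return false;
	return true;
}
```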

``is_dirty_writeback``
        Called by the VM when attempting to r
        dirty and writeback information to de
        stall to allow flushers a chance to c
        Ordinarily it can use folio_test_dirt
        some filesystems have more complex st
        prevent reclaim) or do not set those
        problems.  This callback allows a fil
        VM if a folio should be treated as di
        purposes of stalling.

``error_remove_folio``
        normally set to generic_error_remove_
        for this address space.  Used for mem
        Setting this implies you deal with pa
        unless you have them locked or refere

``swap_activate``
        Called to prepare the given file for
        any validation and preparation necess
        can be performed with minimal memory
        add_swap_extent(), or the helper ioma
        return the number of extents added.
        through ->swap_rw(), it should set SW
        be submitted directly to the block de

``swap_deactivate``
        Called during swapoff on files where
        successful.

``swap_rw``
        Called to read or write swap pages wh

The File Object
===============

A file object represents a file opened by a p
as an "open file description" in POSIX parlan


struct file_operations
----------------------

This describes how the VFS can manipulate an
4.18, the following members are defined:

.. code-block:: c

        struct file_operations {
                struct module *owner;
                loff_t (*llseek) (struct file
                ssize_t (*read) (struct file
                ssize_t (*write) (struct file
                ssize_t (*read_iter) (struct
                ssize_t (*write_iter) (struct
                int (*iopoll)(struct kiocb *k
                int (*iterate_shared) (struct
                __poll_t (*poll) (struct file
                long (*unlocked_ioctl) (struc
                long (*compat_ioctl) (struct
                int (*mmap) (struct file *, s
                int (*open) (struct inode *,
                int (*flush) (struct file *,
                int (*release) (struct inode
                int (*fsync) (struct file *,
                int (*fasync) (int, struct fi
                int (*lock) (struct file *, i
                unsigned long (*get_unmapped_
                int (*check_flags)(int);
                int (*flock) (struct file *,
                ssize_t (*splice_write)(struc
                ssize_t (*splice_read)(struct
                int (*setlease)(struct file *
                long (*fallocate)(struct file
                                  loff_t len)
                void (*show_fdinfo)(struct se
        #ifndef CONFIG_MMU
                unsigned (*mmap_capabilities)
        #endif
                ssize_t (*copy_file_range)(st
                loff_t (*remap_file_range)(st
                                           st
                                           lo
                int (*fadvise)(struct file *,
        };

Again, all methods are called without any loc
otherwise noted.

``llseek``
        called when the VFS needs to move the
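
A minimal userspace sketch of the usual ``SEEK_SET``/``SEEK_CUR``/``SEEK_END`` handling. The ``toy_file`` structure and ``TOY_MAX_OFFSET`` limit are hypothetical; in practice a filesystem would normally reuse a generic helper such as ``generic_file_llseek()`` rather than open-code this, and ``SEEK_DATA``/``SEEK_HOLE`` are not modelled here.

```c
#include <errno.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical open file: current position plus the inode's size. */
struct toy_file {
	long long f_pos;
	long long i_size;
};

#define TOY_MAX_OFFSET (1LL << 40)	/* hypothetical per-filesystem limit */

/* Returns the new position, or a negative errno (position unchanged). */
long long toy_llseek(struct toy_file *file, long long offset, int whence)
{
	long long base;

	switch (whence) {
	case SEEK_SET:
		base = 0;
		break;
	case SEEK_CUR:
		base = file->f_pos;
		break;
	case SEEK_END:
		base = file->i_size;
		break;
	default:
		return -EINVAL;
	}
	/* Reject overflow, negative results and positions past the limit. */
	if ((offset > 0 && base > TOY_MAX_OFFSET - offset) ||
	    base + offset < 0 || base + offset > TOY_MAX_OFFSET)
		return -EINVAL;
	file->f_pos = base + offset;
	return file->f_pos;
}
```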
1117                                                  
1118 ``read``                                         
1119         called by read(2) and related system     
1120                                                  
1121 ``read_iter``                                    
1122         possibly asynchronous read with iov_i    
1123                                                  
1124 ``write``                                        
1125         called by write(2) and related system    
1126                                                  
1127 ``write_iter``                                   
1128         possibly asynchronous write with iov_    
1129                                                  
1130 ``iopoll``                                       
1131         called when aio wants to poll for com    
1132                                                  
1133 ``iterate_shared``                               
1134         called when the VFS needs to read the    

``poll``
        called by the VFS when a process want
        activity on this file and (optionally
        is activity.  Called by the select(2)
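
        A userspace sketch of the calling side, using a pipe because its file
        operations implement ``poll``.  A zero timeout asks poll(2) to report
        the current state instead of sleeping.  The helper ``poll_pipe()`` is
        hypothetical.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <poll.h>
        #include <unistd.h>

        /*
         * Ask poll(2) whether the read end of a fresh pipe is readable,
         * optionally after writing one byte into it.  Returns poll(2)'s
         * result (number of ready descriptors) or -1 on error, and stores
         * the reported events in *revents.
         */
        static int poll_pipe(int write_first, short *revents)
        {
                int p[2];
                int n = -1;

                if (pipe(p) < 0)
                        return -1;
                if (!write_first || write(p[1], "x", 1) == 1) {
                        struct pollfd pfd = { .fd = p[0], .events = POLLIN };

                        n = poll(&pfd, 1, 0);   /* 0 ms: check, do not sleep */
                        *revents = pfd.revents;
                }
                close(p[0]);
                close(p[1]);
                return n;
        }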

``unlocked_ioctl``
        called by the ioctl(2) system call.

``compat_ioctl``
        called by the ioctl(2) system call wh
        used on 64 bit kernels.

``mmap``
        called by the mmap(2) system call

``open``
        called by the VFS when an inode shoul
        opens a file, it creates a new "struc
        open method for the newly allocated f
        think that the open method really bel
        inode_operations", and you may be rig
        way it is because it makes filesystem
        The open() method is a good place to
        "private_data" member in the file str
        to a device structure
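
        The ``private_data`` pattern can be modelled in plain C.  The sketch
        below is a hypothetical userspace model, not kernel code:
        ``struct demo_inode`` and ``struct demo_file`` stand in for
        ``struct inode`` and ``struct file``, and ``demo_open()`` only has the
        shape of the ``open`` method.  It looks the device up once and stores
        it in ``private_data``, so later methods need only the file.

.. code-block:: c

        #include <assert.h>
        #include <errno.h>
        #include <stddef.h>

        /* Hypothetical, cut-down stand-ins for the kernel's structures. */
        struct demo_device { const char *name; int open_count; };
        struct demo_inode  { unsigned int minor; };
        struct demo_file   { void *private_data; };

        static struct demo_device demo_devices[] = {
                { "demo0", 0 },
                { "demo1", 0 },
        };

        /* Shaped like ->open(struct inode *, struct file *). */
        static int demo_open(struct demo_inode *inode, struct demo_file *file)
        {
                size_t ndev = sizeof(demo_devices) / sizeof(demo_devices[0]);

                if (inode->minor >= ndev)
                        return -ENODEV;
                demo_devices[inode->minor].open_count++;
                file->private_data = &demo_devices[inode->minor];
                return 0;
        }

        /* Any later method (read, ioctl, ...) needs only the file. */
        static const char *demo_device_name(const struct demo_file *file)
        {
                const struct demo_device *dev = file->private_data;

                return dev->name;
        }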

``flush``
        called by the close(2) system call to

``release``
        called when the last reference to an

``fsync``
        called by the fsync(2) system call.
        entitled "Handling errors during writ

``fasync``
        called by the fcntl(2) system call wh
        (non-blocking) mode is enabled for a

``lock``
        called by the fcntl(2) system call fo
        F_SETLKW commands
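
        A userspace sketch of the fcntl(2) requests that reach this method on
        filesystems that define it (otherwise the VFS handles POSIX locks
        itself).  It uses the Linux-specific open file description locks
        ``F_OFD_SETLK``/``F_OFD_GETLK`` rather than ``F_SETLK``/``F_GETLK``,
        because classic POSIX locks belong to the process and never conflict
        with that process's own locks.  The helper name is hypothetical.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <fcntl.h>
        #include <stdlib.h>
        #include <unistd.h>

        /*
         * Write-lock bytes 0..9 through one open file description, then ask
         * through a second one whether the range could be write-locked.
         * Returns the l_type reported by F_OFD_GETLK (F_WRLCK when a
         * conflicting lock exists), or -1 on error.
         */
        static int probe_lock_conflict(void)
        {
                char path[] = "/tmp/lock-demoXXXXXX";
                int fd1 = mkstemp(path);
                int fd2, result = -1;

                if (fd1 < 0)
                        return -1;
                fd2 = open(path, O_RDWR);
                unlink(path);
                if (fd2 >= 0) {
                        /* OFD locks require l_pid == 0; the initializer zeroes it */
                        struct flock wr = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                                            .l_start = 0, .l_len = 10 };
                        struct flock probe = wr;

                        if (fcntl(fd1, F_OFD_SETLK, &wr) == 0 &&
                            fcntl(fd2, F_OFD_GETLK, &probe) == 0)
                                result = probe.l_type;
                        close(fd2);
                }
                close(fd1);
                return result;
        }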

``get_unmapped_area``
        called by the mmap(2) system call

``check_flags``
        called by the fcntl(2) system call fo

``flock``
        called by the flock(2) system call
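
        An illustrative userspace sketch: flock(2) locks are tied to the open
        file description, so a second, independent open() of the same file
        cannot take a conflicting lock while the first is held.  ``LOCK_NB``
        makes the second attempt fail with ``EWOULDBLOCK`` instead of
        sleeping.  The helper name is hypothetical.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <errno.h>
        #include <fcntl.h>
        #include <stdlib.h>
        #include <sys/file.h>
        #include <unistd.h>

        /*
         * Take an exclusive flock(2) lock through one descriptor, then try a
         * non-blocking exclusive lock through a second, separately opened one.
         * Returns the errno of the second attempt, 0 if it unexpectedly
         * succeeded, or -1 on setup failure.
         */
        static int second_flock_errno(void)
        {
                char path[] = "/tmp/flock-demoXXXXXX";
                int fd1 = mkstemp(path);
                int fd2, result = -1;

                if (fd1 < 0)
                        return -1;
                fd2 = open(path, O_RDWR);
                unlink(path);
                if (fd2 >= 0) {
                        if (flock(fd1, LOCK_EX) == 0)
                                result = flock(fd2, LOCK_EX | LOCK_NB) == 0 ? 0 : errno;
                        close(fd2);
                }
                close(fd1);     /* closing the last descriptor drops the lock */
                return result;
        }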

``splice_write``
        called by the VFS to splice data from
        method is used by the splice(2) syste

``splice_read``
        called by the VFS to splice data from
        method is used by the splice(2) syste
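
        A userspace sketch of the file-to-pipe direction, which is the one that
        reaches the file's ``splice_read`` method (pipe-to-file goes through
        ``splice_write``).  The helper name and temporary-file path are
        illustrative.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <fcntl.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>
        #include <unistd.h>

        /*
         * Put 'msg' in a temporary file, move it into a pipe with splice(2)
         * and read it back out of the pipe into 'out' (at most 'cap' bytes).
         * Returns the number of bytes read from the pipe, or -1 on error.
         */
        static ssize_t splice_file_to_pipe(const char *msg, char *out, size_t cap)
        {
                char path[] = "/tmp/splice-demoXXXXXX";
                int fd = mkstemp(path);
                int p[2];
                size_t len = strlen(msg);
                ssize_t got = -1;

                if (fd < 0)
                        return -1;
                unlink(path);
                if (write(fd, msg, len) == (ssize_t)len && pipe(p) == 0) {
                        loff_t off = 0; /* explicit offset; fd's position unused */
                        ssize_t moved = splice(fd, &off, p[1], NULL,
                                               len < cap ? len : cap, 0);

                        if (moved > 0)
                                got = read(p[0], out, (size_t)moved);
                        close(p[0]);
                        close(p[1]);
                }
                close(fd);
                return got;
        }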

``setlease``
        called by the VFS to set or release a
        implementations should call generic_s
        the lease in the inode after setting

``fallocate``
        called by the VFS to preallocate bloc
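
        A hedged userspace sketch: glibc's posix_fallocate(3) calls
        fallocate(2) with mode 0, which reaches this method, and emulates the
        preallocation in userspace when the filesystem reports that it is not
        supported.  Mode 0 also extends the file size, which is what the
        helper checks.  The helper name is hypothetical.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <fcntl.h>
        #include <stdlib.h>
        #include <sys/stat.h>
        #include <unistd.h>

        /*
         * Preallocate 'len' bytes in an empty temporary file and return the
         * size fstat(2) reports afterwards, or -1 on error.
         */
        static off_t preallocated_size(off_t len)
        {
                char path[] = "/tmp/falloc-demoXXXXXX";
                int fd = mkstemp(path);
                struct stat st;
                off_t size = -1;

                if (fd < 0)
                        return -1;
                unlink(path);
                /* posix_fallocate() returns an error number, not -1/errno */
                if (posix_fallocate(fd, 0, len) == 0 && fstat(fd, &st) == 0)
                        size = st.st_size;
                close(fd);
                return size;
        }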

``copy_file_range``
        called by the copy_file_range(2) syst
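
        A userspace sketch of the calling side.  copy_file_range(2) may copy
        fewer bytes than requested, so the loop continues until everything is
        copied or the source reaches EOF.  This assumes glibc 2.27 or later
        (which declares the wrapper) and that both temporary files live on the
        same filesystem; the helper names are hypothetical.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>
        #include <unistd.h>

        /*
         * Copy 'len' bytes from offset 0 of 'src' to offset 0 of 'dst'.
         * Returns the number of bytes copied, or -1 on error.
         */
        static ssize_t copy_range_all(int src, int dst, size_t len)
        {
                loff_t in = 0, out = 0;
                size_t done = 0;

                while (done < len) {
                        ssize_t n = copy_file_range(src, &in, dst, &out, len - done, 0);

                        if (n < 0)
                                return -1;
                        if (n == 0)
                                break;          /* source hit EOF */
                        done += (size_t)n;
                }
                return (ssize_t)done;
        }

        /* Demo: copy 'msg' between two temporary files and read it back. */
        static ssize_t copy_demo(const char *msg, char *out, size_t cap)
        {
                char sp[] = "/tmp/cfr-srcXXXXXX", dp[] = "/tmp/cfr-dstXXXXXX";
                int src = mkstemp(sp), dst = mkstemp(dp);
                size_t len = strlen(msg);
                ssize_t got = -1;

                if (src >= 0 && dst >= 0 &&
                    write(src, msg, len) == (ssize_t)len &&
                    copy_range_all(src, dst, len) == (ssize_t)len)
                        got = pread(dst, out, cap, 0);
                if (src >= 0) { unlink(sp); close(src); }
                if (dst >= 0) { unlink(dp); close(dst); }
                return got;
        }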

``remap_file_range``
        called by the ioctl(2) system call fo
        and FIDEDUPERANGE commands to remap f
        implementation should remap len bytes
        file into the dest file at pos_out.
        callers passing in len == 0; this mea
        source file".  The return value shoul
        remapped, or the usual negative error
        before any bytes were remapped.  The
        accepts REMAP_FILE_* flags.  If REMAP
        implementation must only remap if the
        identical contents.  If REMAP_FILE_CA
        ok with the implementation shortening
        satisfy alignment or EOF requirements
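
        A userspace sketch of a caller, assuming a Linux system whose UAPI
        ``<linux/fs.h>`` provides ``FICLONERANGE`` and
        ``struct file_clone_range``.  A ``src_length`` of 0 asks for a clone up
        to the end of the source file, matching the ``len == 0`` convention
        described above.  Many filesystems do not implement
        ``remap_file_range``, so the sketch falls back to copy_file_range(2)
        when the ioctl fails.  The helper names are hypothetical.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <linux/fs.h>           /* FICLONERANGE, struct file_clone_range */
        #include <stdlib.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <sys/types.h>
        #include <unistd.h>

        /*
         * Ask the filesystem to share src's blocks with dst; if it refuses,
         * copy the bytes instead.  Returns 1 if cloned, 0 if copied, -1 on
         * error.
         */
        static int clone_or_copy(int src, int dst, size_t len)
        {
                struct file_clone_range r = {
                        .src_fd = src,
                        .src_offset = 0,
                        .src_length = 0,        /* 0: up to the source's EOF */
                        .dest_offset = 0,
                };
                loff_t in = 0, out = 0;
                size_t done = 0;

                if (ioctl(dst, FICLONERANGE, &r) == 0)
                        return 1;
                while (done < len) {
                        ssize_t n = copy_file_range(src, &in, dst, &out, len - done, 0);

                        if (n <= 0)
                                return -1;
                        done += (size_t)n;
                }
                return 0;
        }

        /* Demo: clone or copy 'msg' into a second temporary file, read it back. */
        static ssize_t clone_demo(const char *msg, char *out, size_t cap)
        {
                char sp[] = "/tmp/clone-srcXXXXXX", dp[] = "/tmp/clone-dstXXXXXX";
                int src = mkstemp(sp), dst = mkstemp(dp);
                size_t len = strlen(msg);
                ssize_t got = -1;

                if (src >= 0 && dst >= 0 &&
                    write(src, msg, len) == (ssize_t)len &&
                    clone_or_copy(src, dst, len) >= 0)
                        got = pread(dst, out, cap, 0);
                if (src >= 0) { unlink(sp); close(src); }
                if (dst >= 0) { unlink(dp); close(dst); }
                return got;
        }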

``fadvise``
        possibly called by the fadvise64() sy
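
        A userspace sketch: glibc's posix_fadvise(3) wraps the fadvise64 system
        call.  The hints are advisory; when a filesystem leaves ``fadvise``
        unset, the kernel applies its generic handling instead.  The helper
        name is hypothetical.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <fcntl.h>
        #include <stdlib.h>
        #include <unistd.h>

        /*
         * Give two access-pattern hints for a temporary file: the whole file
         * (len == 0) will be read sequentially, and the first 4096 bytes are
         * no longer needed in the page cache.  Returns 0 if both hints were
         * accepted, otherwise the first error number posix_fadvise() returned
         * (or -1 if the file could not be created).
         */
        static int give_hints(void)
        {
                char path[] = "/tmp/fadvise-demoXXXXXX";
                int fd = mkstemp(path);
                int err;

                if (fd < 0)
                        return -1;
                unlink(path);
                /* posix_fadvise() returns an error number, not -1/errno */
                err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
                if (err == 0)
                        err = posix_fadvise(fd, 0, 4096, POSIX_FADV_DONTNEED);
                close(fd);
                return err;
        }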

Note that the file operations are implemented
filesystem in which the inode resides.  When
(character or block special) most filesystems
support routines in the VFS which will locate
driver information.  These support routines r
operations with those for the device driver,
the new open() method for the file.  This is
in the filesystem eventually ends up calling
method.


Directory Entry Cache (dcache)
==============================


struct dentry_operations
------------------------

This describes how a filesystem can overload
operations.  Dentries and the dcache are the
individual filesystem implementations.  Devic
here.  These methods may be set to NULL, as t
the VFS uses a default.  As of kernel 2.6.22,
defined:

.. code-block:: c

        struct dentry_operations {
                int (*d_revalidate)(struct de
                int (*d_weak_revalidate)(stru
                int (*d_hash)(const struct de
                int (*d_compare)(const struct
                                 unsigned int
                int (*d_delete)(const struct
                int (*d_init)(struct dentry *
                void (*d_release)(struct dent
                void (*d_iput)(struct dentry
                char *(*d_dname)(struct dentr
                struct vfsmount *(*d_automoun
                int (*d_manage)(const struct
                struct dentry *(*d_real)(stru
        };

``d_revalidate``
        called when the VFS needs to revalida
        called whenever a name look-up finds
        Most local filesystems leave this as
        dentries in the dcache are valid.  Ne
        different since things can change on
        client necessarily being aware of it.

        This function should return a positiv
        still valid, and zero or a negative e

        d_revalidate may be called in rcu-wal
        LOOKUP_RCU).  If in rcu-walk mode, th
        revalidate the dentry without blockin
        d_parent and d_inode should not be us
        they can change and, in d_inode case,
        us).

        If a situation is encountered that rc
        return -ECHILD and it will be called again i

``d_weak_revalidate``
        called when the VFS needs to revalida
        is called when a path-walk ends at de
        by doing a lookup in the parent direc
        "." and "..", as well as procfs-style
        traversal.

        In this case, we are less concerned w
        still fully correct, but rather that
        As with d_revalidate, most local file
        NULL since their dcache entries are a

        This function has the same return cod
        d_revalidate.

        d_weak_revalidate is only called afte

``d_hash``
        called when the VFS adds a dentry to
        dentry passed to d_hash is the parent
        to be hashed into.

        Same locking and synchronisation rule
        what is safe to dereference etc.

``d_compare``
        called to compare a dentry name with
        dentry is the parent of the dentry to
        the child dentry.  len and name strin
        dentry to be compared.  qstr is the n

        Must be constant and idempotent, and
        possible, and should not or store int
        dereference pointers outside the dent
        (e.g. d_parent, d_inode, d_name shoul

        However, our vfsmount is pinned, and
        and inodes won't disappear, neither w
        module.  ->d_sb may be used.

        It is a tricky calling convention bec
        under "rcu-walk", i.e. without any loc
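
        A hypothetical userspace model of a case-insensitive pair of ``d_hash``
        and ``d_compare`` helpers.  ``struct demo_qstr`` stands in for
        ``struct qstr``; unlike the real ``d_hash``, which stores its result in
        the qstr, ``demo_ci_hash()`` simply returns it.  The point illustrated
        is that the two must agree: names that compare equal must also hash
        equal, otherwise a lookup would search the wrong hash chain.
        ``demo_ci_compare()`` follows the ``d_compare`` convention of returning
        0 on a match and reads nothing but its arguments.

.. code-block:: c

        #include <assert.h>
        #include <ctype.h>
        #include <string.h>

        /* Hypothetical stand-in for the kernel's struct qstr. */
        struct demo_qstr {
                unsigned int len;
                const char *name;
        };

        /* Fold case before mixing each byte, so equal names hash equally. */
        static unsigned long demo_ci_hash(const struct demo_qstr *q)
        {
                unsigned long h = 0;

                for (unsigned int i = 0; i < q->len; i++)
                        h = h * 31 + (unsigned long)tolower((unsigned char)q->name[i]);
                return h;
        }

        /*
         * (len, str) is the cached name, 'name' the one being looked up.
         * Returns 0 on a match, nonzero otherwise; takes no locks.
         */
        static int demo_ci_compare(unsigned int len, const char *str,
                                   const struct demo_qstr *name)
        {
                if (len != name->len)
                        return 1;
                for (unsigned int i = 0; i < len; i++)
                        if (tolower((unsigned char)str[i]) !=
                            tolower((unsigned char)name->name[i]))
                                return 1;
                return 0;
        }

        /* Convenience wrappers for NUL-terminated strings. */
        static int demo_names_match(const char *cached, const char *lookup)
        {
                struct demo_qstr q = { (unsigned int)strlen(lookup), lookup };

                return demo_ci_compare((unsigned int)strlen(cached), cached, &q) == 0;
        }

        static int demo_same_hash(const char *a, const char *b)
        {
                struct demo_qstr qa = { (unsigned int)strlen(a), a };
                struct demo_qstr qb = { (unsigned int)strlen(b), b };

                return demo_ci_hash(&qa) == demo_ci_hash(&qb);
        }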

``d_delete``
        called when the last reference to a d
        dcache is deciding whether or not to
        delete immediately, or 0 to cache the
        which means to always cache a reachab
        be constant and idempotent.

``d_init``
        called when a dentry is allocated

``d_release``
        called when a dentry is really deallo

``d_iput``
        called when a dentry loses its inode
        deallocated).  The default when this
        calls iput().  If you define this met
        yourself

``d_dname``
        called when the pathname of a dentry
        Useful for some pseudo filesystems (s
        delay pathname generation.  (Instead
        created, it's done only when the path
        filesystems probably don't want to use
        are present in global dcache hash, so
        invariant.  As no lock is held, d_dna
        modify the dentry itself, unless appr
        CAUTION: d_path() logic is quite tri
        return for example "Hello" is to put
        buffer, and returns a pointer to the
        dynamic_dname() helper function is pr
        this.

        Example:

.. code-block:: c

        static char *pipefs_dname(struct dent
        {
                return dynamic_dname(dentry,
                                dentry->d_ino
        }

``d_automount``
        called when an automount dentry is to
        This should create a new VFS mount re
        to the caller.  The caller is supplie
        giving the automount directory to des
        and the parent VFS mount record to pr
        parameters.  NULL should be returned
        make the automount first.  If the vfs
        an error code should be returned.  If
        the directory will be treated as an o
        returned to pathwalk to continue walk

        If a vfsmount is returned, the caller
        on the mountpoint and will remove the
        expiration list in the case of failur
        returned with 2 refs on it to prevent
        caller will clean up the additional r

        This function is only used if DCACHE_
        the dentry.  This is set by __d_insta
        set on the inode being added.

``d_manage``
        called to allow the filesystem to man
        dentry (optional).  This allows autof
        clients waiting to explore behind a '
        the daemon go past and construct the
        returned to let the calling process c
        returned to tell pathwalk to use this
        directory and to ignore anything moun
        the automount flag.  Any other error
        completely.

        If the 'rcu_walk' parameter is true,
        pathwalk in RCU-walk mode.  Sleeping
        mode, and the caller can be asked to
        returning -ECHILD.  -EISDIR may also
        pathwalk to ignore d_automount or any

        This function is only used if DCACHE_
        the dentry being transited from.

``d_real``
        overlay/union type filesystems implem
        of the underlying dentries of a regul

        The 'type' argument takes the values
        for returning the real underlying den
        hosting the file's data or metadata r

        For non-regular files, the 'dentry' a

Each dentry has a pointer to its parent dentr
of child dentries.  Child dentries are basica
directory.


Directory Entry Cache API
--------------------------

There are a number of functions defined which
manipulate dentries:

``dget``
        open a new handle for an existing den
        the usage count)

``dput``
        close a handle for a dentry (decremen
        the usage count drops to 0, and the d
        parent's hash, the "d_delete" method
        it should be cached.  If it should no
        dentry is not hashed, it is deleted.
        are put into an LRU list to be reclai
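
        The reference-counting rules above can be summarised in a small,
        hypothetical userspace model.  ``struct demo_dentry`` is not the
        kernel's ``struct dentry``: it keeps only a usage count, a "hashed"
        flag and an optional ``d_delete`` hook.  On the final put the model
        frees the dentry if it is unhashed or if ``d_delete`` returns 1, and
        otherwise parks it on the LRU.

.. code-block:: c

        #include <assert.h>
        #include <stdbool.h>
        #include <stddef.h>

        /* Hypothetical, heavily simplified model of a dentry's lifetime. */
        enum demo_fate { DEMO_IN_USE, DEMO_ON_LRU, DEMO_FREED };

        struct demo_dentry {
                int count;                                      /* usage count */
                bool hashed;                                    /* in parent's hash? */
                int (*d_delete)(const struct demo_dentry *);    /* optional hook */
                enum demo_fate fate;
        };

        static struct demo_dentry *demo_dget(struct demo_dentry *d)
        {
                d->count++;
                return d;
        }

        /* On the last put: free or cache, following the rules above. */
        static void demo_dput(struct demo_dentry *d)
        {
                if (--d->count > 0)
                        return;
                if (!d->hashed || (d->d_delete && d->d_delete(d)))
                        d->fate = DEMO_FREED;
                else
                        d->fate = DEMO_ON_LRU;
        }

        static void demo_d_drop(struct demo_dentry *d)
        {
                d->hashed = false;
        }

        static int demo_always_delete(const struct demo_dentry *d)
        {
                (void)d;
                return 1;
        }

        /* One get/put cycle on a freshly looked-up (count 1, hashed) dentry. */
        static enum demo_fate demo_cycle(bool use_d_delete, bool drop_first)
        {
                struct demo_dentry d = {
                        .count = 1, .hashed = true, .fate = DEMO_IN_USE,
                        .d_delete = use_d_delete ? demo_always_delete : NULL,
                };

                demo_dput(demo_dget(&d));       /* extra reference taken and put */
                if (drop_first)
                        demo_d_drop(&d);
                demo_dput(&d);                  /* lookup's reference put */
                return d.fate;
        }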

``d_drop``
        this unhashes a dentry from its paren
        call to dput() will deallocate the de
        drops to 0

``d_delete``
        delete a dentry.  If there are no oth
        dentry then the dentry is turned into
        d_iput() method is called).  If there
        d_drop() is called instead

``d_add``
        add a dentry to its parent's hash list
        d_instantiate()

``d_instantiate``
        add a dentry to the alias hash list f
        the "d_inode" member.  The "i_count"
        structure should be set/incremented.
        NULL, the dentry is called a "negativ
        is commonly called when an inode is c
        negative dentry

``d_lookup``
        look up a dentry given its parent and
        looks up the child of that given name
        table.  If it is found, the reference
        the dentry is returned.  The caller m
        dentry when it finishes using it.


Mount Options
=============


Parsing options
---------------

On mount and remount the filesystem is passed
comma-separated list of mount options.  The o
these forms:

  option
  option=value

The <linux/parser.h> header defines an API th
options.  There are plenty of examples on how
filesystems.
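
A minimal userspace sketch of splitting such a list, using strsep(3) rather
than the ``<linux/parser.h>`` helpers.  The ``debug`` and ``uid=<number>``
options, ``struct demo_opts`` and ``demo_parse_options()`` are hypothetical.

.. code-block:: c

        #define _GNU_SOURCE
        #include <assert.h>
        #include <stdbool.h>
        #include <stdlib.h>
        #include <string.h>

        /* Options of an imaginary filesystem. */
        struct demo_opts {
                bool debug;
                unsigned long uid;
        };

        /*
         * Parse a comma-separated list of "option" and "option=value"
         * entries; 'data' is modified in place.  Returns 0 on success, -1 on
         * an unknown option or a malformed value.
         */
        static int demo_parse_options(char *data, struct demo_opts *o)
        {
                char *p;

                while ((p = strsep(&data, ",")) != NULL) {
                        char *val = strchr(p, '=');

                        if (*p == '\0')
                                continue;               /* tolerate empty entries */
                        if (val)
                                *val++ = '\0';          /* split "key=value" */

                        if (strcmp(p, "debug") == 0 && !val) {
                                o->debug = true;
                        } else if (strcmp(p, "uid") == 0 && val && *val) {
                                char *end;

                                o->uid = strtoul(val, &end, 10);
                                if (*end != '\0')
                                        return -1;
                        } else {
                                return -1;
                        }
                }
                return 0;
        }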


Showing options
---------------

If a filesystem accepts mount options, it mus
show all the currently active options.  The r

  - options MUST be shown which are not defau
    from the default

  - options MAY be shown which are enabled by
    default value

Options used only internally between a mount
as file descriptors), or which only have an e
(such as ones controlling the creation of a j
above rules.

The underlying reason for the above rules is
can be accurately replicated (e.g. umounting
on the information found in /proc/mounts.
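
In the kernel these options are emitted by the ``show_options`` method of
``struct super_operations``, which writes into a ``struct seq_file``.  The
sketch below is a hypothetical userspace stand-in that writes into a plain
buffer instead.  It follows the MUST rule by printing every option that
differs from its default and uses the MAY allowance to omit the rest; the
options and helper names are illustrative.

.. code-block:: c

        #include <assert.h>
        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        /* Options of an imaginary filesystem and their default values. */
        struct demo_opts {
                bool debug;
                unsigned long uid;
        };

        static const struct demo_opts demo_defaults = { .debug = false, .uid = 0 };

        /* Emit ",option" / ",option=value" for each non-default option. */
        static void demo_show_options(char *buf, size_t size,
                                      const struct demo_opts *o)
        {
                size_t used = 0;

                if (size == 0)
                        return;
                buf[0] = '\0';
                if (o->debug != demo_defaults.debug && used < size)
                        used += (size_t)snprintf(buf + used, size - used, ",debug");
                if (o->uid != demo_defaults.uid && used < size)
                        used += (size_t)snprintf(buf + used, size - used,
                                                 ",uid=%lu", o->uid);
        }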


Resources
=========

(Note some of these resources are not up-to-d
version.)

Creating Linux virtual filesystems. 2002
    <https://lwn.net/Articles/13325/>

The Linux Virtual File-system Layer by Neil B
    <http://www.cse.unsw.edu.au/~neilb/oss/li

A tour of the Linux VFS by Michael K. Johnson
    <https://www.tldp.org/LDP/khg/HyperNews/g

A small trail through the Linux kernel by And
    <https://www.win.tue.nl/~aeb/linux/vfs/tr
                                                      
