Linux/Documentation/filesystems/dax.rst

Differences between /Documentation/filesystems/dax.rst (Version linux-6.11.5) and /Documentation/filesystems/dax.rst (Version linux-4.4.302)


=======================
Direct Access for files
=======================

Motivation
----------

The page cache is usually used to buffer reads
It is also used to provide the pages which are
by a call to mmap.

For block devices that are memory-like, the pa
unnecessary copies of the original storage.  T
extra copy by performing reads and writes dire
For file mappings, the storage device is mappe


Usage
-----

If you have a block device which supports `DAX
on it as usual.  The `DAX` code currently only
size equal to your kernel's `PAGE_SIZE`, so yo
size when creating the filesystem.

Currently 5 filesystems support `DAX`: ext2, e
Enabling `DAX` on them is different.

Enabling DAX on ext2 and erofs
------------------------------

When mounting the filesystem, use the ``-o dax
add 'dax' to the options in ``/etc/fstab``.  T
within the filesystem.  It is equivalent to th


Enabling DAX on xfs and ext4
----------------------------

Summary
-------

 1. There exists an in-kernel file access mode
    the statx flag `STATX_ATTR_DAX`.  See the
    about this access mode.

 2. There exists a persistent flag `FS_XFLAG_D
    files and directories. This advisory flag
    time, but doing so does not immediately af

 3. If the persistent `FS_XFLAG_DAX` flag is s
    be inherited by all regular files and subd
    created in this directory. Files and subdi
    this flag is set or cleared on the parent
    this modification of the parent directory.

 4. There exist dax mount options which can ov
    setting of the `S_DAX` flag.  Given underl
    following hold:

    ``-o dax=inode``  means "follow `FS_XFLAG_

    ``-o dax=never``  means "never set `S_DAX`

    ``-o dax=always`` means "always set `S_DAX

    ``-o dax``      is a legacy option which i

    .. warning::

      The option ``-o dax`` may be removed in
      the preferred method for specifying this

    .. note::

      Modifications to and the inheritance beh
      the same even when the filesystem is mou
      in-core inode state (`S_DAX`) will be ov
      remounted with dax=inode and the inode i

 5. The `S_DAX` policy can be changed via:

    a) Setting the parent directory `FS_XFLAG_
       created

    b) Setting the appropriate dax="foo" mount

    c) Changing the `FS_XFLAG_DAX` flag on exi
       directories.  This has runtime constrai
       described in 6) below.

 6. When changing the `S_DAX` policy via toggl
    flag, the change to existing regular files
    files are closed by all processes.
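
The mount-option rules in item 4 can be read as a small decision table.
The following is a userspace sketch of that table only, not kernel code:
the names ``dax_mount_mode`` and ``effective_s_dax`` are hypothetical, the
legacy ``-o dax`` spelling is left out, and the sketch assumes that no mode
can enable `S_DAX` unless the underlying storage supports DAX.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; the kernel does not export them. */
enum dax_mount_mode {
	DAX_MOUNT_INODE,	/* -o dax=inode  */
	DAX_MOUNT_NEVER,	/* -o dax=never  */
	DAX_MOUNT_ALWAYS,	/* -o dax=always */
};

/*
 * Model of the S_DAX decision for a regular file.
 * fs_xflag_dax is the persistent FS_XFLAG_DAX setting of the file.
 */
static bool effective_s_dax(enum dax_mount_mode mode, bool fs_xflag_dax,
			    bool storage_supports_dax)
{
	/* Assumption: without DAX-capable storage nothing is enabled. */
	if (!storage_supports_dax)
		return false;

	switch (mode) {
	case DAX_MOUNT_NEVER:
		return false;		/* never set S_DAX */
	case DAX_MOUNT_ALWAYS:
		return true;		/* always set S_DAX */
	case DAX_MOUNT_INODE:
	default:
		return fs_xflag_dax;	/* follow FS_XFLAG_DAX */
	}
}
```

For example, ``effective_s_dax(DAX_MOUNT_NEVER, true, true)`` is false: the
persistent flag stays on the file, but the mount option overrides it.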


Details
-------

There are 2 per-file dax flags.  One is a pers
and the other is a volatile flag indicating th
(`S_DAX`).

`FS_XFLAG_DAX` is preserved within the filesys
setting can be set, cleared and/or queried usi
(see ioctl_xfs_fsgetxattr(2)) or a utility su
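
As a rough illustration of querying and changing the persistent flag from a
program, the sketch below uses the ``FS_IOC_FSGETXATTR`` and
``FS_IOC_FSSETXATTR`` ioctls from ``<linux/fs.h>``, which is the interface
the referenced man page describes.  The helper names ``dax_xflag_get`` and
``dax_xflag_set`` are hypothetical, and error handling is kept minimal.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_DAX */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if FS_XFLAG_DAX is set, 0 if clear, -1 on error. */
static int dax_xflag_get(const char *path)
{
	struct fsxattr fsx;
	int fd, ret, saved;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		ret = -1;
	else
		ret = (fsx.fsx_xflags & FS_XFLAG_DAX) ? 1 : 0;

	saved = errno;		/* keep the ioctl's errno across close() */
	close(fd);
	errno = saved;
	return ret;
}

/* Hypothetical helper: set or clear FS_XFLAG_DAX; 0 on success, -1 on error. */
static int dax_xflag_set(const char *path, int enable)
{
	struct fsxattr fsx;
	int fd, ret, saved;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Read-modify-write so the other xflags are preserved. */
	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	if (ret == 0) {
		if (enable)
			fsx.fsx_xflags |= FS_XFLAG_DAX;
		else
			fsx.fsx_xflags &= ~FS_XFLAG_DAX;
		ret = ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
	}

	saved = errno;
	close(fd);
	errno = saved;
	return ret < 0 ? -1 : 0;
}
```

On a filesystem that does not implement these ioctls both helpers simply
return -1.  Changing the flag this way only updates the persistent setting;
it does not by itself change the `S_DAX` state of a file that is in use.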

New files and directories automatically inheri
their parent directory **when created**.  Ther
directory creation time can be used to set a d
sub-tree.

To clarify inheritance, here are 3 examples:

Example A:

.. code-block:: shell

  mkdir -p a/b/c
  xfs_io -c 'chattr +x' a
  mkdir a/b/c/d
  mkdir a/e

  # ------[outcome]------
  #
  # dax: a,e
  # no dax: b,c,d

Example B:

.. code-block:: shell

  mkdir a
  xfs_io -c 'chattr +x' a
  mkdir -p a/b/c/d

  # ------[outcome]------
  #
  # dax: a,b,c,d
  # no dax:

Example C:

.. code-block:: shell

  mkdir -p a/b/c
  xfs_io -c 'chattr +x' a/b/c
  mkdir a/b/c/d

  # ------[outcome]------
  #
  # dax: c,d
  # no dax: a,b

The current enabled state (`S_DAX`) is set whe
memory by the kernel.  It is set based on the
value of `FS_XFLAG_DAX` and the filesystem's d

statx can be used to query `S_DAX`.

.. note::

  Only regular files will ever have `S_DA
  will never indicate that `S_DAX` is set on d

Setting the `FS_XFLAG_DAX` flag (specifically
if the underlying media does not support dax a
overridden with a mount option.


Enabling DAX on virtiofs
------------------------

The semantics of DAX on virtiofs are basically e
except that when ``-o dax=inode`` is specified,
whether DAX shall be enabled or not from virti
rather than the persistent `FS_XFLAG_DAX` flag
enabled or not is completely determined by vir
server itself may deploy various algorithms mak
on the persistent `FS_XFLAG_DAX` flag on the h

It is still supported to set or clear persiste
guest, but it is not guaranteed that DAX will
corresponding file then. Users inside the guest st
check the statx flag `STATX_ATTR_DAX` to see i


Implementation Tips for Block Driver Writers
--------------------------------------------

To support `DAX` in your block driver, impleme
block device operation.  It is used to transla
(expressed in units of 512-byte sectors) to a
that identifies the physical page for the memo
kernel virtual address that can be used to acc

The direct_access method takes a 'size' parame
number of bytes being requested.  The function
of bytes that can be contiguously accessed at
return a negative errno if an error occurs.
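
The return-value contract described above can be modelled in userspace.
The sketch below is not the kernel prototype of direct_access: the type
``toy_dax_dev``, the function ``toy_direct_access``, the 4 KiB page size
and the choice of ``-EINVAL`` and ``-ERANGE`` are all assumptions made for
illustration.  It only shows the sector-to-address translation and the
"bytes contiguously accessible" result for a device backed by one
contiguous region.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SECTOR_SHIFT 9	/* 512-byte sectors, as in the text */
#define TOY_PAGE_SHIFT   12	/* assumption: 4 KiB pages */

/* Hypothetical memory-like device made of one contiguous region. */
struct toy_dax_dev {
	unsigned char *base;	/* stands in for the kernel virtual address */
	uint64_t phys_base;	/* pretend physical address of the media */
	size_t len;		/* device size in bytes */
};

/*
 * Translate a sector number into an address and a page frame number.
 * Returns how many of the requested bytes are contiguously accessible
 * starting at that sector, or a negative errno.
 */
static long toy_direct_access(const struct toy_dax_dev *dev, uint64_t sector,
			      void **kaddr, uint64_t *pfn, long size)
{
	uint64_t offset = sector << TOY_SECTOR_SHIFT;
	size_t avail;

	if (size <= 0)
		return -EINVAL;
	if (offset >= dev->len)
		return -ERANGE;

	*kaddr = dev->base + offset;
	*pfn = (dev->phys_base + offset) >> TOY_PAGE_SHIFT;

	/* Never report more than what remains on the device. */
	avail = dev->len - offset;
	return (size_t)size < avail ? size : (long)avail;
}
```

A caller asking for 16 KiB at the last page of an 8 KiB device gets 4096
back, which tells it to issue another request for the rest.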

In order to support this method, the storage m
the CPU at all times.  If your device uses pag
a large amount of memory through a smaller win
implement direct_access.  Equally, if your dev
stall the CPU for an extended period, you shou
implement direct_access.

These block devices may be used for inspiration:

- brd: RAM backed block device driver
- dcssblk: s390 dcss block device driver
- pmem: NVDIMM persistent memory driver


Implementation Tips for Filesystem Writers
------------------------------------------

Filesystem support consists of:

* Adding support to mark inodes as being `DAX`
  i_flags
* Implementing ->read_iter and ->write_iter op
  :c:func:`dax_iomap_rw()` when inode has `S_D
* Implementing an mmap file operation for `DAX
  `VM_MIXEDMAP` and `VM_HUGEPAGE` flags on the
  include handlers for fault, pmd_fault, page_
  handlers should probably call :c:func:`dax_i
  appropriate fault size and iomap operations.
* Calling :c:func:`iomap_zero_range()` passing
  instead of :c:func:`block_truncate_page()` f
* Ensuring that there is sufficient locking be
  truncates and page faults

The iomap handlers for allocating blocks must
are zeroed out and converted to written extent
exposure of uninitialized data through mmap.

These filesystems may be used for inspiration:

.. seealso::

  ext2: see Documentation/filesystems/ext2.rst

.. seealso::

  xfs:  see Documentation/admin-guide/xfs.rst

.. seealso::

  ext4: see Documentation/filesystems/ext4/


Handling Media Errors
---------------------

The libnvdimm subsystem stores a record of kno
each pmem block device (in gendisk->badblocks)
or one with a latent error not yet discovered,
to receive a `SIGBUS`. Libnvdimm also allows c
writing the affected sectors (through the pmem
NVDIMM supports the clear_poison DSM defined b

Since `DAX` IO normally doesn't go through the
sysadmins have an option to restore the lost d
redundancy in the following ways:

1. Delete the affected file, and restore from
   This will free the filesystem blocks that w
   and the next time they're allocated, they w
   happens through the driver, and will clear

2. Truncate or hole-punch the part of the file
   an entire aligned sector has to be hole-pun
   entire filesystem block).
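
The hole-punch step in option 2 could be issued from a program roughly as
follows.  This is a sketch of the userspace call only, using
``fallocate(2)`` with ``FALLOC_FL_PUNCH_HOLE``; the helper name
``punch_range`` is hypothetical, and whether the punched range actually
clears a media error depends on the filesystem and driver behaviour
described in this section.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for fallocate() */
#endif
#include <fcntl.h>		/* fallocate() */
#include <linux/falloc.h>	/* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/types.h>		/* off_t */

/*
 * Hypothetical helper: deallocate [offset, offset + len) of an open file
 * without changing its size.  The caller is expected to pass a range that
 * is aligned as the text above requires.  Returns 0 or -1 with errno set.
 */
static int punch_range(int fd, off_t offset, off_t len)
{
	/* PUNCH_HOLE must be combined with KEEP_SIZE. */
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}
```

After a successful call, reads of the punched range return zeroes, so the
affected data still has to be restored from a backup or other redundancy.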

These are the two basic paths that allow `DAX`
in the presence of media errors. More robust e
built on top of this in the future, for exampl
provided at the block layer through DM, or add
level. These would have to rely on the above t
can happen either by sending an IO through the
the driver).


Shortcomings
------------

Even if the kernel or its modules are stored o
`DAX` on a block device that supports `DAX`, t

The DAX code does not work correctly on archit
mapped caches such as ARM, MIPS and SPARC.

Calling :c:func:`get_user_pages()` on a range
mmapped from a `DAX` file will fail when there
those pages.  This problem has been addressed
by adding optional struct page support for pag
the driver (see `CONFIG_NVDIMM_PFN` in ``drive
how to do this). In the non struct page cases
those memory ranges from a non-`DAX` file will

.. note::

  `O_DIRECT` reads/writes of a `DAX` file do
  is being accessed that is key here).  Other
  the non struct page case include RDMA, :c:fu
  :c:func:`splice()`.
                                                      
