
TOMOYO Linux Cross Reference
Linux/Documentation/filesystems/iomap/operations.rst


Differences between /Documentation/filesystems/iomap/operations.rst (Architecture sparc64) and /Documentation/filesystems/iomap/operations.rst (Architecture i386)


.. SPDX-License-Identifier: GPL-2.0
.. _iomap_operations:

..
        Dumb style notes to maintain the author's sanity:
        Please try to start sentences on separate lines so that
        sentence changes don't bleed colors in diff.
        Heading decorations are documented in sphinx.rst.

=========================
Supported File Operations
=========================

.. contents:: Table of Contents
   :local:

Below is a discussion of the high level file operations that iomap
implements.

Buffered I/O
============

Buffered I/O is the default file I/O path in Linux.
File contents are cached in memory ("pagecache") to satisfy reads and
writes.
Dirty cache will be written back to disk at some point that can be
forced via ``fsync`` and variants.

iomap implements nearly all the folio and pagecache management that
filesystems have to implement themselves under the legacy I/O model.
This means that the filesystem need not know the details of allocating,
mapping, managing uptodate and dirty state, or writeback of pagecache
folios.
Under the legacy I/O model, this was managed very inefficiently with
linked lists of buffer heads instead of the per-folio bitmaps that iomap
uses.
Unless the filesystem explicitly opts in to buffer heads, they will not
be used, which makes buffered I/O much more efficient, and the pagecache
maintainer much happier.

``struct address_space_operations``
-----------------------------------

The following iomap functions can be referenced directly from the
address space operations structure:

 * ``iomap_dirty_folio``
 * ``iomap_release_folio``
 * ``iomap_invalidate_folio``
 * ``iomap_is_partially_uptodate``

The following address space operations can be wrapped easily:

 * ``read_folio``
 * ``readahead``
 * ``writepages``
 * ``bmap``
 * ``swap_activate``

``struct iomap_folio_ops``
--------------------------

The ``->iomap_begin`` function for pagecache operations may set the
``struct iomap::folio_ops`` field to an ops structure to override
default behaviors of iomap:

.. code-block:: c

 struct iomap_folio_ops {
     struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
                                unsigned len);
     void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
                       struct folio *folio);
     bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
 };

iomap calls these functions:

  - ``get_folio``: Called to allocate and return an active reference to
    a locked folio prior to starting a write.
    If this function is not provided, iomap will call
    ``iomap_get_folio``.
    This could be used to `set up per-folio filesystem state
    <https://lore.kernel.org/all/20190429220934.10415-5-agruenba@redhat.com/>`_
    for a write.

  - ``put_folio``: Called to unlock and put a folio after a pagecache
    operation completes.
    If this function is not provided, iomap will ``folio_unlock`` and
    ``folio_put`` on its own.
    This could be used to `commit per-folio filesystem state
    <https://lore.kernel.org/all/20180619164137.13720-6-hch@lst.de/>`_
    that was set up by ``->get_folio``.

  - ``iomap_valid``: The filesystem may not hold locks between
    ``->iomap_begin`` and ``->iomap_end`` because pagecache operations
    can take folio locks, fault on userspace pages, initiate writeback
    for memory reclamation, or engage in other time-consuming actions.
    If a file's space mapping data are mutable, it is possible that the
    mapping for a particular pagecache folio can `change in the time it
    takes
    <https://lore.kernel.org/all/20221123055812.747923-8-david@fromorbit.com/>`_
    to allocate, install, and lock that folio.

    For the pagecache, races can happen if writeback doesn't take
    ``i_rwsem`` or ``invalidate_lock`` and updates mapping information.
    Races can also happen if the filesystem allows concurrent writes.
    For such files, the mapping *must* be revalidated after the folio
    lock has been taken so that iomap can manage the folio correctly.

    fsdax does not need this revalidation because there's no writeback
    and no support for unwritten extents.

    Filesystems subject to this kind of race must provide a
    ``->iomap_valid`` function to decide if the mapping is still valid.
    If the mapping is not valid, the mapping will be sampled again.

    To support making the validity decision, the filesystem's
    ``->iomap_begin`` function may set ``struct iomap::validity_cookie``
    at the same time that it populates the other iomap fields.
    A simple validation cookie implementation is a sequence counter.
    If the filesystem bumps the sequence counter every time it modifies
    the inode's extent map, it can be placed in the ``struct
    iomap::validity_cookie`` during ``->iomap_begin``.
    If the value in the cookie is found to be different to the value
    the filesystem holds when the mapping is passed back to
    ``->iomap_valid``, then the iomap should be considered stale and the
    validation failed.

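The sequence counter scheme described above can be modelled in a few
lines of userspace C.
This is only a sketch of the idea, not kernel code: ``struct toy_inode``,
``struct toy_iomap``, and the ``toy_*`` functions are hypothetical
stand-ins for the filesystem's incore inode, ``struct iomap``,
``->iomap_begin``, and ``->iomap_valid``.

.. code-block:: c

 #include <stdbool.h>
 #include <stdint.h>

 /* Hypothetical stand-in for a filesystem's incore inode. */
 struct toy_inode {
     uint64_t extent_seq;    /* bumped on every extent map change */
 };

 /* Hypothetical stand-in for the one struct iomap field used here. */
 struct toy_iomap {
     uint64_t validity_cookie;
 };

 /* Models ->iomap_begin: sample the counter while filling in the mapping. */
 static void toy_iomap_begin(const struct toy_inode *ip,
                             struct toy_iomap *iomap)
 {
     iomap->validity_cookie = ip->extent_seq;
 }

 /* Models any operation that changes the extent map, such as writeback. */
 static void toy_modify_extent_map(struct toy_inode *ip)
 {
     ip->extent_seq++;
 }

 /* Models ->iomap_valid: the mapping is stale if the counter has moved. */
 static bool toy_iomap_valid(const struct toy_inode *ip,
                             const struct toy_iomap *iomap)
 {
     return iomap->validity_cookie == ip->extent_seq;
 }

In this model a mapping sampled by ``toy_iomap_begin`` stays valid until
the next call to ``toy_modify_extent_map``, after which
``toy_iomap_valid`` returns false and the caller would sample the mapping
again.
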
These ``struct kiocb`` flags are significant for buffered I/O with iomap:

 * ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``.

Internal per-Folio State
------------------------

If the fsblock size matches the size of a pagecache folio, it is assumed
that all disk I/O operations will operate on the entire folio.
The uptodate (memory contents are at least as new as what's on disk) and
dirty (memory contents are newer than what's on disk) status of the
folio are all that's needed for this case.

If the fsblock size is less than the size of a pagecache folio, iomap
tracks the per-fsblock uptodate and dirty state itself.
This enables iomap to handle both "bs < ps" `filesystems
<https://lore.kernel.org/all/20230725122932.144426-1-ritesh.list@gmail.com/>`_
and large folios in the pagecache.

iomap internally tracks two state bits per fsblock:

 * ``uptodate``: iomap will try to keep folios fully up to date.
   If there are read(ahead) errors, those fsblocks will not be marked
   uptodate.
   The folio itself will be marked uptodate when all fsblocks within the
   folio are uptodate.

 * ``dirty``: iomap will set the per-block dirty state when programs
   write to the file.
   The folio itself will be marked dirty when any fsblock within the
   folio is dirty.

iomap also tracks the amount of read and write disk IOs that are in
flight.
This structure is much lighter weight than ``struct buffer_head``
because there is only one per folio, and the per-fsblock overhead is two
bits vs. 104 bytes.

Filesystems wishing to turn on large folios in the pagecache should call
``mapping_set_large_folios`` when initializing the incore inode.

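The relationship between the two per-fsblock bits and the folio-level
uptodate and dirty state can be illustrated with a small userspace model.
This is a sketch under simplifying assumptions, not iomap's actual data
structure: ``struct toy_folio_state`` and the ``toy_*`` helpers are
hypothetical, and the model caps a folio at 64 fsblocks so that each
bitmap fits in one word.

.. code-block:: c

 #include <stdbool.h>
 #include <stdint.h>

 #define TOY_MAX_BLOCKS 64   /* hypothetical cap: one word per bitmap */

 /* Hypothetical per-folio state: one uptodate and one dirty bit per fsblock. */
 struct toy_folio_state {
     unsigned int nr_blocks;     /* fsblocks per folio, 1..TOY_MAX_BLOCKS */
     uint64_t uptodate;
     uint64_t dirty;
 };

 /* Mask with one bit set for every fsblock in the folio. */
 static uint64_t toy_all_blocks(const struct toy_folio_state *s)
 {
     if (s->nr_blocks == TOY_MAX_BLOCKS)
         return UINT64_MAX;
     return (UINT64_C(1) << s->nr_blocks) - 1;
 }

 static void toy_set_uptodate(struct toy_folio_state *s, unsigned int block)
 {
     s->uptodate |= UINT64_C(1) << block;
 }

 static void toy_set_dirty(struct toy_folio_state *s, unsigned int block)
 {
     s->dirty |= UINT64_C(1) << block;
 }

 /* The folio is uptodate only when every fsblock is uptodate. */
 static bool toy_folio_uptodate(const struct toy_folio_state *s)
 {
     return (s->uptodate & toy_all_blocks(s)) == toy_all_blocks(s);
 }

 /* The folio is dirty as soon as any fsblock is dirty. */
 static bool toy_folio_dirty(const struct toy_folio_state *s)
 {
     return (s->dirty & toy_all_blocks(s)) != 0;
 }

With four fsblocks per folio, the folio in this model becomes uptodate
only after all four blocks have been marked uptodate, but becomes dirty
after a write to any single block.
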
Buffered Readahead and Reads
----------------------------

The ``iomap_readahead`` function initiates readahead to the pagecache.
The ``iomap_read_folio`` function reads one folio's worth of data into
the pagecache.
The ``flags`` argument to ``->iomap_begin`` will be set to zero.
The pagecache takes whatever locks it needs before calling the
filesystem.

Buffered Writes
---------------

The ``iomap_file_buffered_write`` function writes an ``iocb`` to the
pagecache.
``IOMAP_WRITE`` or ``IOMAP_WRITE | IOMAP_NOWAIT`` will be passed as
the ``flags`` argument to ``->iomap_begin``.
Callers commonly take ``i_rwsem`` in either shared or exclusive mode
before calling this function.

mmap Write Faults
~~~~~~~~~~~~~~~~~

The ``iomap_page_mkwrite`` function handles a write fault to a folio in
the pagecache.
``IOMAP_WRITE | IOMAP_FAULT`` will be passed as the ``flags`` argument
to ``->iomap_begin``.
Callers commonly take the mmap ``invalidate_lock`` in shared or
exclusive mode before calling this function.

Buffered Write Failures
~~~~~~~~~~~~~~~~~~~~~~~

After a short write to the pagecache, the areas not written will not
be marked dirty.
The filesystem must arrange to `cancel
<https://lore.kernel.org/all/20221123055812.747923-6-david@fromorbit.com/>`_
any delayed allocation `reservations
<https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/>`_
made for those areas because writeback will not consume the reservation.
The ``iomap_write_delalloc_release`` function can be called from a
``->iomap_end`` function to find all the clean areas of the folios
caching a fresh (``IOMAP_F_NEW``) delalloc mapping.
It takes the ``invalidate_lock``.

The filesystem must supply a function ``punch`` to be called for
each file range in this state.
This function must *only* remove delayed allocation reservations, in
case another thread racing with the current thread writes successfully
to the same region and triggers writeback to flush the dirty data out to
disk.

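The idea of calling ``punch`` once for each clean range can be sketched
in userspace C.
This is a simplified model and not the kernel implementation: it works on
an array of per-block dirty flags rather than on pagecache folios and
byte ranges, and ``toy_punch_fn``, ``toy_release_clean_ranges``, and
``toy_count_punched`` are hypothetical names whose signatures do not
match the real interfaces.

.. code-block:: c

 #include <stdbool.h>
 #include <stddef.h>

 /* Hypothetical callback type: drop reservations for a run of blocks. */
 typedef void (*toy_punch_fn)(size_t first_block, size_t nr_blocks,
                              void *priv);

 /* Call punch once for each run of clean blocks in dirty[0..nr). */
 static void toy_release_clean_ranges(const bool *dirty, size_t nr,
                                      toy_punch_fn punch, void *priv)
 {
     size_t i = 0;

     while (i < nr) {
         size_t start;

         if (dirty[i]) {
             /* Dirty data will be written back; keep its reservation. */
             i++;
             continue;
         }
         start = i;
         while (i < nr && !dirty[i])
             i++;
         punch(start, i - start, priv);
     }
 }

 /* Example callback: count the blocks whose reservation was dropped. */
 static void toy_count_punched(size_t first_block, size_t nr_blocks,
                               void *priv)
 {
     (void)first_block;
     *(size_t *)priv += nr_blocks;
 }

Dirty blocks are skipped on purpose: another thread may have written to
them successfully, so only the reservations behind clean blocks are
released.
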
Zeroing for File Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Filesystems can call ``iomap_zero_range`` to perform zeroing of the
pagecache for non-truncation file operations that are not aligned to
the fsblock size.
``IOMAP_ZERO`` will be passed as the ``flags`` argument to
``->iomap_begin``.
Callers typically hold ``i_rwsem`` and ``invalidate_lock`` in exclusive
mode before calling this function.

Unsharing Reflinked File Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Filesystems can call ``iomap_file_unshare`` to force a file sharing
storage with another file to preemptively copy the shared data to newly
allocated storage.
``IOMAP_WRITE | IOMAP_UNSHARE`` will be passed as the ``flags`` argument
to ``->iomap_begin``.
Callers typically hold ``i_rwsem`` and ``invalidate_lock`` in exclusive
mode before calling this function.

Truncation
----------

Filesystems can call ``iomap_truncate_page`` to zero the bytes in the
pagecache from EOF to the end of the fsblock during a file truncation
operation.
``truncate_setsize`` or ``truncate_pagecache`` will take care of
everything after the EOF block.
``IOMAP_ZERO`` will be passed as the ``flags`` argument to
``->iomap_begin``.
Callers typically hold ``i_rwsem`` and ``invalidate_lock`` in exclusive
mode before calling this function.

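The range that needs zeroing during truncation is simple arithmetic on
the new EOF and the fsblock size.
The helper below is a userspace sketch of that calculation only;
``toy_truncate_zero_len`` is a hypothetical name, and the sketch assumes
that the fsblock size is a power of two.

.. code-block:: c

 #include <assert.h>
 #include <stdint.h>

 /*
  * Return the number of bytes between pos (the new EOF) and the end of
  * the fsblock containing it, or 0 if pos is already block aligned.
  * Assumes blocksize is a nonzero power of two.
  */
 static uint32_t toy_truncate_zero_len(uint64_t pos, uint32_t blocksize)
 {
     uint32_t off;

     assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
     off = (uint32_t)(pos & (blocksize - 1));
     return off ? blocksize - off : 0;
 }

For a 4096-byte fsblock and a new EOF of 5000, the helper returns 3192,
which covers bytes 5000 through 8191.
A block-aligned EOF needs no zeroing at all.
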
Pagecache Writeback
-------------------

Filesystems can call ``iomap_writepages`` to respond to a request to
write dirty pagecache folios to disk.
The ``mapping`` and ``wbc`` parameters should be passed unchanged.
The ``wpc`` pointer should be allocated by the filesystem and must
be initialized to zero.

The pagecache will lock each folio before trying to schedule it for
writeback.
It does not lock ``i_rwsem`` or ``invalidate_lock``.

The dirty bit will be cleared for all folios run through the
``->map_blocks`` machinery described below even if the writeback fails.
This is to prevent dirty folio clots when storage devices fail; an
``-EIO`` is recorded for userspace to collect via ``fsync``.

The ``ops`` structure must be specified and is as follows:

``struct iomap_writeback_ops``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

 struct iomap_writeback_ops {
     int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
                       loff_t offset, unsigned len);
     int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
     void (*discard_folio)(struct folio *folio, loff_t pos);
 };

The fields are as follows:

292   - ``map_blocks``: Sets ``wpc->iomap`` to the    292   - ``map_blocks``: Sets ``wpc->iomap`` to the space mapping of the file
    range (in bytes) given by ``offset`` and ``len``.
    iomap calls this function for each dirty fs block in each dirty folio,
    though it will `reuse mappings
    <https://lore.kernel.org/all/20231207072710.176093-15-hch@lst.de/>`_
    for runs of contiguous dirty fsblocks within a folio.
    Do not return ``IOMAP_INLINE`` mappings here; the ``->iomap_end``
    function must deal with persisting written data.
    Do not return ``IOMAP_DELALLOC`` mappings here; iomap currently
    requires mapping to allocated space.
    Filesystems can skip a potentially expensive mapping lookup if the
    mappings have not changed.
    This revalidation must be open-coded by the filesystem; it is
    unclear if ``iomap::validity_cookie`` can be reused for this
    purpose.
    This function must be supplied by the filesystem.

  - ``prepare_ioend``: Enables filesystems to transform the writeback
    ioend or perform any other preparatory work before the writeback I/O
    is submitted.
    This might include pre-write space accounting updates, or installing
    a custom ``->bi_end_io`` function for internal purposes, such as
    deferring the ioend completion to a workqueue to run metadata update
    transactions from process context.
    This function is optional.

  - ``discard_folio``: iomap calls this function after ``->map_blocks``
    fails to schedule I/O for any part of a dirty folio.
    The function should throw away any reservations that may have been
    made for the write.
    The folio will be marked clean and an ``-EIO`` recorded in the
    pagecache.
    Filesystems can use this callback to `remove
    <https://lore.kernel.org/all/20201029163313.1766967-1-bfoster@redhat.com/>`_
    delalloc reservations to avoid having delalloc reservations for
    clean pagecache.
    This function is optional.
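
The open-coded revalidation that ``->map_blocks`` may perform can be
modelled in userspace as shown below.
This is a minimal sketch, not kernel code: ``struct wb_mapping``,
``struct fs_inode``, the ``extent_seq`` counter, and both functions are
hypothetical names invented for illustration.
It assumes the filesystem bumps a per-inode sequence counter whenever
its extent map changes, and it fakes the expensive lookup with a fixed
64 KiB window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached writeback mapping; not a kernel structure. */
struct wb_mapping {
    uint64_t offset;    /* start of the mapping, in bytes */
    uint64_t length;    /* length of the mapping, in bytes */
    uint64_t seq;       /* extent_seq sampled when the mapping was cached */
    bool valid;         /* false until the first lookup */
};

/* Hypothetical per-inode state. */
struct fs_inode {
    uint64_t extent_seq;    /* assumed to be bumped on every extent change */
    unsigned int lookups;   /* counts simulated expensive lookups */
};

/* True if the cached mapping is current and contains @offset. */
static bool wb_mapping_still_valid(const struct wb_mapping *map,
                                   const struct fs_inode *ip,
                                   uint64_t offset)
{
    if (!map->valid || map->seq != ip->extent_seq)
        return false;
    return offset >= map->offset && offset - map->offset < map->length;
}

/* Model of a ->map_blocks body: reuse the cached mapping when possible. */
static int fs_map_blocks(struct wb_mapping *map, struct fs_inode *ip,
                         uint64_t offset)
{
    assert(map != NULL && ip != NULL);

    if (wb_mapping_still_valid(map, ip, offset))
        return 0;

    /* Simulated expensive lookup: pretend extents are 64 KiB windows. */
    ip->lookups++;
    map->offset = offset & ~UINT64_C(0xFFFF);
    map->length = UINT64_C(0x10000);
    map->seq = ip->extent_seq;
    map->valid = true;
    return 0;
}
```

In this model, consecutive calls for blocks inside the same window cost
one lookup, and any change to ``extent_seq`` forces the next call to
look the mapping up again.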

Pagecache Writeback Completion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To handle the bookkeeping that must happen after disk I/O for writeback
completes, iomap creates chains of ``struct iomap_ioend`` objects that
wrap the ``bio`` that is used to write pagecache data to disk.
By default, iomap finishes writeback ioends by clearing the writeback
bit on the folios attached to the ``ioend``.
If the write failed, it will also set the error bits on the folios and
the address space.
This can happen in interrupt or process context, depending on the
storage device.

Filesystems that need to update internal bookkeeping (e.g. unwritten
extent conversions) should provide a ``->prepare_ioend`` function to
set ``struct iomap_ioend::bio::bi_end_io`` to its own function.
This function should call ``iomap_finish_ioends`` after finishing its
own work (e.g. unwritten extent conversion).

Some filesystems may wish to `amortize the cost of running metadata
transactions
<https://lore.kernel.org/all/20220120034733.221737-1-david@fromorbit.com/>`_
for post-writeback updates by batching them.
They may also require transactions to run from process context, which
implies punting batches to a workqueue.
iomap ioends contain a ``list_head`` to enable batching.

Given a batch of ioends, iomap has a few helpers to assist with
amortization:

 * ``iomap_sort_ioends``: Sort all the ioends in the list by file
   offset.

 * ``iomap_ioend_try_merge``: Given an ioend that is not in any list and
   a separate list of sorted ioends, merge as many of the ioends from
   the head of the list into the given ioend.
   ioends can only be merged if the file range and storage addresses are
   contiguous; the unwritten and shared status are the same; and the
   write I/O outcome is the same.
   The merged ioends become their own list.

 * ``iomap_finish_ioends``: Finish an ioend that possibly has other
   ioends linked to it.
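
The sort-then-merge rules listed above can be illustrated with a small
userspace model.
It is a sketch only: ``struct model_ioend`` and the ``model_*``
functions are hypothetical stand-ins, not the kernel helpers.
The model keeps ioends in a plain array instead of a ``list_head`` and
assumes storage addresses are counted in 512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_UNWRITTEN (1u << 0)   /* hypothetical status bits */
#define MODEL_SHARED    (1u << 1)

/* Hypothetical, simplified ioend; not struct iomap_ioend. */
struct model_ioend {
    uint64_t offset;     /* file offset, in bytes */
    uint64_t size;       /* length, in bytes */
    uint64_t sector;     /* starting storage address, in 512-byte sectors */
    unsigned int flags;  /* MODEL_UNWRITTEN and/or MODEL_SHARED */
    int error;           /* write outcome: 0 or a negative errno */
};

static int model_cmp_ioend(const void *a, const void *b)
{
    const struct model_ioend *ia = a, *ib = b;

    if (ia->offset < ib->offset)
        return -1;
    return ia->offset > ib->offset;
}

/* Sort a batch of ioends by file offset. */
static void model_sort_ioends(struct model_ioend *v, size_t n)
{
    qsort(v, n, sizeof(*v), model_cmp_ioend);
}

/* All conditions from the text must hold for a merge. */
static bool model_can_merge(const struct model_ioend *io,
                            const struct model_ioend *next)
{
    if (io->error != next->error)               /* same write outcome */
        return false;
    if (io->flags != next->flags)               /* same unwritten/shared */
        return false;
    if (io->offset + io->size != next->offset)  /* contiguous file range */
        return false;
    return io->sector + (io->size >> 9) == next->sector; /* and storage */
}

/*
 * Fold entries from the head of the sorted array into @io.
 * Returns the number of entries that were merged.
 */
static size_t model_try_merge(struct model_ioend *io,
                              const struct model_ioend *sorted, size_t n)
{
    size_t merged = 0;

    assert(io != NULL);
    while (merged < n && model_can_merge(io, &sorted[merged])) {
        io->size += sorted[merged].size;
        merged++;
    }
    return merged;
}
```

Merging stops at the first entry that breaks any one of the conditions,
so a single metadata update can then cover the whole merged range.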

Direct I/O
==========

In Linux, direct I/O is defined as file I/O that is issued directly to
storage, bypassing the pagecache.
The ``iomap_dio_rw`` function implements O_DIRECT (direct I/O) reads and
writes for files.

.. code-block:: c

 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
                      const struct iomap_ops *ops,
                      const struct iomap_dio_ops *dops,
                      unsigned int dio_flags, void *private,
                      size_t done_before);

The filesystem can provide the ``dops`` parameter if it needs to perform
extra work before or after the I/O is issued to storage.
The ``done_before`` parameter tells iomap how much of the request has
already been transferred.
It is used to continue a request asynchronously when `part of the
request
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c03098d4b9ad76bca2966a8769dcfe59f7f85103>`_
has already been completed synchronously.

The ``done_before`` parameter should be set if writes for the ``iocb``
have been initiated prior to the call.
The direction of the I/O is determined from the ``iocb`` passed in.

The ``dio_flags`` argument can be set to any combination of the
following values:

 * ``IOMAP_DIO_FORCE_WAIT``: Wait for the I/O to complete even if the
   kiocb is not synchronous.

 * ``IOMAP_DIO_OVERWRITE_ONLY``: Perform a pure overwrite for this range
   or fail with ``-EAGAIN``.
   This can be used by filesystems with complex unaligned I/O
   write paths to provide an optimised fast path for unaligned writes.
   If a pure overwrite can be performed, then serialisation against
   other I/Os to the same filesystem block(s) is unnecessary as there is
   no risk of stale data exposure or data loss.
   If a pure overwrite cannot be performed, then the filesystem can
   perform the serialisation steps needed to provide exclusive access
   to the unaligned I/O range so that it can perform allocation and
   sub-block zeroing safely.
   Filesystems can use this flag to try to reduce locking contention,
   but a lot of `detailed checking
   <https://lore.kernel.org/linux-ext4/20230314130759.642710-1-bfoster@redhat.com/>`_
   is required to do it `correctly
   <https://lore.kernel.org/linux-ext4/20230810165559.946222-1-bfoster@redhat.com/>`_.

 * ``IOMAP_DIO_PARTIAL``: If a page fault occurs, return whatever
   progress has already been made.
   The caller may deal with the page fault and retry the operation.
   If the caller decides to retry the operation, it should pass the
   accumulated return values of all previous calls as the
   ``done_before`` parameter to the next call.

These ``struct kiocb`` flags are significant for direct I/O with iomap:

 * ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``.

 * ``IOCB_SYNC``: Ensure that the device has persisted data to disk
   before completing the call.
   In the case of pure overwrites, the I/O may be issued with FUA
   enabled.

 * ``IOCB_HIPRI``: Poll for I/O completion instead of waiting for an
   interrupt.
   Only meaningful for asynchronous I/O, and only if the entire I/O can
   be issued as a single ``struct bio``.

 * ``IOCB_DIO_CALLER_COMP``: Try to run I/O completion from the caller's
   process context.
   See ``linux/fs.h`` for more details.

Filesystems should call ``iomap_dio_rw`` from ``->read_iter`` and
``->write_iter``, and set ``FMODE_CAN_ODIRECT`` in the ``->open``
function for the file.
They should not set ``->direct_IO``, which is deprecated.

If a filesystem wishes to perform its own work before direct I/O
completion, it should call ``__iomap_dio_rw``.
If its return value is not an error pointer or a NULL pointer, the
filesystem should pass the return value to ``iomap_dio_complete`` after
finishing its internal work.

Return Values
-------------

``iomap_dio_rw`` can return one of the following:

 * A non-negative number of bytes transferred.

 * ``-ENOTBLK``: Fall back to buffered I/O.
   iomap itself will return this value if it cannot invalidate the page
   cache before issuing the I/O to storage.
   The ``->iomap_begin`` or ``->iomap_end`` functions may also return
   this value.

 * ``-EIOCBQUEUED``: The asynchronous direct I/O request has been
   queued and will be completed separately.

 * Any of the other negative error codes.
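
A caller might dispatch on these return values roughly as follows.
This is a userspace sketch under stated assumptions:
``enum dio_next_step`` and ``dio_classify`` are hypothetical names, and
because ``EIOCBQUEUED`` is a kernel-internal code that userspace
``<errno.h>`` does not provide, the sketch defines it locally (529 is
assumed here to match the kernel's internal value).

```c
#include <errno.h>
#include <sys/types.h>

/* Kernel-internal errno; defined locally for this model only. */
#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529
#endif

/* Hypothetical summary of what the caller should do next. */
enum dio_next_step {
    DIO_DONE,               /* bytes were transferred; report the count */
    DIO_FALLBACK_BUFFERED,  /* retry the I/O through the pagecache */
    DIO_QUEUED,             /* async request queued; completion comes later */
    DIO_FAILED,             /* propagate the error to the caller */
};

/* Map an iomap_dio_rw-style return value onto the next step. */
static enum dio_next_step dio_classify(ssize_t ret)
{
    if (ret >= 0)
        return DIO_DONE;
    if (ret == -ENOTBLK)
        return DIO_FALLBACK_BUFFERED;
    if (ret == -EIOCBQUEUED)
        return DIO_QUEUED;
    return DIO_FAILED;
}
```

The ordering matters only in that the two special negative codes must
be tested before the generic error case.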

Direct Reads
------------

A direct I/O read initiates a read I/O from the storage device to the
caller's buffer.
Dirty parts of the pagecache are flushed to storage before initiating
the read I/O.
The ``flags`` value for ``->iomap_begin`` will be ``IOMAP_DIRECT`` with
any combination of the following enhancements:

 * ``IOMAP_NOWAIT``, as defined previously.

Callers commonly hold ``i_rwsem`` in shared mode before calling this
function.

Direct Writes
-------------

A direct I/O write initiates a write I/O to the storage device from the
caller's buffer.
Dirty parts of the pagecache are flushed to storage before initiating
the write I/O.
The pagecache is invalidated both before and after the write I/O.
The ``flags`` value for ``->iomap_begin`` will be ``IOMAP_DIRECT |
IOMAP_WRITE`` with any combination of the following enhancements:

 * ``IOMAP_NOWAIT``, as defined previously.

 * ``IOMAP_OVERWRITE_ONLY``: Allocating blocks and zeroing partial
   blocks is not allowed.
   The entire file range must map to a single written or unwritten
   extent.
   The file I/O range must be aligned to the filesystem block size
   if the mapping is unwritten and the filesystem cannot handle zeroing
   the unaligned regions without exposing stale contents.

Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
calling this function.
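
The ``IOMAP_OVERWRITE_ONLY`` conditions for direct writes can be
expressed as a small predicate.
The following is a userspace sketch, not a real ``->iomap_begin``:
``struct model_extent``, ``enum model_extent_type``, and
``model_overwrite_only_check`` are hypothetical, and the
``can_zero_unaligned`` parameter stands in for whatever
filesystem-specific knowledge decides whether unaligned regions of an
unwritten extent can be zeroed safely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mapping types for this model. */
enum model_extent_type {
    MODEL_HOLE,
    MODEL_DELALLOC,
    MODEL_MAPPED,
    MODEL_UNWRITTEN,
};

/* Hypothetical extent: the single mapping found at the write position. */
struct model_extent {
    uint64_t offset;    /* start of the extent, in bytes */
    uint64_t length;    /* length of the extent, in bytes */
    enum model_extent_type type;
};

/* Returns 0 if a pure overwrite is possible, -EAGAIN otherwise. */
static int model_overwrite_only_check(const struct model_extent *ext,
                                      uint64_t pos, uint64_t len,
                                      uint64_t blocksize,
                                      bool can_zero_unaligned)
{
    /* The alignment mask below needs a nonzero power-of-two block size. */
    assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

    /* No allocation allowed: the space must already be (un)written. */
    if (ext->type != MODEL_MAPPED && ext->type != MODEL_UNWRITTEN)
        return -EAGAIN;

    /* One extent must span the entire file range of the write. */
    if (pos < ext->offset || pos + len > ext->offset + ext->length)
        return -EAGAIN;

    /* Unaligned writes into unwritten space would need sub-block zeroing. */
    if (ext->type == MODEL_UNWRITTEN && !can_zero_unaligned &&
        ((pos | len) & (blocksize - 1)) != 0)
        return -EAGAIN;

    return 0;
}
```

On ``-EAGAIN`` the caller would be expected to take whatever stronger
serialisation it needs and retry without the overwrite-only restriction.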

``struct iomap_dio_ops``
------------------------

.. code-block:: c

 struct iomap_dio_ops {
     void (*submit_io)(const struct iomap_iter *iter, struct bio *bio,
                       loff_t file_offset);
     int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
                   unsigned flags);
     struct bio_set *bio_set;
 };

The fields of this structure are as follows:

  - ``submit_io``: iomap calls this function when it has constructed a
    ``struct bio`` object for the I/O requested, and wishes to submit it
    to the block device.
    If no function is provided, ``submit_bio`` will be called directly.
    Filesystems that would like to perform additional work before
    submission (e.g. data replication for btrfs) should implement this
    function.

  - ``end_io``: This is called after the ``struct bio`` completes.
    This function should perform post-write conversions of unwritten
    extent mappings, handle write failures, etc.
    The ``flags`` argument may be set to a combination of the following:

    * ``IOMAP_DIO_UNWRITTEN``: The mapping was unwritten, so the ioend
      should mark the extent as written.

    * ``IOMAP_DIO_COW``: Writing to the space in the mapping required a
      copy on write operation, so the ioend should switch mappings.

  - ``bio_set``: This allows the filesystem to provide a custom bio_set
    for allocating direct I/O bios.
    This enables filesystems to `stash additional per-bio information
    <https://lore.kernel.org/all/20220505201115.937837-3-hch@lst.de/>`_
    for private use.
    If this field is NULL, generic ``struct bio`` objects will be used.

Filesystems that want to perform extra work after an I/O completion
should set a custom ``->bi_end_io`` function via ``->submit_io``.
Afterwards, the custom endio function must call
``iomap_dio_bio_end_io`` to finish the direct I/O.

DAX I/O
=======

Some storage devices can be directly mapped as memory.
These devices support a new access mode known as "fsdax" that allows
loads and stores through the CPU and memory controller.

fsdax Reads
-----------

A fsdax read performs a memcpy from the storage device to the caller's
buffer.
The ``flags`` value for ``->iomap_begin`` will be ``IOMAP_DAX`` with any
combination of the following enhancements:

 * ``IOMAP_NOWAIT``, as defined previously.

Callers commonly hold ``i_rwsem`` in shared mode before calling this
function.

fsdax Writes
------------

A fsdax write initiates a memcpy to the storage device from the caller's
buffer.
The ``flags`` value for ``->iomap_begin`` will be ``IOMAP_DAX |
IOMAP_WRITE`` with any combination of the following enhancements:

 * ``IOMAP_NOWAIT``, as defined previously.

 * ``IOMAP_OVERWRITE_ONLY``: The caller requires a pure overwrite to be
   performed from this mapping.
   This requires the filesystem extent mapping to already exist as an
   ``IOMAP_MAPPED`` type and span the entire range of the write I/O
   request.
   If the filesystem cannot map this request in a way that allows the
   iomap infrastructure to perform a pure overwrite, it must fail the
   mapping operation with ``-EAGAIN``.

Callers commonly hold ``i_rwsem`` in exclusive mode before calling this
function.

fsdax mmap Faults
-----------------

The ``dax_iomap_fault`` function handles read and write faults to fsdax
storage.
For a read fault, ``IOMAP_DAX | IOMAP_FAULT`` will be passed as the
``flags`` argument to ``->iomap_begin``.
For a write fault, ``IOMAP_DAX | IOMAP_FAULT | IOMAP_WRITE`` will be
passed as the ``flags`` argument to ``->iomap_begin``.

Callers commonly hold the same locks as they do to call their iomap
pagecache counterparts.
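
The flag combinations that the fsdax read, write, and fault paths pass
to ``->iomap_begin`` can be summarised in code.
This is an illustrative sketch only: the ``MODEL_IOMAP_*`` bit values
are arbitrary placeholders rather than the kernel's definitions, and
``enum model_dax_op`` and ``model_dax_begin_flags`` are hypothetical
names.
It also assumes that ``IOMAP_NOWAIT`` is never combined with fault
flags, since the text lists it only for reads and writes.

```c
#include <stdbool.h>

/* Placeholder bit values; the real ones live in the kernel headers. */
#define MODEL_IOMAP_WRITE   (1u << 0)
#define MODEL_IOMAP_FAULT   (1u << 1)
#define MODEL_IOMAP_NOWAIT  (1u << 2)
#define MODEL_IOMAP_DAX     (1u << 3)

/* Hypothetical names for the four fsdax entry points described above. */
enum model_dax_op {
    MODEL_DAX_READ,
    MODEL_DAX_WRITE,
    MODEL_DAX_READ_FAULT,
    MODEL_DAX_WRITE_FAULT,
};

/* Compose the ->iomap_begin flags for one fsdax operation. */
static unsigned int model_dax_begin_flags(enum model_dax_op op, bool nowait)
{
    unsigned int flags = MODEL_IOMAP_DAX;   /* set on every fsdax path */

    switch (op) {
    case MODEL_DAX_READ:
        if (nowait)
            flags |= MODEL_IOMAP_NOWAIT;
        break;
    case MODEL_DAX_WRITE:
        flags |= MODEL_IOMAP_WRITE;
        if (nowait)
            flags |= MODEL_IOMAP_NOWAIT;
        break;
    case MODEL_DAX_READ_FAULT:
        flags |= MODEL_IOMAP_FAULT;
        break;
    case MODEL_DAX_WRITE_FAULT:
        flags |= MODEL_IOMAP_FAULT | MODEL_IOMAP_WRITE;
        break;
    }
    return flags;
}
```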
617                                                   617 
618 fsdax Truncation, fallocate, and Unsharing        618 fsdax Truncation, fallocate, and Unsharing
619 ------------------------------------------        619 ------------------------------------------
620                                                   620 
621 For fsdax files, the following functions are p    621 For fsdax files, the following functions are provided to replace their
622 iomap pagecache I/O counterparts.                 622 iomap pagecache I/O counterparts.
623 The ``flags`` argument to ``->iomap_begin`` ar    623 The ``flags`` argument to ``->iomap_begin`` are the same as the
624 pagecache counterparts, with ``IOMAP_DAX`` add    624 pagecache counterparts, with ``IOMAP_DAX`` added.
625                                                   625 
626  * ``dax_file_unshare``                           626  * ``dax_file_unshare``
627  * ``dax_zero_range``                             627  * ``dax_zero_range``
628  * ``dax_truncate_page``                          628  * ``dax_truncate_page``
629                                                   629 
630 Callers commonly hold the same locks as they d    630 Callers commonly hold the same locks as they do to call their iomap
631 pagecache counterparts.                           631 pagecache counterparts.
632                                                   632 
633 fsdax Deduplication                               633 fsdax Deduplication
634 -------------------                               634 -------------------
635                                                   635 
636 Filesystems implementing the ``FIDEDUPERANGE``    636 Filesystems implementing the ``FIDEDUPERANGE`` ioctl must call the
637 ``dax_remap_file_range_prep`` function with th    637 ``dax_remap_file_range_prep`` function with their own iomap read ops.

Seeking Files
=============

iomap implements the two iterating whence modes of the ``llseek`` system
call.

SEEK_DATA
---------

The ``iomap_seek_data`` function implements the SEEK_DATA "whence" value
for llseek.
``IOMAP_REPORT`` will be passed as the ``flags`` argument to
``->iomap_begin``.

For unwritten mappings, the pagecache will be searched.
Regions of the pagecache with a folio mapped and uptodate fsblocks
within those folios will be reported as data areas.

Callers commonly hold ``i_rwsem`` in shared mode before calling this
function.
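
The behavior can be observed from userspace with ``lseek``.
The sketch below is a minimal illustration, not part of iomap; the helper name ``first_data_offset`` is an assumption made for this example, and the fallback definition of ``SEEK_DATA`` uses the Linux ABI value in case the libc headers do not expose it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux ABI value, in case the libc headers did not expose it. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif

/*
 * Hypothetical helper: return the offset of the first data byte at or
 * after "start" in the file at "path", or a negative errno.
 * -ENXIO means there is no data at or after "start", for example because
 * "start" is at or beyond EOF.
 */
long long first_data_offset(const char *path, long long start)
{
	int fd = open(path, O_RDONLY);
	off_t pos;
	int err;

	if (fd < 0)
		return -errno;

	pos = lseek(fd, (off_t)start, SEEK_DATA);
	err = errno;		/* save before close() can clobber it */
	close(fd);

	return pos < 0 ? -err : (long long)pos;
}
```

On a filesystem that does not track holes, the generic fallback treats the whole file as data, so the returned offset is simply ``start`` whenever ``start`` is before EOF.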

SEEK_HOLE
---------

The ``iomap_seek_hole`` function implements the SEEK_HOLE "whence" value
for llseek.
``IOMAP_REPORT`` will be passed as the ``flags`` argument to
``->iomap_begin``.

For unwritten mappings, the pagecache will be searched.
Regions of the pagecache with no folio mapped, or a !uptodate fsblock
within a folio will be reported as sparse hole areas.

Callers commonly hold ``i_rwsem`` in shared mode before calling this
function.
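
Userspace programs usually combine the two whence values to walk a sparse file.
The following sketch counts data segments by alternating ``SEEK_DATA`` and ``SEEK_HOLE``; the helper name ``count_data_segments`` is an assumption made for this example, and the fallback definitions use the Linux ABI values.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux ABI values, in case the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: count the data segments in the file at "path".
 * Returns the count, or a negative errno.
 */
long count_data_segments(const char *path)
{
	long count = 0;
	off_t data, hole = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;

	for (;;) {
		/* Find the start of the next data segment. */
		data = lseek(fd, hole, SEEK_DATA);
		if (data < 0) {
			/* ENXIO: no more data before EOF, so we are done. */
			if (errno != ENXIO)
				count = -errno;
			break;
		}
		/* Find where that segment ends; EOF counts as a hole. */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0) {
			count = -errno;
			break;
		}
		count++;
	}

	close(fd);
	return count;
}
```

The count for a sparse file depends on the filesystem: one that does not report holes will return a single segment for any non-empty file.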

Swap File Activation
====================

The ``iomap_swapfile_activate`` function finds all the base-page aligned
regions in a file and sets them up as swap space.
The file will be ``fsync()``'d before activation.
``IOMAP_REPORT`` will be passed as the ``flags`` argument to
``->iomap_begin``.
All mappings must be mapped or unwritten; they cannot be dirty or shared,
and cannot span multiple block devices.
Callers must hold ``i_rwsem`` in exclusive mode; this is already
provided by ``swapon``.
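
To make "base-page aligned regions" concrete, the sketch below models the rounding in userspace: the start of a mapping is rounded up to a page boundary and the end is rounded down, and a mapping that contains no whole page contributes nothing.
This is a simplified model under stated assumptions, not the kernel implementation; ``DEMO_PAGE_SIZE``, ``struct demo_extent``, and ``demo_usable_pages`` are names invented for this example, and a 4096-byte base page is assumed.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL	/* assumed base page size for this sketch */

/* Hypothetical stand-in for one mapping returned by the filesystem. */
struct demo_extent {
	uint64_t addr;		/* device byte address of the mapping */
	uint64_t length;	/* mapping length in bytes */
};

/*
 * Trim a mapping to the whole base pages it contains.
 * Stores the first usable page number in *first_page and returns how many
 * pages fit (zero if the mapping does not cover a full aligned page).
 */
uint64_t demo_usable_pages(const struct demo_extent *ext, uint64_t *first_page)
{
	/* Round the start up and the end down to page boundaries. */
	uint64_t first = (ext->addr + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
	uint64_t next = (ext->addr + ext->length) / DEMO_PAGE_SIZE;

	*first_page = first;
	return next > first ? next - first : 0;
}
```

For example, a mapping at byte address 1000 with length 12000 contains two whole pages, starting at page 1.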

File Space Mapping Reporting
============================

iomap implements two of the file space mapping ioctls.

FS_IOC_FIEMAP
-------------

The ``iomap_fiemap`` function exports file extent mappings to userspace
in the format specified by the ``FS_IOC_FIEMAP`` ioctl.
``IOMAP_REPORT`` will be passed as the ``flags`` argument to
``->iomap_begin``.
Callers commonly hold ``i_rwsem`` in shared mode before calling this
function.
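
From the userspace side, the exported format is ``struct fiemap`` from ``<linux/fiemap.h>``.
The sketch below only asks for the number of extents; the helper name ``count_file_extents`` is an assumption made for this example.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

/*
 * Hypothetical helper: ask the kernel how many extents map the file open
 * on "fd".
 * With fm_extent_count set to zero, FS_IOC_FIEMAP fills in
 * fm_mapped_extents without copying any extent records.
 * Returns the extent count, or a negative errno.
 */
long count_file_extents(int fd)
{
	struct fiemap fm;

	memset(&fm, 0, sizeof(fm));
	fm.fm_start = 0;
	fm.fm_length = FIEMAP_MAX_OFFSET;	/* whole file */
	fm.fm_flags = FIEMAP_FLAG_SYNC;		/* flush dirty data first */
	fm.fm_extent_count = 0;			/* count only */

	if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
		return -errno;

	return (long)fm.fm_mapped_extents;
}
```

A caller that wants the extent records themselves would allocate room for ``fm_extents`` after the header and set ``fm_extent_count`` accordingly.
Filesystems that do not implement the ioctl typically fail the call with ``EOPNOTSUPP``.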

FIBMAP (deprecated)
-------------------

``iomap_bmap`` implements FIBMAP.
The calling conventions are the same as for FIEMAP.
This function is only provided to maintain compatibility for filesystems
that implemented FIBMAP prior to conversion.
This ioctl is deprecated; do **not** add a FIBMAP implementation to
filesystems that do not have it.
Callers should probably hold ``i_rwsem`` in shared mode before calling
this function, but this is unclear.
                                                      
