======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct bio_vecs,
   constructed from the raw biovecs but taking into account bi_bvec_done and
   bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

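The iterator arithmetic described above can be sketched in userspace C. This
is a minimal model, not kernel code: struct model_bvec, struct model_iter,
model_iter_bvec() and model_iter_advance() are hypothetical names chosen for
illustration, the page pointer is replaced by a plain char pointer, and
bi_sector is left out. The sketch only aims to show that advancing touches the
iterator while the vector array stays as it was.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct bio_vec. */
struct model_bvec {
	const char *base;        /* stands in for bv_page (assumption) */
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical stand-in for struct bvec_iter (bi_sector omitted). */
struct model_iter {
	unsigned int bi_size;      /* bytes remaining in the whole range */
	unsigned int bi_idx;       /* index of the current vector */
	unsigned int bi_bvec_done; /* bytes completed in the current vector */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Build the "current" vector by value, in the spirit of bio_iter_iovec():
 * start from the raw entry, then account for bi_bvec_done and bi_size.
 */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec cur = bv[it.bi_idx];

	cur.bv_offset += it.bi_bvec_done;
	cur.bv_len = min_u(cur.bv_len - it.bi_bvec_done, it.bi_size);
	return cur;
}

/*
 * Advance the iterator by 'bytes'. Only the iterator is written to; the
 * array is read through a const pointer and never modified.
 */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->bi_size);

	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;

	/* Step over every vector that is now fully consumed. */
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}
```

With an 8-byte vector followed by a 4-byte vector, advancing by 3 bytes makes
model_iter_bvec() report offset 3 and length 5, while the array entry itself
still reads offset 0 and length 8.
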
As of 5.12 bvec segments with zero bv_len are not supported.

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was tricky and
   convoluted - it had to walk two different bios at the same time, keeping
   both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places. Since the bvec
   array is no longer modified, saving the bvec_iter is enough.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split arbitrary
   size bios - because the new bio can share the old bio's biovec.

   Care must be taken, though, to ensure the biovec isn't freed while the
   split bio is still using it, in case the original bio completes first.
   Using bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they do - submitting partially completed bios is
   perfectly fine.

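The two-iterator walk mentioned for bio_copy_data() can be modelled in
userspace as follows. This is a sketch under stated assumptions, not the
kernel implementation: all model_* names are hypothetical, pages are replaced
by plain char buffers, and the iterators are passed by value so that the
caller's copies are left alone.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace stand-ins for struct bio_vec / struct bvec_iter. */
struct model_bvec {
	char *base;              /* stands in for bv_page (assumption) */
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_iter {
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

/* Current vector, adjusted for bi_bvec_done and clamped to bi_size. */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec cur = bv[it.bi_idx];
	unsigned int left = cur.bv_len - it.bi_bvec_done;

	cur.bv_offset += it.bi_bvec_done;
	cur.bv_len = left < it.bi_size ? left : it.bi_size;
	return cur;
}

/* Advance only the iterator; the array is never written. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}

/*
 * Copy from src to dst until either range is exhausted. Each step copies
 * the largest chunk that fits in both current vectors, so differently
 * sized vectors on the two sides need no special casing.
 * Returns the number of bytes copied.
 */
static unsigned int model_copy_data(const struct model_bvec *dst,
				    struct model_iter dst_it,
				    const struct model_bvec *src,
				    struct model_iter src_it)
{
	unsigned int copied = 0;

	while (src_it.bi_size && dst_it.bi_size) {
		struct model_bvec s = model_iter_bvec(src, src_it);
		struct model_bvec d = model_iter_bvec(dst, dst_it);
		unsigned int n = s.bv_len < d.bv_len ? s.bv_len : d.bv_len;

		memcpy(d.base + d.bv_offset, s.base + s.bv_offset, n);
		model_iter_advance(src, &src_it, n);
		model_iter_advance(dst, &dst_it, n);
		copied += n;
	}
	return copied;
}
```

Because both iterators are local copies, the caller can reuse its own
iterators afterwards, for instance to resubmit or re-read the same range.
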
Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity, particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.

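The point that iteration stops on bi_size alone, and that a split or clone can
reuse part of an existing biovec, can be illustrated with a small userspace
model. The model_* names are hypothetical and the split helper is only a
sketch of the idea (two iterators over one shared array); it is not the
kernel's bio_split().

```c
#include <assert.h>

/* Hypothetical stand-ins; the data pointer is omitted since only
 * lengths matter for this illustration. */
struct model_bvec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_iter {
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

/* Advance only the iterator; the shared array is never written. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}

/*
 * Split a range at an arbitrary byte offset. The returned iterator covers
 * the first 'bytes'; *it is advanced to cover the remainder. Both refer
 * to the same array, and no vector count is computed for either half.
 */
static struct model_iter model_split(const struct model_bvec *bv,
				     struct model_iter *it,
				     unsigned int bytes)
{
	struct model_iter front = *it;

	front.bi_size = bytes;
	model_iter_advance(bv, it, bytes);
	return front;
}

/* Count the vectors a walk visits, using bi_size as the only end test. */
static unsigned int model_count_segments(const struct model_bvec *bv,
					 struct model_iter it)
{
	unsigned int n = 0;

	while (it.bi_size) {
		unsigned int left = bv[it.bi_idx].bv_len - it.bi_bvec_done;
		unsigned int len = left < it.bi_size ? left : it.bi_size;

		model_iter_advance(bv, &it, len);
		n++;
	}
	return n;
}
```

Splitting three 4096-byte vectors at byte 6144 yields two ranges that each
visit two vectors, even though the shared array holds three entries. A stored
vector count for the array would describe neither half, which is why code
that does not own the bio should rely on bi_size.
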
Usage of helpers:
=================

* The following helpers, whose names have the suffix `_all`, can only be used
  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

        bio_for_each_segment_all()
        bio_for_each_bvec_all()
        bio_first_bvec_all()
        bio_first_page_all()
        bio_first_folio_all()
        bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

        bio_for_each_segment()
        bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

        bio_for_each_bvec()
        bio_for_each_bvec_all()
        rq_for_each_bvec()
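
The difference between the two iteration granularities can be pictured with a
small userspace calculation. This is only a sketch: MODEL_PAGE_SIZE, struct
model_seg and model_split_into_pages() are hypothetical names, and a 4096-byte
page is assumed. It takes the offset and length of one multi-page vector (what
a bvec-granularity walk would hand out in one step) and lists the single-page
pieces a segment-granularity walk would hand out instead.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for the illustration */

/* One single-page piece of a larger vector (hypothetical type). */
struct model_seg {
	unsigned int page_idx;  /* which page of the vector, counted from 0 */
	unsigned int offset;    /* offset within that page */
	unsigned int len;       /* bytes in this piece, never crossing a page */
};

/*
 * Break a multi-page vector (bv_offset, bv_len) into single-page pieces.
 * Writes at most 'max' pieces to 'out' and returns how many were written.
 */
static size_t model_split_into_pages(unsigned int bv_offset,
				     unsigned int bv_len,
				     struct model_seg *out, size_t max)
{
	size_t n = 0;
	unsigned int done = 0;

	while (done < bv_len && n < max) {
		unsigned int pos = bv_offset + done;
		unsigned int in_page = pos % MODEL_PAGE_SIZE;
		unsigned int room = MODEL_PAGE_SIZE - in_page;
		unsigned int left = bv_len - done;

		out[n].page_idx = pos / MODEL_PAGE_SIZE;
		out[n].offset = in_page;
		out[n].len = left < room ? left : room;
		done += out[n].len;
		n++;
	}
	return n;
}
```

A vector with offset 100 and length 10000 is a single step at multi-page
granularity, and three steps (3996, 4096 and 1908 bytes) at single-page
granularity.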
