TOMOYO Linux Cross Reference
Linux/Documentation/admin-guide/device-mapper/dm-zoned.rst

Differences between /Documentation/admin-guide/device-mapper/dm-zoned.rst (Version linux-6.12-rc7) and /Documentation/admin-guide/device-mapper/dm-zoned.rst (Version linux-5.16.20)


  1 ========                                            1 ========
  2 dm-zoned                                            2 dm-zoned
  3 ========                                            3 ========
  4                                                     4 
  5 The dm-zoned device mapper target exposes a zo      5 The dm-zoned device mapper target exposes a zoned block device (ZBC and
  6 ZAC compliant devices) as a regular block devi      6 ZAC compliant devices) as a regular block device without any write
  7 pattern constraints. In effect, it implements       7 pattern constraints. In effect, it implements a drive-managed zoned
  8 block device which hides from the user (a file      8 block device which hides from the user (a file system or an application
  9 doing raw block device accesses) the sequentia      9 doing raw block device accesses) the sequential write constraints of
 10 host-managed zoned block devices and can mitig     10 host-managed zoned block devices and can mitigate the potential
 11 device-side performance degradation due to exc     11 device-side performance degradation due to excessive random writes on
 12 host-aware zoned block devices.                    12 host-aware zoned block devices.
 13                                                    13 
 14 For a more detailed description of the zoned b     14 For a more detailed description of the zoned block device models and
 15 their constraints see (for SCSI devices):          15 their constraints see (for SCSI devices):
 16                                                    16 
 17 https://www.t10.org/drafts.htm#ZBC_Family          17 https://www.t10.org/drafts.htm#ZBC_Family
 18                                                    18 
 19 and (for ATA devices):                             19 and (for ATA devices):
 20                                                    20 
 21 http://www.t13.org/Documents/UploadedDocuments     21 http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf
 22                                                    22 
 23 The dm-zoned implementation is simple and mini     23 The dm-zoned implementation is simple and minimizes system overhead (CPU
 24 and memory usage as well as storage capacity l     24 and memory usage as well as storage capacity loss). For a 10TB
 25 host-managed disk with 256 MB zones, dm-zoned      25 host-managed disk with 256 MB zones, dm-zoned memory usage per disk
 26 instance is at most 4.5 MB and as little as 5      26 instance is at most 4.5 MB and as little as 5 zones will be used
 27 internally for storing metadata and performing     27 internally for storing metadata and performing reclaim operations.
 28                                                    28 
 29 dm-zoned target devices are formatted and chec     29 dm-zoned target devices are formatted and checked using the dmzadm
 30 utility available at:                              30 utility available at:
 31                                                    31 
 32 https://github.com/hgst/dm-zoned-tools             32 https://github.com/hgst/dm-zoned-tools
 33                                                    33 
 34 Algorithm                                          34 Algorithm
 35 =========                                          35 =========
 36                                                    36 
 37 dm-zoned implements an on-disk buffering schem     37 dm-zoned implements an on-disk buffering scheme to handle non-sequential
 38 write accesses to the sequential zones of a zo     38 write accesses to the sequential zones of a zoned block device.
 39 Conventional zones are used for caching as wel     39 Conventional zones are used for caching as well as for storing internal
 40 metadata. It can also use a regular block devi     40 metadata. It can also use a regular block device together with the zoned
 41 block device; in that case the regular block d     41 block device; in that case the regular block device will be split logically
 42 in zones with the same size as the zoned block     42 in zones with the same size as the zoned block device. These zones will be
 43 placed in front of the zones from the zoned bl     43 placed in front of the zones from the zoned block device and will be handled
 44 just like conventional zones.                      44 just like conventional zones.
 45                                                    45 
 46 The zones of the device(s) are separated into      46 The zones of the device(s) are separated into 2 types:
 47                                                    47 
 48 1) Metadata zones: these are conventional zone     48 1) Metadata zones: these are conventional zones used to store metadata.
 49 Metadata zones are not reported as usable capa !!  49 Metadata zones are not reported as useable capacity to the user.
 50                                                    50 
 51 2) Data zones: all remaining zones, the vast m     51 2) Data zones: all remaining zones, the vast majority of which will be
 52 sequential zones used exclusively to store use     52 sequential zones used exclusively to store user data. The conventional
 53 zones of the device may be used also for buffe     53 zones of the device may be used also for buffering user random writes.
 54 Data in these zones may be directly mapped to      54 Data in these zones may be directly mapped to the conventional zone, but
 55 later moved to a sequential zone so that the c     55 later moved to a sequential zone so that the conventional zone can be
 56 reused for buffering incoming random writes.       56 reused for buffering incoming random writes.
 57                                                    57 
 58 dm-zoned exposes a logical device with a secto     58 dm-zoned exposes a logical device with a sector size of 4096 bytes,
 59 irrespective of the physical sector size of th     59 irrespective of the physical sector size of the backend zoned block
 60 device being used. This allows reducing the am     60 device being used. This allows reducing the amount of metadata needed to
 61 manage valid blocks (blocks written).              61 manage valid blocks (blocks written).
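The effect of the 4096-byte logical block size on metadata can be illustrated with a little arithmetic. The sketch below assumes one validity bit per block and reads the 256 MB zone size mentioned earlier as 256 MiB. Both are assumptions made for illustration, and the function name is hypothetical.

```python
# Illustrative arithmetic only: assumes one validity bit per block
# and a zone size of 256 MiB (assumptions, not taken from the on-disk format).
ZONE_SIZE = 256 * 1024 * 1024


def bitmap_bytes_per_zone(block_size, zone_size=ZONE_SIZE):
    """Return the bytes needed for a one-bit-per-block bitmap of one zone."""
    blocks = zone_size // block_size
    return (blocks + 7) // 8


if __name__ == "__main__":
    print(bitmap_bytes_per_zone(4096))  # 8192
    print(bitmap_bytes_per_zone(512))   # 65536
```

Under these assumptions, tracking 4096-byte blocks needs one eighth of the bitmap space that tracking 512-byte sectors would need.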
 62                                                    62 
 63 The on-disk metadata format is as follows:         63 The on-disk metadata format is as follows:
 64                                                    64 
 65 1) The first block of the first conventional z     65 1) The first block of the first conventional zone found contains the
 66 super block which describes the on disk amount     66 super block which describes the on disk amount and position of metadata
 67 blocks.                                            67 blocks.
 68                                                    68 
 69 2) Following the super block, a set of blocks      69 2) Following the super block, a set of blocks is used to describe the
 70 mapping of the logical device blocks. The mapp     70 mapping of the logical device blocks. The mapping is done per chunk of
 71 blocks, with the chunk size equal to the zoned     71 blocks, with the chunk size equal to the zoned block device zone size. The
 72 mapping table is indexed by chunk number and e     72 mapping table is indexed by chunk number and each mapping entry
 73 indicates the zone number of the device storin     73 indicates the zone number of the device storing the chunk of data. Each
 74 mapping entry may also indicate the zone numbe     74 mapping entry may also indicate the zone number of a conventional
 75 zone used to buffer random modifications to th     75 zone used to buffer random modifications to the data zone.
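The chunk-indexed lookup described above can be sketched as follows. This is a minimal illustration, not the kernel data structure: the names `locate` and `data_zone_for` are hypothetical, the mapping table is modeled as a plain list, and the chunk size assumes 256 MiB zones with 4096-byte blocks.

```python
from typing import List, Optional, Tuple

# Assumption for illustration: 256 MiB zones / 4096-byte blocks.
BLOCKS_PER_CHUNK = 65536


def locate(block: int) -> Tuple[int, int]:
    """Split a logical block number into (chunk number, offset in chunk)."""
    return divmod(block, BLOCKS_PER_CHUNK)


def data_zone_for(block: int, mapping: List[Optional[int]]) -> Optional[int]:
    """Return the zone number mapping the block's chunk, or None if unmapped."""
    chunk, _ = locate(block)
    return mapping[chunk] if chunk < len(mapping) else None


if __name__ == "__main__":
    mapping = [12, None, 7]  # made-up table: chunk 0 -> zone 12, chunk 2 -> zone 7
    print(locate(65541))                   # (1, 5)
    print(data_zone_for(131072, mapping))  # 7
```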
 76                                                    76 
 77 3) A set of blocks used to store bitmaps indic     77 3) A set of blocks used to store bitmaps indicating the validity of
 78 blocks in the data zones follows the mapping t     78 blocks in the data zones follows the mapping table. A valid block is
 79 defined as a block that was written and not di     79 defined as a block that was written and not discarded. For a buffered
 80 data chunk, a block is always valid only in th     80 data chunk, a block is always valid only in the data zone mapping the
 81 chunk or in the buffer zone of the chunk.          81 chunk or in the buffer zone of the chunk.
 82                                                    82 
 83 For a logical chunk mapped to a conventional z     83 For a logical chunk mapped to a conventional zone, all write operations
 84 are processed by directly writing to the zone.     84 are processed by directly writing to the zone. If the mapping zone is a
 85 sequential zone, the write operation is proces     85 sequential zone, the write operation is processed directly only if the
 86 write offset within the logical chunk is equal     86 write offset within the logical chunk is equal to the write pointer
 87 offset within the sequential data zone (i.e. t     87 offset within the sequential data zone (i.e. the write operation is
 88 aligned on the zone write pointer). Otherwise,     88 aligned on the zone write pointer). Otherwise, write operations are
 89 processed indirectly using a buffer zone. In t     89 processed indirectly using a buffer zone. In that case, an unused
 90 conventional zone is allocated and assigned to     90 conventional zone is allocated and assigned to the chunk being
 91 accessed. Writing a block to the buffer zone o     91 accessed. Writing a block to the buffer zone of a chunk will
 92 automatically invalidate the same block in the     92 automatically invalidate the same block in the sequential zone mapping
 93 the chunk. If all blocks of the sequential zon     93 the chunk. If all blocks of the sequential zone become invalid, the zone
 94 is freed and the chunk buffer zone becomes the     94 is freed and the chunk buffer zone becomes the primary zone mapping the
 95 chunk, resulting in native random write perfor     95 chunk, resulting in native random write performance similar to a regular
 96 block device.                                      96 block device.
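The write-path decision described above reduces to a small rule. The following sketch models only that rule. The function name and its parameters are hypothetical, and buffer zone allocation and bitmap updates are left out.

```python
def write_path(zone_is_conventional, offset_in_chunk, write_pointer):
    """Classify a write as 'direct' or 'buffered' following the rule above.

    offset_in_chunk and write_pointer are block offsets within the zone.
    """
    if zone_is_conventional:
        # Conventional zones accept writes at any offset.
        return "direct"
    if offset_in_chunk == write_pointer:
        # Aligned on the sequential zone write pointer.
        return "direct"
    # Unaligned write to a sequential zone: goes through a buffer zone.
    return "buffered"


if __name__ == "__main__":
    print(write_path(True, 10, 0))     # direct
    print(write_path(False, 128, 128)) # direct
    print(write_path(False, 10, 128))  # buffered
```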
 97                                                    97 
 98 Read operations are processed according to the     98 Read operations are processed according to the block validity
 99 information provided by the bitmaps. Valid blo     99 information provided by the bitmaps. Valid blocks are read either from
100 the sequential zone mapping a chunk, or if the    100 the sequential zone mapping a chunk, or if the chunk is buffered, from
101 the buffer zone assigned. If the accessed chun    101 the buffer zone assigned. If the accessed chunk has no mapping, or the
102 accessed blocks are invalid, the read buffer i    102 accessed blocks are invalid, the read buffer is zeroed and the read
103 operation terminated.                             103 operation terminated.
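A minimal sketch of the read path follows. It assumes each zone can be modeled as a dictionary with a set of valid block offsets and the block contents. This representation and the name `read_block` are illustrative only.

```python
BLOCK_SIZE = 4096


def read_block(block, data_zone=None, buffer_zone=None):
    """Return the content of one block, or zeroes if it is not valid anywhere.

    Each zone is a dict {"valid": set of offsets, "data": {offset: bytes}}.
    A block is expected to be valid in at most one of the two zones.
    """
    for zone in (buffer_zone, data_zone):
        if zone is not None and block in zone["valid"]:
            return zone["data"][block]
    # Unmapped chunk or invalid block: the read buffer is zeroed.
    return bytes(BLOCK_SIZE)


if __name__ == "__main__":
    seq = {"valid": {0}, "data": {0: b"s" * BLOCK_SIZE}}
    buf = {"valid": {1}, "data": {1: b"b" * BLOCK_SIZE}}
    print(read_block(0, seq, buf)[:1])  # b's'
    print(read_block(1, seq, buf)[:1])  # b'b'
    print(read_block(2, seq, buf) == bytes(BLOCK_SIZE))  # True
```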
104                                                   104 
105 After some time, the limited number of convent    105 After some time, the limited number of conventional zones available may
106 be exhausted (all used to map chunks or buffer    106 be exhausted (all used to map chunks or buffer sequential zones) and
107 unaligned writes to unbuffered chunks become i    107 unaligned writes to unbuffered chunks become impossible. To avoid this
108 situation, a reclaim process regularly scans u    108 situation, a reclaim process regularly scans used conventional zones and
109 tries to reclaim the least recently used zones    109 tries to reclaim the least recently used zones by copying the valid
110 blocks of the buffer zone to a free sequential    110 blocks of the buffer zone to a free sequential zone. Once the copy
111 completes, the chunk mapping is updated to poi    111 completes, the chunk mapping is updated to point to the sequential zone
112 and the buffer zone freed for reuse.              112 and the buffer zone freed for reuse.
113                                                   113 
114 Metadata Protection                               114 Metadata Protection
115 ===================                               115 ===================
116                                                   116 
117 To protect metadata against corruption in case    117 To protect metadata against corruption in case of sudden power loss or
118 system crash, 2 sets of metadata zones are use    118 system crash, 2 sets of metadata zones are used. One set, the primary
119 set, is used as the main metadata region, whil    119 set, is used as the main metadata region, while the secondary set is
120 used as a staging area. Modified metadata is f    120 used as a staging area. Modified metadata is first written to the
121 secondary set and validated by updating the su    121 secondary set and validated by updating the super block in the secondary
122 set; a generation counter is used to indicate     122 set; a generation counter is used to indicate that this set contains the
123 newest metadata. Once this operation completes    123 newest metadata. Once this operation completes, in-place metadata
124 block updates can be done in the primary metad    124 block updates can be done in the primary metadata set. This ensures that
125 one of the sets is always consistent (all modi    125 one of the sets is always consistent (all modifications committed or none
126 at all). Flush operations are used as a commit    126 at all). Flush operations are used as a commit point. Upon reception of
127 a flush request, metadata modification activit    127 a flush request, metadata modification activity is temporarily blocked
128 (for both incoming BIO processing and reclaim     128 (for both incoming BIO processing and reclaim process) and all dirty
129 metadata blocks are staged and updated. Normal    129 metadata blocks are staged and updated. Normal operation is then
130 resumed. Flushing metadata thus only temporari    130 resumed. Flushing metadata thus only temporarily delays write and
131 discard requests. Read requests can be process    131 discard requests. Read requests can be processed concurrently while
132 metadata flush is being executed.                 132 metadata flush is being executed.
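The ordering of the two-set update can be modeled in a few lines. The sketch below is a toy model under stated assumptions. Metadata sets are plain dictionaries and `flush` and `recover` are hypothetical names. The recovery rule (prefer the set with the higher generation, else the primary) is an assumption made for illustration and is not described in the text above.

```python
def flush(primary, secondary, dirty, crash_at=None):
    """Commit dirty blocks; crash_at stops early to mimic a power loss."""
    new_gen = primary["gen"] + 1
    items = list(dirty.items())

    # 1) Stage the modified blocks in the secondary set.
    for i, (blk, val) in enumerate(items):
        if crash_at == "staging" and i == 1:
            return
        secondary["blocks"][blk] = val
    # 2) Validate the staged copy by bumping the secondary generation.
    secondary["gen"] = new_gen
    # 3) Update the primary set in place, then bump its generation.
    for i, (blk, val) in enumerate(items):
        if crash_at == "in_place" and i == 1:
            return
        primary["blocks"][blk] = val
    primary["gen"] = new_gen


def recover(primary, secondary):
    """Assumed rule: use the secondary set only if it is strictly newer."""
    return secondary if secondary["gen"] > primary["gen"] else primary


if __name__ == "__main__":
    old, new = {0: "a", 1: "b"}, {0: "A", 1: "B"}
    for crash in ("staging", "in_place", None):
        p = {"gen": 1, "blocks": dict(old)}
        s = {"gen": 1, "blocks": dict(old)}
        flush(p, s, new, crash_at=crash)
        print(crash, recover(p, s)["blocks"])
```

In this model, an interruption at either point leaves the recovered set holding either all of the old blocks or all of the new ones.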
133                                                   133 
134 If a regular device is used in conjunction wit    134 If a regular device is used in conjunction with the zoned block device,
135 a third set of metadata (without the zone bitm    135 a third set of metadata (without the zone bitmaps) is written to the
136 start of the zoned block device. This metadata    136 start of the zoned block device. This metadata has a generation counter of
137 '0' and will never be updated during normal op    137 '0' and will never be updated during normal operation; it just serves for
138 identification purposes. The first and second     138 identification purposes. The first and second copy of the metadata
139 are located at the start of the regular block     139 are located at the start of the regular block device.
140                                                   140 
141 Usage                                             141 Usage
142 =====                                             142 =====
143                                                   143 
144 A zoned block device must first be formatted u    144 A zoned block device must first be formatted using the dmzadm tool. This
145 will analyze the device zone configuration, de    145 will analyze the device zone configuration, determine where to place the
146 metadata sets on the device and initialize the    146 metadata sets on the device and initialize the metadata sets.
147                                                   147 
148 Ex::                                              148 Ex::
149                                                   149 
150         dmzadm --format /dev/sdxx                 150         dmzadm --format /dev/sdxx
151                                                   151 
152                                                   152 
153 If two drives are to be used, both devices mus    153 If two drives are to be used, both devices must be specified, with the
154 regular block device as the first device.         154 regular block device as the first device.
155                                                   155 
156 Ex::                                              156 Ex::
157                                                   157 
158         dmzadm --format /dev/sdxx /dev/sdyy       158         dmzadm --format /dev/sdxx /dev/sdyy
159                                                   159 
160                                                   160 
161 Formatted device(s) can be started with the dm    161 Formatted device(s) can be started with the dmzadm utility, too:
162                                                   162 
163 Ex::                                              163 Ex::
164                                                   164 
165         dmzadm --start /dev/sdxx /dev/sdyy        165         dmzadm --start /dev/sdxx /dev/sdyy
166                                                   166 
167                                                   167 
168 Information about the internal layout and curr    168 Information about the internal layout and current usage of the zones can
169 be obtained with the 'status' callback from dm    169 be obtained with the 'status' callback from dmsetup:
170                                                   170 
171 Ex::                                              171 Ex::
172                                                   172 
173         dmsetup status /dev/dm-X                  173         dmsetup status /dev/dm-X
174                                                   174 
175 will return a line                                175 will return a line
176                                                   176 
177         0 <size> zoned <nr_zones> zones <nr_un    177         0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential
178                                                   178 
179 where <nr_zones> is the total number of zones,    179 where <nr_zones> is the total number of zones, <nr_unmap_rnd> is the number
180 of unmapped (i.e. free) random zones, <nr_rnd>    180 of unmapped (i.e. free) random zones, <nr_rnd> the total number of random
181 zones, <nr_unmap_seq> the number of unmapped s    181 zones, <nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq>
182 the total number of sequential zones.             182 the total number of sequential zones.
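For scripting, the status line can be split into named fields. The sketch below matches only the exact layout shown above. The helper name `parse_status` and the sample values are made up for illustration, and actual output may differ between kernel versions.

```python
import re

# Pattern for the status line layout documented above (an assumption
# that real output follows it exactly).
STATUS_RE = re.compile(
    r"^0 (?P<size>\d+) zoned (?P<nr_zones>\d+) zones "
    r"(?P<nr_unmap_rnd>\d+)/(?P<nr_rnd>\d+) random "
    r"(?P<nr_unmap_seq>\d+)/(?P<nr_seq>\d+) sequential$"
)


def parse_status(line):
    """Return the numeric fields of a dm-zoned status line as a dict."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized status line: %r" % line)
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    # Made-up sample values, not captured from a real device.
    sample = "0 1000000 zoned 100 zones 3/10 random 50/90 sequential"
    print(parse_status(sample)["nr_unmap_rnd"])  # 3
```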
183                                                   183 
184 Normally the reclaim process will be started o    184 Normally the reclaim process will be started once there are less than 50
185 percent free random zones. In order to start t    185 percent free random zones. In order to start the reclaim process manually
186 even before reaching this threshold the 'dmset    186 even before reaching this threshold the 'dmsetup message' function can be
187 used:                                             187 used:
188                                                   188 
189 Ex::                                              189 Ex::
190                                                   190 
191         dmsetup message /dev/dm-X 0 reclaim       191         dmsetup message /dev/dm-X 0 reclaim
192                                                   192 
193 will start the reclaim process and random zone    193 will start the reclaim process and random zones will be moved to sequential
194 zones.                                            194 zones.
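The 50 percent threshold mentioned above can be checked from the random zone counters reported by the status line. This is a minimal sketch. The function name is hypothetical, and the kernel's exact comparison and rounding are not specified here.

```python
def reclaim_expected(nr_unmap_rnd, nr_rnd, threshold_percent=50):
    """Return True if the share of free random zones is below the threshold."""
    if nr_rnd == 0:
        # No random zones to reclaim from.
        return False
    # Integer comparison avoids floating point rounding.
    return nr_unmap_rnd * 100 < nr_rnd * threshold_percent


if __name__ == "__main__":
    print(reclaim_expected(3, 10))  # True
    print(reclaim_expected(5, 10))  # False
```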
                                                      
