========
dm-zoned
========

The dm-zoned device mapper target exposes a zoned block device (ZBC and
ZAC compliant devices) as a regular block device without any write
pattern constraints. In effect, it implements a drive-managed zoned
block device which hides from the user (a file system or an application
doing raw block device accesses) the sequential write constraints of
host-managed zoned block devices and can mitigate the potential
device-side performance degradation due to excessive random writes on
host-aware zoned block devices.

For a more detailed description of the zoned block device models and
their constraints see (for SCSI devices):

https://www.t10.org/drafts.htm#ZBC_Family

and (for ATA devices):

http://www.t13.org/Documents/UploadedDocuments

The dm-zoned implementation is simple and minimizes system overhead (CPU
and memory usage as well as storage capacity loss). For a [...]
host-managed disk with 256 MB zones, dm-zoned memory usage per disk
instance is at most 4.5 MB and as few as 5 zones will be used
internally for storing metadata and performing reclaim operations.

dm-zoned target devices are formatted and checked using the dmzadm
utility available at:

https://github.com/hgst/dm-zoned-tools

Algorithm
=========

dm-zoned implements an on-disk buffering scheme to handle non-sequential
write accesses to the sequential zones of a zoned block device.
Conventional zones are used for caching as well as for storing internal
metadata. It can also use a regular block device together with the zoned
block device; in that case the regular block device will be split
logically in zones with the same size as the zoned block device zones.
These zones will be placed in front of the zones from the zoned block
device and will be handled just like conventional zones.

The zones of the device(s) are separated into 2 types:

1) Metadata zones: these are conventional zones used to store metadata.
Metadata zones are not reported as usable capacity to the user.

2) Data zones: all remaining zones, the vast majority of which will be
sequential zones used exclusively to store user data. The conventional
zones of the device may also be used for buffering user random writes.
Data in these zones may be directly mapped to the conventional zone, but
later moved to a sequential zone so that the conventional zone can be
reused for buffering incoming random writes.
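
As a rough illustration of the combined layout described above, where the
zones carved out of a regular block device are placed in front of the
zones of the zoned block device, the following Python sketch maps a zone
number to a backing device and a byte offset. The function name
``locate_zone``, the device labels and the requirement that the regular
device capacity be a whole multiple of the zone size are assumptions made
for this example; they are not dm-zoned interfaces.

```python
ZONE_SIZE = 256 * 1024 * 1024  # 256 MB zones, as in the sizing example above


def locate_zone(zone_id, regular_dev_bytes, zoned_dev_zones,
                zone_bytes=ZONE_SIZE):
    """Return (device, byte_offset) for a zone of the combined layout.

    Assumption of this sketch: regular_dev_bytes is a whole multiple of
    zone_bytes, so no partial trailing zone has to be handled.
    """
    if regular_dev_bytes % zone_bytes:
        raise ValueError("regular device is not a whole number of zones")
    regular_zones = regular_dev_bytes // zone_bytes
    if not 0 <= zone_id < regular_zones + zoned_dev_zones:
        raise IndexError("zone number out of range")
    if zone_id < regular_zones:
        # Zones emulated on the regular device come first and are
        # handled just like conventional zones.
        return ("regular", zone_id * zone_bytes)
    # Remaining zone numbers fall on the zoned block device itself.
    return ("zoned", (zone_id - regular_zones) * zone_bytes)
```

With a regular device of four zones, ``locate_zone(4, 4 * ZONE_SIZE, 100)``
resolves to the first zone of the zoned block device.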

dm-zoned exposes a logical device with a sector size of [...],
irrespective of the physical sector size of the backend zoned block
device being used. This allows reducing the amount of metadata needed to
manage valid blocks (blocks written).

The on-disk metadata format is as follows:

1) The first block of the first conventional zone found contains the
super block which describes the on-disk amount and position of metadata
blocks.

2) Following the super block, a set of blocks is used to describe the
mapping of the logical device blocks. The mapping is done per chunk of
blocks, with the chunk size equal to the zoned block device zone size.
The mapping table is indexed by chunk number and each mapping entry
indicates the zone number of the device storing the chunk of data. Each
mapping entry may also indicate the zone number of a conventional
zone used to buffer random modifications to the data zone.

3) A set of blocks used to store bitmaps indicating the validity of
blocks in the data zones follows the mapping table. A valid block is
defined as a block that was written and not discarded. For a buffered
data chunk, a block is always valid only in the data zone mapping the
chunk or in the buffer zone of the chunk.

For a logical chunk mapped to a conventional zone, all write operations
are processed by directly writing to the zone. If the mapping zone is a
sequential zone, the write operation is processed directly only if the
write offset within the logical chunk is equal to the write pointer
offset within the sequential data zone (i.e. the write operation is
aligned on the zone write pointer). Otherwise, write operations are
processed indirectly using a buffer zone. In that case, an unused
conventional zone is allocated and assigned to the chunk being
accessed. Writing a block to the buffer zone of a chunk will
automatically invalidate the same block in the sequential zone mapping
the chunk. If all blocks of the sequential zone become invalid, the zone
is freed and the chunk buffer zone becomes the primary zone mapping the
chunk, resulting in native random write performance similar to a regular
block device.

Read operations are processed according to the block validity
information provided by the bitmaps. Valid blocks are read either from
the sequential zone mapping a chunk, or if the chunk is buffered, from
the buffer zone assigned. If the accessed chunk has no mapping, or the
accessed blocks are invalid, the read buffer is zeroed and the read
operation terminated.
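
The write and read rules above can be modelled in a few lines of Python.
This is only a toy sketch for a chunk mapped to a sequential zone: the
``Chunk`` class, its attribute names and the use of Python sets in place
of the on-disk validity bitmaps are all assumptions of the example, not
the kernel data structures.

```python
class Chunk:
    """Toy model of one logical chunk mapped to a sequential zone."""

    def __init__(self, nr_blocks):
        self.nr_blocks = nr_blocks
        self.write_pointer = 0   # next writable block of the sequential zone
        self.seq_valid = set()   # blocks valid in the sequential zone
        self.buf_valid = set()   # blocks valid in the buffer zone
        self.buffered = False    # True once a buffer zone is assigned

    def write(self, block):
        """Write one block and return the zone that received it."""
        if not 0 <= block < self.nr_blocks:
            raise IndexError("block outside of chunk")
        if block == self.write_pointer:
            # Aligned on the write pointer: direct sequential write.
            self.seq_valid.add(block)
            # Keep the invariant that a block is valid in one zone only.
            self.buf_valid.discard(block)
            self.write_pointer += 1
            return "sequential"
        # Unaligned write: use a conventional zone as buffer zone and
        # invalidate the same block in the sequential zone.
        self.buffered = True
        self.buf_valid.add(block)
        self.seq_valid.discard(block)
        return "buffer"

    def read(self, block):
        """Return where a block is read from, following the bitmaps."""
        if block in self.buf_valid:
            return "buffer"
        if block in self.seq_valid:
            return "sequential"
        return "zeroes"          # invalid block: the read buffer is zeroed
```

For example, after writing blocks 0 and 1 in order, rewriting block 0 is no
longer aligned on the write pointer, so it goes to the buffer zone and
later reads of block 0 are served from there.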

After some time, the limited number of conventional zones available may
be exhausted (all used to map chunks or buffer sequential zones) and
unaligned writes to unbuffered chunks become impossible. To avoid this
situation, a reclaim process regularly scans used conventional zones and
tries to reclaim the least recently used zones by copying the valid
blocks of the buffer zone to a free sequential zone. Once the copy
completes, the chunk mapping is updated to point to the sequential zone
and the buffer zone is freed for reuse.

Metadata Protection
===================

To protect metadata against corruption in case of sudden power loss or
system crash, 2 sets of metadata zones are used. One set, the primary
set, is used as the main metadata region, while the secondary set is
used as a staging area. Modified metadata is first written to the
secondary set and validated by updating the super block in the secondary
set; a generation counter is used to indicate that this set contains the
newest metadata. Once this operation completes, in-place metadata
block updates can be done in the primary metadata set. This ensures that
one of the sets is always consistent (all modifications committed or none
at all). Flush operations are used as a commit point. Upon reception of
a flush request, metadata modification activity is temporarily blocked
(for both incoming BIO processing and reclaim process) and all dirty
metadata blocks are staged and updated. Normal operation is then
resumed. Flushing metadata thus only temporarily delays write and
discard requests. Read requests can be processed concurrently while
metadata flush is being executed.

If a regular device is used in conjunction with the zoned block device,
a third set of metadata (without the zone bitmaps) is written to the
start of the zoned block device. This metadata has a generation counter of
'0' and will never be updated during normal operation; it just serves for
identification purposes. The first and second copy of the metadata
are located at the start of the regular block device.

Usage
=====

A zoned block device must first be formatted using the dmzadm tool. This
will analyze the device zone configuration, determine where to place the
metadata sets on the device and initialize the metadata sets.

Ex::

    dmzadm --format /dev/sdxx


If two drives are to be used, both devices must be specified, with the
regular block device as the first device.

Ex::

    dmzadm --format /dev/sdxx /dev/sdyy


Formatted device(s) can also be started with the dmzadm utility.

Ex::

    dmzadm --start /dev/sdxx /dev/sdyy


Information about the internal layout and current usage of the zones can
be obtained with the 'status' callback from dmsetup:

Ex::

    dmsetup status /dev/dm-X

will return a line::

    0 <size> zoned <nr_zones> zones <nr_un[...]

where <nr_zones> is the total number of zones, <nr_unmap_rnd> is the number
of unmapped (i.e. free) random zones, <nr_rnd> the total number of random
zones, <nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq>
the total number of sequential zones.

Normally the reclaim process will be started once there are less than [...]
percent free random zones. In order to start the reclaim process manually
even before reaching this threshold the 'dmsetup message' function can be
used:

Ex::

    dmsetup message /dev/dm-X 0 reclaim

will start the reclaim process and random zones will be moved to sequential
zones.
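
To decide from a script whether a manual reclaim is worthwhile, the free
random zone percentage can be computed from the random zone counters
reported by the status line. The sketch below is only an illustration:
the function names are hypothetical, and the threshold is passed in as a
parameter because the value used by the kernel is not assumed here.

```python
def free_random_percent(nr_unmap_rnd, nr_rnd):
    """Percentage (rounded down) of random zones that are unmapped."""
    if nr_rnd <= 0:
        raise ValueError("device reports no random zones")
    if not 0 <= nr_unmap_rnd <= nr_rnd:
        raise ValueError("unmapped count must be between 0 and nr_rnd")
    return 100 * nr_unmap_rnd // nr_rnd


def reclaim_due(nr_unmap_rnd, nr_rnd, threshold_percent):
    """True when fewer than threshold_percent of random zones are free."""
    return free_random_percent(nr_unmap_rnd, nr_rnd) < threshold_percent
```

For instance, with 12 free random zones out of 48 and a placeholder
threshold of 40 percent, ``reclaim_due(12, 48, 40)`` is true because only
25 percent of the random zones are free.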