============
dm-integrity
============

The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.

A general problem with storing integrity tags with every sector is that
writing the sector and the integrity tag must be atomic - i.e. in case of
a crash, either both the sector and the integrity tag are written or
neither of them is.

To guarantee write atomicity, the dm-integrity target uses a journal: it
writes sector data and integrity tags into the journal, commits the
journal, and then copies the data and integrity tags to their respective
locations.

The dm-integrity target can be used with the dm-crypt target - in this
situation the dm-crypt target creates the integrity data and passes them
to the dm-integrity target via the bio_integrity_payload attached to the
bio. In this mode, the dm-crypt and dm-integrity targets provide
authenticated disk encryption - if an attacker modifies the encrypted
device, an I/O error is returned instead of random data.
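
The commit-then-copy protocol can be illustrated with a small Python model. This is a toy sketch, not the kernel's implementation: the names `JournalSector`, `write_section` and `replay` are hypothetical, dictionaries stand in for the data and tag areas, and a plain integer stands in for the 8-byte commit id that, according to the on-disk layout, ends every journal sector. A section is copied to its final location only if every sector carries the expected commit id, so a crash while the journal is being written leaves both data and tags untouched.

```python
from typing import NamedTuple


class JournalSector(NamedTuple):
    logical_sector: int  # where the data and tag should finally be written
    data: bytes
    tag: bytes
    commit_id: int       # the same id is stamped on every sector of a section


def write_section(writes, commit_id):
    """Stage (sector, data, tag) triples in one journal section."""
    return [JournalSector(s, d, t, commit_id) for s, d, t in writes]


def replay(section, expected_commit_id, data_area, tag_area):
    """Copy a journal section to its final location, but only if every
    sector carries the expected commit id (i.e. the whole section was
    written before a crash). Returns True if the section was replayed."""
    if any(js.commit_id != expected_commit_id for js in section):
        return False  # partially written section: do not replay it
    for js in section:
        # Data and tag are applied together, so they always match.
        data_area[js.logical_sector] = js.data
        tag_area[js.logical_sector] = js.tag
    return True
```

A torn section can be simulated by giving one sector a stale commit id (for example with `sec[1]._replace(commit_id=4)`); `replay` then returns False and neither the data nor the tags change.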

The dm-integrity target can also be used as a standalone target; in this
mode it calculates and verifies the integrity tags internally. In this
mode, the dm-integrity target can be used to detect silent data
corruption on the disk or in the I/O path.

There's an alternate mode of operation where dm-integrity uses a bitmap
instead of a journal. If a bit in the bitmap is 1, the corresponding
region's data and integrity tags are not synchronized - if the machine
crashes, the unsynchronized regions will be recalculated. The bitmap mode
is faster than the journal mode, because we don't have to write the data
twice, but it is also less reliable, because if data corruption happens
when the machine crashes, it may not be detected.

When loading the target for the first time, the kernel driver will format
the device, but only if the superblock contains zeroes. If the superblock
is neither valid nor zeroed, the dm-integrity target can't be loaded.

Accesses to the on-disk metadata area containing checksums (aka tags) are
buffered using dm-bufio.
When an access to any given metadata area occurs, each unique
metadata area gets its own buffer(s). The buffer size is capped at the
size of the metadata area, but may be smaller, thereby requiring multiple
buffers to represent the full metadata area. A smaller buffer size will
produce a smaller resulting read/write operation to the metadata area for
small reads/writes. The metadata is still read even in a full write to
the data covered by a single buffer.

To use the target for the first time:

1. overwrite the superblock with zeroes
2. load the dm-integrity target with a one-sector size; the kernel driver
   will format the device
3. unload the dm-integrity target
4. read the "provided_data_sectors" value from the superblock
5. load the dm-integrity target with the target size
   "provided_data_sectors"
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
   with the size "provided_data_sectors"


Target arguments:

1. the underlying block device

2. the number of reserved sectors at the beginning of the device - the
   dm-integrity target won't read or write these sectors

3. the size of the integrity tag (if "-" is used, the size is taken from
   the internal-hash algorithm)

4. mode:

   D - direct writes (without journal)
       In this mode, journaling is not used and data sectors and
       integrity tags are written separately. In case of a crash, it is
       possible that the data and integrity tag don't match.
   J - journaled writes
       Data and integrity tags are written to the journal and atomicity
       is guaranteed. In case of a crash, either both data and tag or
       none of them are written. The journaled mode roughly halves write
       throughput because the data have to be written twice.
   B - bitmap mode
       Data and metadata are written without any synchronization; the
       driver maintains a bitmap of dirty regions where data and metadata
       don't match. This mode can only be used with internal hash.
   R - recovery mode
       In this mode, the journal is not replayed, checksums are not
       checked and writes to the device are not allowed.
       This mode is useful for data recovery if the device cannot be
       activated in any of the other standard modes.

5. the number of additional arguments

Additional arguments:

journal_sectors:number
    The size of the journal. This argument is used only when formatting
    the device. If the device is already formatted, the value from the
    superblock is used.

interleave_sectors:number (default 32768)
    The number of interleaved sectors. This value is rounded down to
    a power of two. If the device is already formatted, the value from
    the superblock is used.

meta_device:device
    Don't interleave the data and metadata on the device. Use a
    separate device for metadata.

buffer_sectors:number (default 128)
    The number of sectors in one metadata buffer. The value is rounded
    down to a power of two.

journal_watermark:number (default 50)
    The journal watermark as a percentage. When the size of the journal
    exceeds this watermark, the thread that flushes the journal will
    be started.

commit_time:number (default 10000)
    Commit time in milliseconds. When this time passes, the journal is
    written. The journal is also written immediately if a FLUSH request
    is received.

internal_hash:algorithm(:key) (the key is optional)
    Use an internal hash or crc.
    When this argument is used, the dm-integrity target won't accept
    integrity tags from the upper target; instead, it will automatically
    generate and verify the integrity tags.

    You can use a crc algorithm (such as crc32); then the integrity
    target will protect the data against accidental corruption.
    You can also use an hmac algorithm (for example
    "hmac(sha256):0123456789abcdef"); in this mode it will provide
    cryptographic authentication of the data without encryption.

    When this argument is not used, the integrity tags are accepted
    from an upper layer target, such as dm-crypt. The upper layer
    target should check the validity of the integrity tags.

recalculate
    Recalculate the integrity tags automatically. It is only valid
    when using internal hash.
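
The first-time procedure and the argument order above map onto a device-mapper table line of the form `<start> <length> integrity <target arguments>`. The Python helper below is a hypothetical sketch that only assembles and sanity-checks such a string; it does not talk to the kernel. The device path and sizes are placeholders, and the checks encode only rules stated above (bitmap mode and a "-" tag size both require internal_hash).

```python
def integrity_table(device, length, reserved_sectors=0, tag_size="-",
                    mode="J", extra_args=()):
    """Assemble a device-mapper table line for the integrity target:

    <start> <length> integrity <device> <reserved_sectors> <tag_size>
        <mode> <#additional_args> [<additional_args>...]
    """
    extra = [str(arg) for arg in extra_args]
    has_internal_hash = any(a.startswith("internal_hash:") for a in extra)
    if mode not in ("D", "J", "B", "R"):
        raise ValueError(f"unknown mode {mode!r}")
    # Rules taken from the argument descriptions above.
    if mode == "B" and not has_internal_hash:
        raise ValueError("bitmap mode can only be used with internal_hash")
    if tag_size == "-" and not has_internal_hash:
        raise ValueError('tag size "-" needs internal_hash to derive the size')
    fields = [0, length, "integrity", device, reserved_sectors, tag_size,
              mode, len(extra), *extra]
    return " ".join(str(f) for f in fields)


# Step 2 of the first-time procedure: a one-sector table, which lets the
# kernel format a zeroed device (the device path is a placeholder).
print(integrity_table("/dev/sdX", 1, extra_args=["internal_hash:crc32c"]))
# prints: 0 1 integrity /dev/sdX 0 - J 1 internal_hash:crc32c
```

Such a line could be handed to `dmsetup create <name> --table "..."`: first with a length of 1 (step 2) and then, after reading provided_data_sectors, with that value as the length (step 5).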

journal_crypt:algorithm(:key) (the key is optional)
    Encrypt the journal using the given algorithm to make sure that an
    attacker can't read the journal. You can use a block cipher here
    (such as "cbc(aes)") or a stream cipher (for example "chacha20"
    or "ctr(aes)").

    The journal contains the history of the last writes to the block
    device; an attacker reading the journal could see the last sector
    numbers that were written. From the sector numbers, the attacker can
    infer the size of files that were written. To protect against this
    situation, you can encrypt the journal.

journal_mac:algorithm(:key) (the key is optional)
    Protect sector numbers in the journal from accidental or malicious
    modification. To protect against accidental modification, use a
    crc algorithm; to protect against malicious modification, use an
    hmac algorithm with a key.

    This option is not needed when using internal-hash because in this
    mode, the integrity of journal entries is checked when replaying
    the journal. Thus, a modified sector number would be detected at
    this stage.
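
The on-disk layout describes the journal_mac as a 64-byte value split into an 8-byte mac at the end of each of the 8 metadata sectors of a journal section. The sketch below shows that idea with Python's standard `hmac` module. It is illustrative only: the choice of HMAC-SHA512 (whose 64-byte digest exactly fills 8 x 8 bytes), the little-endian 64-bit encoding of sector numbers and the function names are assumptions, not the kernel's exact format, and the fix_hmac additions (section number and salt) are not modeled.

```python
import hashlib
import hmac
import struct

MAC_BYTES_PER_SECTOR = 8  # each metadata sector ends with an 8-byte mac
METADATA_SECTORS = 8      # 8 x 8 bytes form one 64-byte value


def journal_section_macs(key, sector_numbers):
    """Split an HMAC over the section's sector numbers into eight 8-byte
    pieces, one per metadata sector (assumed encoding: little-endian u64)."""
    msg = b"".join(struct.pack("<Q", s) for s in sector_numbers)
    digest = hmac.new(key, msg, hashlib.sha512).digest()  # 64 bytes
    return [digest[i:i + MAC_BYTES_PER_SECTOR]
            for i in range(0, METADATA_SECTORS * MAC_BYTES_PER_SECTOR,
                           MAC_BYTES_PER_SECTOR)]


def verify_section(key, sector_numbers, stored_macs):
    """Recompute the macs and compare them in constant time."""
    expected = b"".join(journal_section_macs(key, sector_numbers))
    return hmac.compare_digest(expected, b"".join(stored_macs))
```

If an attacker changes any sector number in the section, the recomputed value no longer matches the stored pieces and verification fails.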

block_size:number (default 512)
    The size of a data block in bytes. The larger the block size, the
    less overhead there is for per-block integrity metadata.
    Supported values are 512, 1024, 2048 and 4096 bytes.

sectors_per_bit:number
    In bitmap mode, this parameter specifies the number of
    512-byte sectors that correspond to one bitmap bit.

bitmap_flush_interval:number
    The bitmap flush interval in milliseconds. The metadata buffers
    are synchronized when this interval expires.

allow_discards
    Allow block discard requests (a.k.a. TRIM) for the integrity device.
    Discards are only allowed on devices using internal hash.

fix_padding
    Use a smaller padding of the tag area that is more
    space-efficient. If this option is not present, large padding is
    used - that is for compatibility with older kernels.
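
Bitmap mode and sectors_per_bit can be pictured with the hypothetical `DirtyBitmap` class below. It is a simplified model, not the driver's data structure: a write sets the bits covering its sectors, `flush` (standing in, as a simplification, for the synchronization that happens every bitmap_flush_interval) clears them, and after a crash only the regions whose bits are still set need their tags recalculated.

```python
class DirtyBitmap:
    """Simplified model of the dirty-region bitmap used in bitmap mode (B)."""

    def __init__(self, device_sectors, sectors_per_bit):
        self.device_sectors = device_sectors
        self.sectors_per_bit = sectors_per_bit
        self.nbits = -(-device_sectors // sectors_per_bit)  # ceiling division
        self.bits = bytearray(-(-self.nbits // 8))          # all regions clean

    def _is_set(self, bit):
        return (self.bits[bit // 8] >> (bit % 8)) & 1

    def mark_dirty(self, start, count):
        """Set every bit covering sectors [start, start + count) before a write."""
        first = start // self.sectors_per_bit
        last = (start + count - 1) // self.sectors_per_bit
        for bit in range(first, last + 1):
            self.bits[bit // 8] |= 1 << (bit % 8)

    def flush(self):
        """Simplification: assume data and tags are now in sync; clear all bits."""
        self.bits = bytearray(len(self.bits))

    def regions_to_recalculate(self):
        """(start_sector, sector_count) pairs whose tags must be recomputed."""
        regions = []
        for bit in range(self.nbits):
            if self._is_set(bit):
                start = bit * self.sectors_per_bit
                count = min(self.sectors_per_bit, self.device_sectors - start)
                regions.append((start, count))
        return regions
```

The trade-off behind sectors_per_bit shows up directly here: a larger value makes the bitmap smaller, but each dirty bit then forces more data to be recalculated after a crash.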

fix_hmac
    Improve security of internal_hash and journal_mac:

    - the section number is mixed into the mac, so that an attacker can't
      copy sectors from one journal section to another journal section
    - the superblock is protected by journal_mac
    - a 16-byte salt stored in the superblock is mixed into the mac, so
      that an attacker can't detect that two disks have the same hmac
      key, and also to prevent an attacker from moving sectors from one
      disk to another

legacy_recalculate
    Allow recalculating of volumes with HMAC keys. This is disabled by
    default for security reasons - an attacker could modify the volume,
    set recalc_sector to zero, and the kernel would not detect the
    modification.

The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
allow_discards can be changed when reloading the target (load an inactive
table and swap the tables with suspend and resume). The other arguments
should not be changed when reloading the target because the layout of
disk data depends on them and the reloaded target would be non-functional.
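
The reload rule above can be checked mechanically before swapping tables (with dmsetup, a reload is typically `dmsetup load`, followed by `dmsetup suspend` and `dmsetup resume`). The sketch below compares two hypothetical dictionaries of target arguments and reports those whose change would be unsafe. It conservatively treats only a switch between D and J as a safe mode change, because that is all the text above lists.

```python
# Arguments the text above lists as changeable on reload.
RELOADABLE = {"mode", "buffer_sectors", "journal_watermark",
              "commit_time", "allow_discards"}


def unsafe_reload_changes(old, new):
    """Return the sorted names of arguments whose change on reload would
    break the on-disk layout (hypothetical dict-of-arguments representation)."""
    bad = {k for k in old.keys() | new.keys()
           if k not in RELOADABLE and old.get(k) != new.get(k)}
    # Only a switch between direct (D) and journaled (J) writes is listed.
    old_mode, new_mode = old.get("mode"), new.get("mode")
    if old_mode != new_mode and {old_mode, new_mode} - {"D", "J"}:
        bad.add("mode")
    return sorted(bad)
```

For example, lowering commit_time while switching from J to D reports nothing, whereas changing internal_hash reports `["internal_hash"]`.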

For example, on a device using the default interleave_sectors of 32768, a
block_size of 512, and an internal_hash of crc32c with a tag size of 4
bytes, it will take 128 KiB of tags (32768 tags x 4 bytes) to track a
full data area, requiring 256 sectors of metadata per data area. With the
default buffer_sectors of 128, that means there will be 2 buffers per
metadata area, or 2 buffers per 16 MiB (32768 x 512 bytes) of data.

Status line:

1. the number of integrity mismatches
2. provided data sectors - that is, the number of sectors that the user
   could use
3. the current recalculating position (or '-' if we didn't recalculate)


The layout of the formatted block device:

* reserved sectors
  (they are not used by this target; they can be used for storing LUKS
  metadata or for other purposes). The size of the reserved area is
  specified in the target arguments.

* superblock (4 KiB)

  * magic string - identifies that the device was formatted
  * version
  * log2(interleave sectors)
  * integrity tag size
  * the number of journal sections
  * provided data sectors - the number of sectors that this target
    provides (i.e. the size of the device minus the size of all
    metadata and padding). The user of this target should not send
    bios that access data beyond the "provided data sectors" limit.
  * flags

    SB_FLAG_HAVE_JOURNAL_MAC
      set if journal_mac is used
    SB_FLAG_RECALCULATING
      recalculating is in progress
    SB_FLAG_DIRTY_BITMAP
      the journal area contains the bitmap of dirty blocks

  * log2(sectors per block)
  * a position where recalculating finished

* journal

  The journal is divided into sections; each section contains:

  * metadata area (4 KiB), which contains journal entries

    - every journal entry contains:

      * logical sector (specifies where the data and tag should
        be written)
      * last 8 bytes of data
      * integrity tag (the size is specified in the superblock)

    - every metadata sector ends with:

      * mac (8 bytes); the macs in all 8 metadata sectors together form
        a 64-byte value. It is used to store the hmac of the sector
        numbers in the journal section, to protect against the
        possibility that an attacker tampers with the sector numbers in
        the journal.
      * commit id

  * data area (the size is variable; it depends on how many journal
    entries fit into the metadata area)

    - every sector in the data area contains:

      * data (504 bytes of data; the last 8 bytes are stored in
        the journal entry)
      * commit id

  To test if the whole journal section was written correctly, every
  512-byte sector of the journal ends with an 8-byte commit id. If the
  commit id matches on all sectors in a journal section, then it is
  assumed that the section was written correctly. If the commit id
  doesn't match, the section was written partially and it should not
  be replayed.

* one or more runs of interleaved tags and data.
  Each run contains:

  * tag area - it contains integrity tags. There is one tag for each
    data block in the data area. The size of this area is always 4 KiB
    or greater.
  * data area - it contains data sectors. The number of data sectors
    in one run must be a power of two. log2 of this value is stored
    in the superblock.
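
The metadata-overhead arithmetic from the worked example (32768 interleaved sectors, 4-byte tags, 128-sector buffers) can be reproduced in a few lines of Python. The helper names are hypothetical, and the sketch assumes that interleave_sectors counts 512-byte sectors and that there is one tag per block_size block. It applies only the 4 KiB minimum of the tag area stated in the layout and ignores any further padding (see fix_padding).

```python
SECTOR = 512


def round_down_pow2(n):
    """interleave_sectors and buffer_sectors are rounded down to a power of two."""
    return 1 << (n.bit_length() - 1)


def tag_area_sectors(interleave_sectors, block_size, tag_size):
    """Sectors of tags needed for one run of `interleave_sectors` data sectors."""
    blocks = interleave_sectors * SECTOR // block_size  # one tag per data block
    tag_bytes = max(blocks * tag_size, 4096)            # tag area is at least 4 KiB
    return -(-tag_bytes // SECTOR)                      # round up to whole sectors


def buffers_per_run(metadata_sectors, buffer_sectors):
    """dm-bufio buffers needed to cover one run's tag area."""
    return -(-metadata_sectors // round_down_pow2(buffer_sectors))


meta = tag_area_sectors(32768, 512, 4)
print(meta, buffers_per_run(meta, 128))  # prints: 256 2
```

Raising block_size to 4096 with the same 4-byte tags shrinks the tag area for the same run from 256 to 32 sectors, which is the overhead reduction the block_size description refers to.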