=============
dm-log-writes
=============

This target takes 2 devices, one to pass all IO to normally, and one to log all
of the write operations to.  This is intended for file system developers wishing
to verify the integrity of metadata or data as the file system is written to.
There is a log_write_entry written for every WRITE request and the target is
able to take arbitrary data from userspace to insert into the log.  The data
that is in the WRITE requests is copied into the log to make the replay happen
exactly as it happened originally.

Log Ordering
============

We log things in order of completion once we are sure the write is no longer in
cache.  This means that normal WRITE requests are not actually logged until the
next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
the log in a way that correlates to what is on disk and not what is in cache,
to make it easier to detect improper waiting/flushing.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_PREFLUSH request we splice this list onto the request and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
WRITEs that have completed at the time the REQ_PREFLUSH is issued are added, in
order to simulate the worst case scenario with regard to power failures.
Consider the following example (W means write, C means complete):

    W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following:

    W3,W2,flush,W1....

Again, this is to simulate what is actually on disk.  It allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete, as those requests bypass the device cache.

Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
have all the DISCARD requests, and then the WRITE requests and then the FLUSH
request.
Consider the following example:

    WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this:

    DISCARD 1, WRITE 1, FLUSH

which isn't quite what happened and wouldn't be caught during the log replay.

Target interface
================

i) Constructor

   log-writes <dev_path> <log_dev_path>

   ============= ==============================================
   dev_path      Device that all of the IO will go to normally.
   log_dev_path  Device where the log entries are written to.
   ============= ==============================================

ii) Status

    <#logged entries> <highest allocated sector>

    =========================== ========================
    #logged entries             Number of logged entries
    highest allocated sector    Highest allocated sector
    =========================== ========================

iii) Messages

    mark <description>

        You can use a dmsetup message to set an arbitrary mark in a log.
        For example say you want to fsck a file system after every
        write, but first you need to replay up to the mkfs to make sure
        we're fsck'ing something reasonable, you would do something like
        this::

          mkfs.btrfs -f /dev/mapper/log
          dmsetup message log 0 mark mkfs
          <run test>

        This would allow you to replay the log up to the mkfs mark and
        then replay from that point on doing the fsck check in the
        interval that you want.

        Every log has a mark at the end labeled "dm-log-writes-end".

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system.
You would do something like this::

  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs

  mount /dev/mapper/log /mnt/btrfs-test
  <some test that does fsync at the end>
  dmsetup message log 0 mark fsync
  md5sum /mnt/btrfs-test/foo
  umount /mnt/btrfs-test

  dmsetup remove log
  replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
  mount /dev/sdb /mnt/btrfs-test
  md5sum /mnt/btrfs-test/foo
  <verify md5sums are correct>

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation.
You could do this with::

  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs

  mount /dev/mapper/log /mnt/btrfs-test
  <fsstress to dirty the fs>
  btrfs filesystem balance /mnt/btrfs-test
  umount /mnt/btrfs-test
  dmsetup remove log

  replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
  btrfsck /dev/sdb
  replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
          --fsck "btrfsck /dev/sdb" --check fua

This replays the log until it sees a FUA request, runs the fsck command and, if
the fsck passes, continues replaying to the next FUA.  It repeats this until the
replay is completed or the fsck command exits abnormally.
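
The ordering rule described under Log Ordering can be made concrete with a
small shell sketch.  This is only a toy model that works on a string of events
and never touches a device.  The function name ``simulate_log_order`` and the
event syntax (``W<n>``, ``C<n>``, ``Wflush``, ``Cflush``, borrowed from the
example in that section) are assumptions made for illustration and are not part
of dm-log-writes or the replay-log tool.  REQ_FUA and DISCARD handling are left
out.

```shell
# Toy model of the dm-log-writes ordering rule (hypothetical helper, not a
# real tool).  Takes one argument: a comma-separated event list.
simulate_log_order() (
    # The function body is a subshell, so IFS and the variables stay local.
    set -f        # no pathname expansion while splitting the event list
    IFS=','
    pending=""    # writes completed since the last flush was issued
    spliced=""    # writes attached to the flush currently in flight
    log=""        # entries that have reached the log, in order
    for ev in $1; do
        case $ev in
            Wflush) spliced="$spliced$pending"; pending="" ;;  # flush issued
            Cflush) log="$log${spliced}flush,"; spliced="" ;;  # flush done
            C*)     pending="${pending}W${ev#C}," ;;           # write done
            W*)     ;;                                         # write issued
        esac
    done
    echo "log: ${log%,}"
    echo "unlogged: ${pending%,}"
)

simulate_log_order "W1,W2,W3,C3,C2,Wflush,C1,Cflush"
```

For the sequence from the Log Ordering section this reports ``W3,W2,flush`` as
the log and ``W1`` as not yet logged, which matches the documented behaviour:
W1 completed after the flush was issued, so it only reaches the log with the
next flush.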