
TOMOYO Linux Cross Reference
Linux/Documentation/admin-guide/bcache.rst


Differences between /Documentation/admin-guide/bcache.rst (Version linux-6.12-rc7) and /Documentation/admin-guide/bcache.rst (Version linux-4.19.323)


  1 ============================                        1 ============================
  2 A block layer cache (bcache)                        2 A block layer cache (bcache)
  3 ============================                        3 ============================
  4                                                     4 
  5 Say you've got a big slow raid 6, and an ssd o      5 Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be
  6 nice if you could use them as cache... Hence b      6 nice if you could use them as cache... Hence bcache.
  7                                                     7 
  8 The bcache wiki can be found at:               !!   8 Wiki and git repositories are at:
  9   https://bcache.evilpiepirate.org             << 
 10                                                     9 
 11 This is the git repository of bcache-tools:    !!  10   - http://bcache.evilpiepirate.org
 12   https://git.kernel.org/pub/scm/linux/kernel/ !!  11   - http://evilpiepirate.org/git/linux-bcache.git
 13                                                !!  12   - http://evilpiepirate.org/git/bcache-tools.git
 14 The latest bcache kernel code can be found fro << 
 15   https://git.kernel.org/pub/scm/linux/kernel/ << 
 16                                                    13 
 17 It's designed around the performance character     14 It's designed around the performance characteristics of SSDs - it only allocates
 18 in erase block sized buckets, and it uses a hy     15 in erase block sized buckets, and it uses a hybrid btree/log to track cached
 19 extents (which can be anywhere from a single s     16 extents (which can be anywhere from a single sector to the bucket size). It's
 20 designed to avoid random writes at all costs;      17 designed to avoid random writes at all costs; it fills up an erase block
 21 sequentially, then issues a discard before reu     18 sequentially, then issues a discard before reusing it.
 22                                                    19 
 23 Both writethrough and writeback caching are su     20 Both writethrough and writeback caching are supported. Writeback defaults to
 24 off, but can be switched on and off arbitraril     21 off, but can be switched on and off arbitrarily at runtime. Bcache goes to
 25 great lengths to protect your data - it reliab     22 great lengths to protect your data - it reliably handles unclean shutdown. (It
 26 doesn't even have a notion of a clean shutdown     23 doesn't even have a notion of a clean shutdown; bcache simply doesn't return
 27 writes as completed until they're on stable st     24 writes as completed until they're on stable storage).
 28                                                    25 
 29 Writeback caching can use most of the cache fo     26 Writeback caching can use most of the cache for buffering writes - writing
 30 dirty data to the backing device is always don     27 dirty data to the backing device is always done sequentially, scanning from the
 31 start to the end of the index.                     28 start to the end of the index.
 32                                                    29 
 33 Since random IO is what SSDs excel at, there g     30 Since random IO is what SSDs excel at, there generally won't be much benefit
 34 to caching large sequential IO. Bcache detects     31 to caching large sequential IO. Bcache detects sequential IO and skips it;
 35 it also keeps a rolling average of the IO size     32 it also keeps a rolling average of the IO sizes per task, and as long as the
 36 average is above the cutoff it will skip all I     33 average is above the cutoff it will skip all IO from that task - instead of
 37 caching the first 512k after every seek. Backu     34 caching the first 512k after every seek. Backups and large file copies should
 38 thus entirely bypass the cache.                    35 thus entirely bypass the cache.
 39                                                    36 
 40 In the event of a data IO error on the flash i     37 In the event of a data IO error on the flash it will try to recover by reading
 41 from disk or invalidating cache entries.  For      38 from disk or invalidating cache entries.  For unrecoverable errors (meta data
 42 or dirty data), caching is automatically disab     39 or dirty data), caching is automatically disabled; if dirty data was present
 43 in the cache it first disables writeback cachi     40 in the cache it first disables writeback caching and waits for all dirty data
 44 to be flushed.                                     41 to be flushed.
 45                                                    42 
 46 Getting started:                                   43 Getting started:
 47 You'll need bcache util from the bcache-tools  !!  44 You'll need make-bcache from the bcache-tools repository. Both the cache device
 48 and backing device must be formatted before us     45 and backing device must be formatted before use::
 49                                                    46 
 50   bcache make -B /dev/sdb                      !!  47   make-bcache -B /dev/sdb
 51   bcache make -C /dev/sdc                      !!  48   make-bcache -C /dev/sdc
 52                                                    49 
 53 `bcache make` has the ability to format multip !!  50 make-bcache has the ability to format multiple devices at the same time - if
 54 you format your backing devices and cache devi     51 you format your backing devices and cache device at the same time, you won't
 55 have to manually attach::                          52 have to manually attach::
 56                                                    53 
 57   bcache make -B /dev/sda /dev/sdb -C /dev/sdc !!  54   make-bcache -B /dev/sda /dev/sdb -C /dev/sdc
 58                                                << 
 59 If your bcache-tools is not updated to latest  << 
 60 unified `bcache` utility, you may use the lega << 
 61 bcache device with same -B and -C parameters.  << 
 62                                                    55 
 63 bcache-tools now ships udev rules, and bcache      56 bcache-tools now ships udev rules, and bcache devices are known to the kernel
 64 immediately.  Without udev, you can manually r     57 immediately.  Without udev, you can manually register devices like this::
 65                                                    58 
 66   echo /dev/sdb > /sys/fs/bcache/register          59   echo /dev/sdb > /sys/fs/bcache/register
 67   echo /dev/sdc > /sys/fs/bcache/register          60   echo /dev/sdc > /sys/fs/bcache/register
 68                                                    61 
 69 Registering the backing device makes the bcach     62 Registering the backing device makes the bcache device show up in /dev; you can
 70 now format it and use it as normal. But the fi     63 now format it and use it as normal. But the first time using a new bcache
 71 device, it'll be running in passthrough mode u     64 device, it'll be running in passthrough mode until you attach it to a cache.
 72 If you are thinking about using bcache later,      65 If you are thinking about using bcache later, it is recommended to setup all your
 73 slow devices as bcache backing devices without     66 slow devices as bcache backing devices without a cache, and you can choose to add
 74 a caching device later.                            67 a caching device later.
 75 See 'ATTACHING' section below.                     68 See 'ATTACHING' section below.
 76                                                    69 
 77 The devices show up as::                           70 The devices show up as::
 78                                                    71 
 79   /dev/bcache<N>                                   72   /dev/bcache<N>
 80                                                    73 
 81 As well as (with udev)::                           74 As well as (with udev)::
 82                                                    75 
 83   /dev/bcache/by-uuid/<uuid>                       76   /dev/bcache/by-uuid/<uuid>
 84   /dev/bcache/by-label/<label>                     77   /dev/bcache/by-label/<label>
 85                                                    78 
 86 To get started::                                   79 To get started::
 87                                                    80 
 88   mkfs.ext4 /dev/bcache0                           81   mkfs.ext4 /dev/bcache0
 89   mount /dev/bcache0 /mnt                          82   mount /dev/bcache0 /mnt
 90                                                    83 
 91 You can control bcache devices through sysfs a     84 You can control bcache devices through sysfs at /sys/block/bcache<N>/bcache .
 92 You can also control them through /sys/fs//bca     85 You can also control them through /sys/fs//bcache/<cset-uuid>/ .
 93                                                    86 
 94 Cache devices are managed as sets; multiple ca     87 Cache devices are managed as sets; multiple caches per set isn't supported yet
 95 but will allow for mirroring of metadata and d     88 but will allow for mirroring of metadata and dirty data in the future. Your new
 96 cache set shows up as /sys/fs/bcache/<UUID>        89 cache set shows up as /sys/fs/bcache/<UUID>
 97                                                    90 
 98 Attaching                                          91 Attaching
 99 ---------                                          92 ---------
100                                                    93 
101 After your cache device and backing device are     94 After your cache device and backing device are registered, the backing device
102 must be attached to your cache set to enable c     95 must be attached to your cache set to enable caching. Attaching a backing
103 device to a cache set is done thusly, with the     96 device to a cache set is done thusly, with the UUID of the cache set in
104 /sys/fs/bcache::                                   97 /sys/fs/bcache::
105                                                    98 
106   echo <CSET-UUID> > /sys/block/bcache0/bcache     99   echo <CSET-UUID> > /sys/block/bcache0/bcache/attach
107                                                   100 
108 This only has to be done once. The next time y    101 This only has to be done once. The next time you reboot, just reregister all
109 your bcache devices. If a backing device has d    102 your bcache devices. If a backing device has data in a cache somewhere, the
110 /dev/bcache<N> device won't be created until t    103 /dev/bcache<N> device won't be created until the cache shows up - particularly
111 important if you have writeback caching turned    104 important if you have writeback caching turned on.
112                                                   105 
113 If you're booting up and your cache device is     106 If you're booting up and your cache device is gone and never coming back, you
114 can force run the backing device::                107 can force run the backing device::
115                                                   108 
116   echo 1 > /sys/block/sdb/bcache/running          109   echo 1 > /sys/block/sdb/bcache/running
117                                                   110 
118 (You need to use /sys/block/sdb (or whatever y    111 (You need to use /sys/block/sdb (or whatever your backing device is called), not
119 /sys/block/bcache0, because bcache0 doesn't ex    112 /sys/block/bcache0, because bcache0 doesn't exist yet. If you're using a
120 partition, the bcache directory would be at /s    113 partition, the bcache directory would be at /sys/block/sdb/sdb2/bcache)
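The path rule in the note above can be wrapped in a small shell helper. This is only a sketch: the function name `bcache_sysfs_dir` is made up for illustration and is not part of bcache-tools, and it assumes you pass the whole-disk name first and the partition name (if any) second.

```shell
# Hypothetical helper, not part of bcache-tools: print the sysfs bcache
# directory for a backing device.
#   $1 = whole-disk name (e.g. sdb)
#   $2 = optional partition name (e.g. sdb2)
bcache_sysfs_dir() {
    if [ -n "${2:-}" ]; then
        # Partition: the bcache directory sits below the partition node.
        printf '/sys/block/%s/%s/bcache\n' "$1" "$2"
    else
        # Whole disk: the bcache directory sits directly below the disk node.
        printf '/sys/block/%s/bcache\n' "$1"
    fi
}

# Example (as root): force-run a backing partition whose cache is gone.
#   echo 1 > "$(bcache_sysfs_dir sdb sdb2)/running"
```

The helper only builds the path; it does not check that the directory exists, so a missing bcache directory still shows up as an error from the `echo`.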
121                                                   114 
122 The backing device will still use that cache s    115 The backing device will still use that cache set if it shows up in the future,
123 but all the cached data will be invalidated. I    116 but all the cached data will be invalidated. If there was dirty data in the
124 cache, don't expect the filesystem to be recov    117 cache, don't expect the filesystem to be recoverable - you will have massive
125 filesystem corruption, though ext4's fsck does    118 filesystem corruption, though ext4's fsck does work miracles.
126                                                   119 
127 Error Handling                                    120 Error Handling
128 --------------                                    121 --------------
129                                                   122 
130 Bcache tries to transparently handle IO errors    123 Bcache tries to transparently handle IO errors to/from the cache device without
131 affecting normal operation; if it sees too man    124 affecting normal operation; if it sees too many errors (the threshold is
132 configurable, and defaults to 0) it shuts down    125 configurable, and defaults to 0) it shuts down the cache device and switches all
133 the backing devices to passthrough mode.          126 the backing devices to passthrough mode.
134                                                   127 
135  - For reads from the cache, if they error we     128  - For reads from the cache, if they error we just retry the read from the
136    backing device.                                129    backing device.
137                                                   130 
138  - For writethrough writes, if the write to th    131  - For writethrough writes, if the write to the cache errors we just switch to
139    invalidating the data at that lba in the ca    132    invalidating the data at that lba in the cache (i.e. the same thing we do for
140    a write that bypasses the cache)               133    a write that bypasses the cache)
141                                                   134 
142  - For writeback writes, we currently pass tha    135  - For writeback writes, we currently pass that error back up to the
143    filesystem/userspace. This could be improve    136    filesystem/userspace. This could be improved - we could retry it as a write
144    that skips the cache so we don't have to er    137    that skips the cache so we don't have to error the write.
145                                                   138 
146  - When we detach, we first try to flush any d    139  - When we detach, we first try to flush any dirty data (if we were running in
147    writeback mode). It currently doesn't do an    140    writeback mode). It currently doesn't do anything intelligent if it fails to
148    read some of the dirty data, though.           141    read some of the dirty data, though.
149                                                   142 
150                                                   143 
151 Howto/cookbook                                    144 Howto/cookbook
152 --------------                                    145 --------------
153                                                   146 
154 A) Starting a bcache with a missing caching de    147 A) Starting a bcache with a missing caching device
155                                                   148 
156 If registering the backing device doesn't help    149 If registering the backing device doesn't help, it's already there, you just need
157 to force it to run without the cache::            150 to force it to run without the cache::
158                                                   151 
159         host:~# echo /dev/sdb1 > /sys/fs/bcach    152         host:~# echo /dev/sdb1 > /sys/fs/bcache/register
160         [  119.844831] bcache: register_bcache    153         [  119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered
161                                                   154 
162 Next, you try to register your caching device     155 Next, you try to register your caching device if it's present. However
163 if it's absent, or registration fails for some    156 if it's absent, or registration fails for some reason, you can still
164 start your bcache without its cache, like so::    157 start your bcache without its cache, like so::
165                                                   158 
166         host:/sys/block/sdb/sdb1/bcache# echo     159         host:/sys/block/sdb/sdb1/bcache# echo 1 > running
167                                                   160 
168 Note that this may cause data loss if you were    161 Note that this may cause data loss if you were running in writeback mode.
169                                                   162 
170                                                   163 
171 B) Bcache does not find its cache::               164 B) Bcache does not find its cache::
172                                                   165 
173         host:/sys/block/md5/bcache# echo 02265    166         host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach
174         [ 1933.455082] bcache: bch_cached_dev_    167         [ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set
175         [ 1933.478179] bcache: __cached_dev_st    168         [ 1933.478179] bcache: __cached_dev_store() Can't attach 0226553a-37cf-41d5-b3ce-8b1e944543a8
176         [ 1933.478179] : cache set not found      169         [ 1933.478179] : cache set not found
177                                                   170 
178 In this case, the caching device was simply no    171 In this case, the caching device was simply not registered at boot
179 or disappeared and came back, and needs to be     172 or disappeared and came back, and needs to be (re-)registered::
180                                                   173 
181         host:/sys/block/md5/bcache# echo /dev/    174         host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register
182                                                   175 
183                                                   176 
184 C) Corrupt bcache crashes the kernel at device    177 C) Corrupt bcache crashes the kernel at device registration time:
185                                                   178 
186 This should never happen.  If it does happen,     179 This should never happen.  If it does happen, then you have found a bug!
187 Please report it to the bcache development lis    180 Please report it to the bcache development list: linux-bcache@vger.kernel.org
188                                                   181 
189 Be sure to provide as much information that yo    182 Be sure to provide as much information that you can including kernel dmesg
190 output if available so that we may assist.        183 output if available so that we may assist.
191                                                   184 
192                                                   185 
193 D) Recovering data without bcache:                186 D) Recovering data without bcache:
194                                                   187 
195 If bcache is not available in the kernel, a fi    188 If bcache is not available in the kernel, a filesystem on the backing
196 device is still available at an 8KiB offset. S    189 device is still available at an 8KiB offset. So either via a loopdev
197 of the backing device created with --offset 8K    190 of the backing device created with --offset 8K, or any value defined by
198 --data-offset when you originally formatted bc !! 191 --data-offset when you originally formatted bcache with `make-bcache`.
199                                                   192 
200 For example::                                     193 For example::
201                                                   194 
202         losetup -o 8192 /dev/loop0 /dev/your_b    195         losetup -o 8192 /dev/loop0 /dev/your_bcache_backing_dev
203                                                   196 
204 This should present your unmodified backing de    197 This should present your unmodified backing device data in /dev/loop0
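A slightly more defensive variant of the command above, as a sketch. It assumes util-linux `losetup` and the default 8KiB data offset; the helper name `bcache_data_offset_bytes` is hypothetical. It also assumes that a custom --data-offset was given in 512-byte sectors, which you should check against your bcache-tools version before relying on it.

```shell
# Hypothetical helper: convert a data offset in 512-byte sectors to bytes.
# Defaults to 16 sectors, i.e. the 8KiB offset mentioned above.
bcache_data_offset_bytes() {
    sectors=${1:-16}
    echo $((sectors * 512))
}

# Example (as root): map the backing device read-only at that offset and
# print the loop device that was picked.
#   losetup --find --show --read-only \
#       --offset "$(bcache_data_offset_bytes)" /dev/your_bcache_backing_dev
```

Mapping read-only keeps a recovery attempt from modifying the backing device while you inspect it.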
205                                                   198 
206 If your cache is in writethrough mode, then yo    199 If your cache is in writethrough mode, then you can safely discard the
207 cache device without losing data.              !! 200 cache device without loosing data.
208                                                   201 
209                                                   202 
210 E) Wiping a cache device                          203 E) Wiping a cache device
211                                                   204 
212 ::                                                205 ::
213                                                   206 
214         host:~# wipefs -a /dev/sdh2               207         host:~# wipefs -a /dev/sdh2
215         16 bytes were erased at offset 0x1018     208         16 bytes were erased at offset 0x1018 (bcache)
216         they were: c6 85 73 f6 4e 1a 45 ca 82     209         they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
217                                                   210 
218 After you boot back with bcache enabled, you r    211 After you boot back with bcache enabled, you recreate the cache and attach it::
219                                                   212 
220         host:~# bcache make -C /dev/sdh2       !! 213         host:~# make-bcache -C /dev/sdh2
221         UUID:                   7be7e175-8f4c-    214         UUID:                   7be7e175-8f4c-4f99-94b2-9c904d227045
222         Set UUID:               5bc072a8-ab17-    215         Set UUID:               5bc072a8-ab17-446d-9744-e247949913c1
223         version:                0                 216         version:                0
224         nbuckets:               106874            217         nbuckets:               106874
225         block_size:             1                 218         block_size:             1
226         bucket_size:            1024              219         bucket_size:            1024
227         nr_in_set:              1                 220         nr_in_set:              1
228         nr_this_dev:            0                 221         nr_this_dev:            0
229         first_bucket:           1                 222         first_bucket:           1
230         [  650.511912] bcache: run_cache_set()    223         [  650.511912] bcache: run_cache_set() invalidating existing data
231         [  650.549228] bcache: register_cache(    224         [  650.549228] bcache: register_cache() registered cache device sdh2
232                                                   225 
233 start backing device with missing cache::         226 start backing device with missing cache::
234                                                   227 
235         host:/sys/block/md5/bcache# echo 1 > r    228         host:/sys/block/md5/bcache# echo 1 > running
236                                                   229 
237 attach new cache::                                230 attach new cache::
238                                                   231 
239         host:/sys/block/md5/bcache# echo 5bc07    232         host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach
240         [  865.276616] bcache: bch_cached_dev_    233         [  865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1
241                                                   234 
242                                                   235 
243 F) Remove or replace a caching device::           236 F) Remove or replace a caching device::
244                                                   237 
245         host:/sys/block/sda/sda7/bcache# echo     238         host:/sys/block/sda/sda7/bcache# echo 1 > detach
246         [  695.872542] bcache: cached_dev_deta    239         [  695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7
247                                                   240 
248         host:~# wipefs -a /dev/nvme0n1p4          241         host:~# wipefs -a /dev/nvme0n1p4
249         wipefs: error: /dev/nvme0n1p4: probing    242         wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy
250         Ooops, it's disabled, but not unregist    243         Ooops, it's disabled, but not unregistered, so it's still protected
251                                                   244 
252 We need to go and unregister it::                 245 We need to go and unregister it::
253                                                   246 
254         host:/sys/fs/bcache/b7ba27a1-2398-4649    247         host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0
255         lrwxrwxrwx 1 root root 0 Feb 25 18:33     248         lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/
256         host:/sys/fs/bcache/b7ba27a1-2398-4649    249         host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop
257         kernel: [  917.041908] bcache: cache_s    250         kernel: [  917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered
258                                                   251 
259 Now we can wipe it::                              252 Now we can wipe it::
260                                                   253 
261         host:~# wipefs -a /dev/nvme0n1p4          254         host:~# wipefs -a /dev/nvme0n1p4
262         /dev/nvme0n1p4: 16 bytes were erased a    255         /dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
263                                                   256 
264                                                   257 
265 G) dm-crypt and bcache                            258 G) dm-crypt and bcache
266                                                   259 
267 First setup bcache unencrypted and then instal    260 First setup bcache unencrypted and then install dmcrypt on top of
268 /dev/bcache<N> This will work faster than if y    261 /dev/bcache<N> This will work faster than if you dmcrypt both the backing
269 and caching devices and then install bcache on    262 and caching devices and then install bcache on top. [benchmarks?]
270                                                   263 
271                                                   264 
272 H) Stop/free a registered bcache to wipe and/o    265 H) Stop/free a registered bcache to wipe and/or recreate it
273                                                   266 
274 Suppose that you need to free up all bcache re    267 Suppose that you need to free up all bcache references so that you can
275 fdisk run and re-register a changed partition     268 fdisk run and re-register a changed partition table, which won't work
276 if there are any active backing or caching dev    269 if there are any active backing or caching devices left on it:
277                                                   270 
278 1) Is it present in /dev/bcache* ? (there are     271 1) Is it present in /dev/bcache* ? (there are times where it won't be)
279                                                   272 
280    If so, it's easy::                             273    If so, it's easy::
281                                                   274 
282         host:/sys/block/bcache0/bcache# echo 1    275         host:/sys/block/bcache0/bcache# echo 1 > stop
283                                                   276 
284 2) But if your backing device is gone, this wo    277 2) But if your backing device is gone, this won't work::
285                                                   278 
286         host:/sys/block/bcache0# cd bcache        279         host:/sys/block/bcache0# cd bcache
287         bash: cd: bcache: No such file or dire    280         bash: cd: bcache: No such file or directory
288                                                   281 
289    In this case, you may have to unregister th    282    In this case, you may have to unregister the dmcrypt block device that
290    references this bcache to free it up::         283    references this bcache to free it up::
291                                                   284 
292         host:~# dmsetup remove oldds1             285         host:~# dmsetup remove oldds1
293         bcache: bcache_device_free() bcache0 s    286         bcache: bcache_device_free() bcache0 stopped
294         bcache: cache_set_free() Cache set 5bc    287         bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered
295                                                   288 
   This causes the backing bcache to be removed from /sys/fs/bcache and
   then it can be reused.  This would be true of any block device stacking
   where bcache is a lower device.

3) In other cases, you can also look in /sys/fs/bcache/::

        host:/sys/fs/bcache# ls -l */{cache?,bdev?}
        lrwxrwxrwx 1 root root 0 Mar  5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/
        lrwxrwxrwx 1 root root 0 Mar  5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache0 -> ../../../devices/virtual/block/dm-4/bcache/
        lrwxrwxrwx 1 root root 0 Mar  5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/

   The device names will show which UUID is relevant; cd into that directory
   and stop the cache::

        host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop

   This will free up bcache references and let you reuse the partition for
   other purposes.

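The lookup in case 3 can also be done in a loop instead of reading the
`ls -l` output by eye. The following is a minimal sketch, not part of bcache
or bcache-tools: the function name `find_cache_set` is hypothetical, and the
sysfs root is passed as a parameter (normally /sys/fs/bcache) so the function
can be tried against a scratch directory first.

```shell
#!/bin/sh
# find_cache_set ROOT DEVICE   (hypothetical helper, for illustration only)
# Print the UUID directory name of every cache set under ROOT that has a
# cacheN or bdevN symlink whose target contains DEVICE as a path component.
find_cache_set() {
    root=$1
    dev=$2
    for link in "$root"/*/cache[0-9]* "$root"/*/bdev[0-9]*; do
        # Unmatched globs stay literal; skip anything that is not a symlink.
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        case "/$target/" in
            */"$dev"/*)
                uuid_dir=${link%/*}          # strip the symlink name
                echo "${uuid_dir##*/}"       # keep only the UUID component
                ;;
        esac
    done
}
```

With the listing shown in case 3, `find_cache_set /sys/fs/bcache sdl2` should
print 5bc072a8-ab17-446d-9744-e247949913c1, which is the directory whose
`stop` file is then written.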


Troubleshooting performance
---------------------------

Bcache has a bunch of config options and tunables. The defaults are intended to
be reasonable for typical desktop and server workloads, but they're not what you
want for getting the best possible numbers when benchmarking.

 - Backing device alignment

   The default metadata size in bcache is 8k.  If your backing device is
   RAID based, then be sure to align this by a multiple of your stride
   width using `make-bcache --data-offset` (`bcache make --data-offset` in
   linux-6.12-rc7). If you intend to expand your disk array in the future,
   then multiply a series of primes by your raid stripe size to get the
   disk multiples that you would like.

   For example:  If you have a 64k stripe size, then the following offset
   would provide alignment for many common RAID5 data spindle counts::

        64k * 2*2*2*3*3*5*7 bytes = 161280k

   That space is wasted, but for only 157.5MB you can grow your RAID 5
   volume to the following data-spindle counts without re-aligning::

        3,4,5,6,7,8,9,10,12,14,15,18,20,21 ...

 - Bad write performance

   If write performance is not what you expected, you probably wanted to be
   running in writeback mode, which isn't the default (not due to a lack of
   maturity, but simply because in writeback mode you'll lose data if something
   happens to your SSD)::

        # echo writeback > /sys/block/bcache0/bcache/cache_mode

 - Bad performance, or traffic not going to the SSD that you'd expect

   By default, bcache doesn't cache everything. It tries to skip sequential IO -
   because you really want to be caching the random IO, and if you copy a 10
   gigabyte file you probably don't want that pushing 10 gigabytes of randomly
   accessed data out of your cache.

   But if you want to benchmark reads from cache, and you start out with fio
   writing an 8 gigabyte test file, you want to disable that::

        # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

   To set it back to the default (4 MB), do::

        # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

 - Traffic's still going to the spindle/still getting cache misses

   In the real world, SSDs don't always keep up with disks - particularly with
   slower SSDs, many disks being cached by one SSD, or mostly sequential IO. So
   you want to avoid being bottlenecked by the SSD and having it slow everything
   down.

   To avoid that, bcache tracks latency to the cache device, and gradually
   throttles traffic if the latency exceeds a threshold (it does this by
   cranking down the sequential bypass).

   You can disable this if you need to by setting the thresholds to 0::

        # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
        # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us

   The default is 2000 us (2 milliseconds) for reads, and 20000 us
   (20 milliseconds) for writes.

 - Still getting cache misses, of the same data

   One last issue that sometimes trips people up is actually an old bug, due to
   the way cache coherency is handled for cache misses. If a btree node is full,
   a cache miss won't be able to insert a key for the new data and the data
   won't be written to the cache.

   In practice this isn't an issue because as soon as a write comes along it'll
   cause the btree node to be split, and you need almost no write traffic for
   this to not show up enough to be noticeable (especially since bcache's btree
   nodes are huge and index large regions of the device). But when you're
   benchmarking, if you're trying to warm the cache by reading a bunch of data
   and there's no other traffic - that can be a problem.

   Solution: warm the cache by doing writes, or use the testing branch (there's
   a fix for the issue there).

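The arithmetic in the backing device alignment item above can be checked with
a few lines of shell. This is only a sketch under one assumption that is not
stated in the text: that "aligned" means the offset is a whole number of full
stripes, where a full stripe is the 64k stripe size times the number of data
spindles. The variable and function names are made up for the example.

```shell
#!/bin/sh
# Stripe size in KiB and the series of primes from the example above.
stripe_k=64
multiple=$((2 * 2 * 2 * 3 * 3 * 5 * 7))     # 2520
offset_k=$((stripe_k * multiple))           # 161280 KiB

# aligned_for SPINDLES: succeed if offset_k is a whole number of full stripes.
# ASSUMPTION: one full stripe is stripe_k KiB on each of SPINDLES data disks.
aligned_for() {
    [ $((offset_k % (stripe_k * $1))) -eq 0 ]
}

# Print the offset in KiB and in MB with one decimal (integer math only).
tenths=$((offset_k * 10 / 1024))
printf 'data offset: %dk (%d.%d MB)\n' "$offset_k" $((tenths / 10)) $((tenths % 10))

for n in 3 4 5 6 7 8 9 10 11 12 13 14 15; do
    if aligned_for "$n"; then
        echo "$n data spindles: aligned"
    else
        echo "$n data spindles: not aligned"
    fi
done
```

Under that assumption, 11 and 13 data spindles are reported as not aligned,
because neither prime is among the factors chosen for the offset.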

Sysfs - backing device
----------------------

Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and
(if attached) /sys/fs/bcache/<cset-uuid>/bdev*

attach
  Echo the UUID of a cache set to this file to enable caching.

cache_mode
  One of writethrough, writeback, writearound or none.

clear_stats
  Writing to this file resets the running total stats (not the day/hour/5 minute
  decaying versions).

detach
  Write to this file to detach from a cache set. If there is dirty data in the
  cache, it will be flushed first.

dirty_data
  Amount of dirty data for this backing device in the cache. Continuously
  updated, unlike the cache set's version, but may be slightly off.

label
  Name of underlying device.

readahead
  Size of readahead that should be performed.  Defaults to 0.  If set to e.g.
  1M, it will round cache miss reads up to that size, but without overlapping
  existing cache entries.

running
  1 if bcache is running (i.e. whether the /dev/bcache device exists, whether
  it's in passthrough mode or caching).

sequential_cutoff
  A sequential IO will bypass the cache once it passes this threshold; the
  most recent 128 IOs are tracked so sequential IO can be detected even when
  it isn't all done at once.

sequential_merge
  If non zero, bcache keeps a list of the last 128 requests submitted to compare
  against all new requests to determine which new requests are sequential
  continuations of previous requests for the purpose of determining sequential
  cutoff. This is necessary if the sequential cutoff value is greater than the
  maximum acceptable sequential size for any single request.

state
  The backing device can be in one of four different states:

  no cache: Has never been attached to a cache set.

  clean: Part of a cache set, and there is no cached dirty data.

  dirty: Part of a cache set, and there is cached dirty data.

  inconsistent: The backing device was forcibly run by the user when there was
  dirty data cached but the cache set was unavailable; whatever data was on the
  backing device has likely been corrupted.

stop
  Write to this file to shut down the bcache device and close the backing
  device.

writeback_delay
  When dirty data is written to the cache and it previously did not contain
  any, bcache waits this many seconds before initiating writeback. Defaults
  to 30.

writeback_percent
  If nonzero, bcache tries to keep around this percentage of the cache dirty by
  throttling background writeback and using a PD controller to smoothly adjust
  the rate.

writeback_rate
  Rate in sectors per second - if writeback_percent is nonzero, background
  writeback is throttled to this rate. Continuously adjusted by bcache but may
  also be set by the user.

writeback_running
  If off, writeback of dirty data will not take place at all. Dirty data will
  still be added to the cache until it is mostly full; only meant for
  benchmarking. Defaults to on.

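Several of the backing device files above are convenient to read together
when checking a device before a benchmark. The sketch below is illustrative
only and its function names are hypothetical. It also rests on an assumption
not stated in this document: that reading cache_mode lists every mode and
marks the active one with square brackets, for example
"writethrough [writeback] writearound none".

```shell
#!/bin/sh
# active_cache_mode DIR   (hypothetical helper)
# Print the selected cache mode from DIR/cache_mode.
# ASSUMPTION: the active mode is the word enclosed in square brackets.
active_cache_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1/cache_mode"
}

# backing_summary DIR   (hypothetical helper)
# One-line summary built from the state, cache_mode and dirty_data files.
backing_summary() {
    printf 'state=%s mode=%s dirty=%s\n' \
        "$(cat "$1/state")" \
        "$(active_cache_mode "$1")" \
        "$(cat "$1/dirty_data")"
}
```

A typical call would be `backing_summary /sys/block/bcache0/bcache`; the
directory is a parameter so the helpers can be tried on ordinary files.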

Sysfs - backing device stats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are directories with these numbers for a running total, as well as
versions that decay over the past day, hour and 5 minutes; they're also
aggregated in the cache set directory.

bypassed
  Amount of IO (both reads and writes) that has bypassed the cache.

cache_hits, cache_misses, cache_hit_ratio
  Hits and misses are counted per individual IO as bcache sees them; a
  partial hit is counted as a miss.

cache_bypass_hits, cache_bypass_misses
  Hits and misses for IO that is intended to skip the cache are still counted,
  but broken out here.

cache_miss_collisions
  Counts instances where data was going to be inserted into the cache from a
  cache miss, but raced with a write and data was already present (usually 0
  since the synchronization for cache misses was rewritten).

cache_readaheads
  Count of times readahead occurred (listed in linux-4.19.323 only).

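The hit ratio can be recomputed from the two counters, which is handy when
comparing the running total against one of the decaying windows. This is a
sketch, not bcache's own code: the function name is hypothetical, it takes
the ratio to be hits as a percentage of hits plus misses, and the directory
name used in the usage note (stats_total) is an assumption not given above.

```shell
#!/bin/sh
# hit_ratio_percent STATS_DIR   (hypothetical helper)
# Print hits * 100 / (hits + misses), rounded down, using the cache_hits and
# cache_misses files in STATS_DIR. Prints 0 when no IO has been counted yet.
hit_ratio_percent() {
    hits=$(cat "$1/cache_hits")
    misses=$(cat "$1/cache_misses")
    total=$((hits + misses))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $((hits * 100 / total))
    fi
}
```

For example, `hit_ratio_percent /sys/block/bcache0/bcache/stats_total` would
report the ratio for the running totals, if the directory has that name on
your kernel.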

Sysfs - cache set
~~~~~~~~~~~~~~~~~

Available at /sys/fs/bcache/<cset-uuid>

average_key_size
  Average data per key in the btree.

bdev<0..n>
  Symlink to each of the attached backing devices.

block_size
  Block size of the cache devices.

btree_cache_size
  Amount of memory currently used by the btree cache.

bucket_size
  Size of buckets.

cache<0..n>
  Symlink to each of the cache devices comprising this cache set.

cache_available_percent
  Percentage of cache device which doesn't contain dirty data, and could
  potentially be used for writeback.  This doesn't mean this space isn't used
  for clean cached data; the unused statistic (in priority_stats) is typically
  much lower.

clear_stats
  Clears the statistics associated with this cache.

dirty_data
  Amount of dirty data in the cache (updated when garbage collection runs).

flash_vol_create
  Echoing a size to this file (in human readable units, k/M/G) creates a thinly
  provisioned volume backed by the cache set.

io_error_halflife, io_error_limit
  These determine how many errors we accept before disabling the cache.
  Each error is decayed by the half life (in # ios).  If the decaying count
  reaches io_error_limit, dirty data is written out and the cache is disabled.

journal_delay_ms
  Journal writes will delay for up to this many milliseconds, unless a cache
  flush happens sooner. Defaults to 100.

root_usage_percent
  Percentage of the root btree node in use.  If this gets too high the node
  will split, increasing the tree depth.

stop
  Write to this file to shut down the cache set - waits until all attached
  backing devices have been shut down.

tree_depth
  Depth of the btree (A single node btree has depth 0).

unregister
  Detaches all backing devices and closes the cache devices; if dirty data is
  present it will disable writeback caching and wait for it to be flushed.

Sysfs - cache set internal
~~~~~~~~~~~~~~~~~~~~~~~~~~

This directory also exposes timings for a number of internal operations, with
separate files for average duration, average frequency, last occurrence and max
duration: garbage collection, btree read, btree node sorts and btree splits.

active_journal_entries
  Number of journal entries that are newer than the index.

btree_nodes
  Total nodes in the btree.

btree_used_percent
  Average fraction of btree in use.

bset_tree_stats
  Statistics about the auxiliary search trees.

btree_cache_max_chain
  Longest chain in the btree node cache's hash table.

cache_read_races
  Counts instances where, while data was being read from the cache, the bucket
  was reused and invalidated - i.e. where the pointer was stale after the read
  completed. When this occurs the data is reread from the backing device.

trigger_gc
  Writing to this file forces garbage collection to run.

Sysfs - Cache device
~~~~~~~~~~~~~~~~~~~~

Available at /sys/block/<cdev>/bcache

block_size
  Minimum granularity of writes - should match hardware sector size.

btree_written
  Sum of all btree writes, in (kilo/mega/giga) bytes.

bucket_size
  Size of buckets.

cache_replacement_policy
  One of lru, fifo or random.

discard
  Boolean; if on, a discard/TRIM will be issued to each bucket before it is
  reused. Defaults to off, since SATA TRIM is an unqueued command (and thus
  slow).

freelist_percent
  Size of the freelist as a percentage of nbuckets. Can be written to in order
  to increase the number of buckets kept on the freelist, which lets you
  artificially reduce the size of the cache at runtime. Mostly for testing
  purposes (i.e. testing how different size caches affect your hit rate), but
  since buckets are discarded when they move on to the freelist, it will also
  make the SSD's garbage collection easier by effectively giving it more
  reserved space.

io_errors
  Number of errors that have occurred, decayed by io_error_halflife.

metadata_written
  Sum of all non data writes (btree writes and all other metadata).

nbuckets
  Total buckets in this cache.

priority_stats
  Statistics about how recently data in the cache has been accessed.
  This can reveal your working set size.  Unused is the percentage of
  the cache that doesn't contain any data.  Metadata is bcache's
  metadata overhead.  Average is the average priority of cache buckets.
  Next is a list of quantiles with the priority threshold of each.

written
  Sum of all data that has been written to the cache; comparison with
  btree_written gives the amount of write inflation in bcache.
                                                      
