TOMOYO Linux Cross Reference
Linux/Documentation/admin-guide/bcache.rst

============================
A block layer cache (bcache)
============================

Say you've got a big slow RAID 6, and an SSD or three. Wouldn't it be
nice if you could use them as cache... Hence bcache.

The bcache wiki can be found at:
  https://bcache.evilpiepirate.org

This is the git repository of bcache-tools:
  https://git.kernel.org/pub/scm/linux/kernel/git/colyli/bcache-tools.git/

The latest bcache kernel code can be found in the mainline Linux kernel:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

It's designed around the performance characteristics of SSDs - it only allocates
in erase-block-sized buckets, and it uses a hybrid btree/log to track cached
extents (which can be anywhere from a single sector to the bucket size). It's
designed to avoid random writes at all costs; it fills up an erase block
sequentially, then issues a discard before reusing it.

Both writethrough and writeback caching are supported. Writeback defaults to
off, but can be switched on and off arbitrarily at runtime. Bcache goes to
great lengths to protect your data - it reliably handles unclean shutdown. (It
doesn't even have a notion of a clean shutdown; bcache simply doesn't return
writes as completed until they're on stable storage.)
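
Switching modes at runtime goes through sysfs. The following is a minimal sketch, assuming the `cache_mode` attribute in /sys/block/bcache<N>/bcache/, which lists the available modes with the active one in brackets (for example `[writethrough] writeback writearound none`). The helper names and the `SYSFS_ROOT` override are hypothetical; the override only exists so the functions can be exercised against a scratch directory.

```shell
# Hypothetical helpers around the bcache cache_mode sysfs attribute.
# SYSFS_ROOT defaults to /sys; point it elsewhere to test on a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the active mode: the word shown inside [brackets].
bcache_get_mode() (
    f="$SYSFS_ROOT/block/$1/bcache/cache_mode"   # $1 is e.g. bcache0
    [ -r "$f" ] || { echo "no cache_mode attribute for $1" >&2; exit 1; }
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$f"
)

# Switch modes, rejecting anything that is not a known mode name.
bcache_set_mode() (
    case "$2" in
        writethrough|writeback|writearound|none) ;;
        *) echo "unknown cache mode: $2" >&2; exit 1 ;;
    esac
    echo "$2" > "$SYSFS_ROOT/block/$1/bcache/cache_mode"
)
```

For example, `bcache_set_mode bcache0 writeback` turns writeback on and `bcache_set_mode bcache0 writethrough` turns it back off.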

Writeback caching can use most of the cache for buffering writes - writing
dirty data to the backing device is always done sequentially, scanning from the
start to the end of the index.

Since random IO is what SSDs excel at, there generally won't be much benefit
to caching large sequential IO. Bcache detects sequential IO and skips it;
it also keeps a rolling average of the IO sizes per task, and as long as the
average is above the cutoff it will skip all IO from that task - instead of
caching the first 512k after every seek. Backups and large file copies should
thus entirely bypass the cache.
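
The cutoff is a per-device tunable. A minimal sketch, assuming the `sequential_cutoff` attribute in /sys/block/bcache<N>/bcache/, where writing 0 turns the sequential bypass off entirely (handy when checking whether traffic can reach the SSD at all). The function names and the `SYSFS_ROOT` override are hypothetical.

```shell
# Hypothetical helpers around the bcache sequential_cutoff attribute.
# SYSFS_ROOT defaults to /sys; point it elsewhere to test on a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the current cutoff exactly as bcache reports it.
bcache_get_cutoff() (
    cat "$SYSFS_ROOT/block/$1/bcache/sequential_cutoff"
)

# Set a new cutoff for device $1; a value of 0 stops sequential bypass.
bcache_set_cutoff() (
    f="$SYSFS_ROOT/block/$1/bcache/sequential_cutoff"
    [ -e "$f" ] || { echo "no sequential_cutoff for $1" >&2; exit 1; }
    echo "$2" > "$f"
)
```

For example, `bcache_set_cutoff bcache0 0` makes bcache0 cache sequential IO too.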

In the event of a data IO error on the flash, it will try to recover by reading
from disk or invalidating cache entries.  For unrecoverable errors (metadata
or dirty data), caching is automatically disabled; if dirty data was present
in the cache, it first disables writeback caching and waits for all dirty data
to be flushed.

Getting started:
You'll need the bcache utility from the bcache-tools repository. Both the cache device
and backing device must be formatted before use::

  bcache make -B /dev/sdb
  bcache make -C /dev/sdc

`bcache make` has the ability to format multiple devices at the same time - if
you format your backing devices and cache device at the same time, you won't
have to manually attach::

  bcache make -B /dev/sda /dev/sdb -C /dev/sdc

If your bcache-tools is not updated to the latest version and does not have the
unified `bcache` utility, you may use the legacy `make-bcache` utility to format
bcache devices with the same -B and -C parameters.
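
For scripts that have to run on machines with either generation of bcache-tools, one option is to prefer the unified `bcache` utility and fall back to `make-bcache`. This is a minimal sketch that assumes, as stated above, that both accept the same -B and -C arguments; the function name is hypothetical.

```shell
# Hypothetical wrapper: format backing (-B) and/or cache (-C) devices with
# whichever bcache-tools front end is installed, passing arguments through.
bcache_format() {
    if command -v bcache >/dev/null 2>&1; then
        bcache make "$@"
    elif command -v make-bcache >/dev/null 2>&1; then
        make-bcache "$@"
    else
        echo "neither bcache nor make-bcache found in PATH" >&2
        return 1
    fi
}
```

For example, `bcache_format -B /dev/sda /dev/sdb -C /dev/sdc` formats both backing devices and the cache in one call, so no manual attach is needed.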

bcache-tools now ships udev rules, and bcache devices are known to the kernel
immediately.  Without udev, you can manually register devices like this::

  echo /dev/sdb > /sys/fs/bcache/register
  echo /dev/sdc > /sys/fs/bcache/register
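
The manual registration can be wrapped in a small loop that reports each device that fails to register (for example because it is already registered). A minimal sketch with a hypothetical function name and a `SYSFS_ROOT` override for testing; it assumes /sys/fs/bcache/register exists only once the bcache module is loaded.

```shell
# Hypothetical helper: register every device given as an argument by
# writing its path to /sys/fs/bcache/register, one write per device.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

bcache_register() (
    reg="$SYSFS_ROOT/fs/bcache/register"
    [ -w "$reg" ] || { echo "$reg missing; is bcache loaded?" >&2; exit 1; }
    status=0
    for dev in "$@"; do
        if ! echo "$dev" > "$reg"; then
            echo "failed to register $dev" >&2
            status=1
        fi
    done
    exit "$status"
)
```

For example, `bcache_register /dev/sdb /dev/sdc` is equivalent to the two `echo` commands above.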

Registering the backing device makes the bcache device show up in /dev; you can
now format it and use it as normal. But the first time you use a new bcache
device, it'll be running in passthrough mode until you attach it to a cache.
If you are thinking about using bcache later, it is recommended to set up all your
slow devices as bcache backing devices without a cache, and you can choose to add
a caching device later.
See the 'Attaching' section below.

The devices show up as::

  /dev/bcache<N>

As well as (with udev)::

  /dev/bcache/by-uuid/<uuid>
  /dev/bcache/by-label/<label>

To get started::

  mkfs.ext4 /dev/bcache0
  mount /dev/bcache0 /mnt

You can control bcache devices through sysfs at /sys/block/bcache<N>/bcache.
You can also control them through /sys/fs/bcache/<cset-uuid>/.

Cache devices are managed as sets; multiple caches per set aren't supported yet,
but support for them will allow for mirroring of metadata and dirty data in the
future. Your new cache set shows up as /sys/fs/bcache/<UUID>.
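
Since each cache set appears as a directory named after its UUID under /sys/fs/bcache (next to plain files such as `register`), listing the registered cache sets only requires picking out UUID-shaped directories. A minimal sketch; the function name and `SYSFS_ROOT` override are hypothetical, and the UUID check is a loose shape match rather than strict validation.

```shell
# Hypothetical helper: print the UUID of each registered cache set,
# one per line. SYSFS_ROOT defaults to /sys for testing on a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

bcache_list_sets() (
    for d in "$SYSFS_ROOT"/fs/bcache/*/; do
        [ -d "$d" ] || continue              # the glob matched nothing
        name=$(basename "$d")
        case "$name" in
            *-*-*-*-*) echo "$name" ;;       # looks like a UUID
        esac
    done
)
```

With a single cache set registered, its UUID is the value that goes into the `attach` file described in the next section.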

Attaching
---------

After your cache device and backing device are registered, the backing device
must be attached to your cache set to enable caching. Attaching a backing
device to a cache set is done thusly, with the UUID of the cache set in
/sys/fs/bcache::

  echo <CSET-UUID> > /sys/block/bcache0/bcache/attach

This only has to be done once. The next time you reboot, just re-register all
your bcache devices. If a backing device has data in a cache somewhere, the
/dev/bcache<N> device won't be created until the cache shows up - particularly
important if you have writeback caching turned on.
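
A script can check that the cache set is actually registered before writing its UUID to `attach`, which turns the kernel's "cache set not found" error into a clearer message. A minimal sketch with a hypothetical function name and a `SYSFS_ROOT` override for testing.

```shell
# Hypothetical helper: attach bcache device $1 (e.g. bcache0) to the cache
# set whose UUID is $2, after checking the set is registered.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

bcache_attach() (
    dev="$1"; cset="$2"
    if [ ! -d "$SYSFS_ROOT/fs/bcache/$cset" ]; then
        echo "cache set $cset is not registered; register the cache device first" >&2
        exit 1
    fi
    echo "$cset" > "$SYSFS_ROOT/block/$dev/bcache/attach"
)
```

For example, `bcache_attach bcache0 <CSET-UUID>` performs the `echo` shown above.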

If you're booting up and your cache device is gone and never coming back, you
can force run the backing device::

  echo 1 > /sys/block/sdb/bcache/running

(You need to use /sys/block/sdb (or whatever your backing device is called), not
/sys/block/bcache0, because bcache0 doesn't exist yet. If you're using a
partition, the bcache directory would be at /sys/block/sdb/sdb2/bcache.)

The backing device will still use that cache set if it shows up in the future,
but all the cached data will be invalidated. If there was dirty data in the
cache, don't expect the filesystem to be recoverable - you will have massive
filesystem corruption, though ext4's fsck does work miracles.
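
Because the bcache directory sits under /sys/block/<disk>/ for a whole disk but under /sys/block/<disk>/<partition>/ for a partition, a script that force-runs a backing device has to look in both places. A minimal sketch with hypothetical function names and a `SYSFS_ROOT` override for testing; it does nothing to protect dirty data, so the warning above still applies.

```shell
# Hypothetical helpers. SYSFS_ROOT defaults to /sys for testing.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the bcache sysfs directory for a backing device given its kernel
# name, e.g. "sdb" (whole disk) or "sdb2" (partition of sdb).
bcache_backing_dir() (
    for d in "$SYSFS_ROOT/block/$1/bcache" "$SYSFS_ROOT"/block/*/"$1"/bcache; do
        if [ -d "$d" ]; then
            echo "$d"
            exit 0
        fi
    done
    echo "no bcache directory found for $1" >&2
    exit 1
)

# Force the backing device to run without its cache.
bcache_force_run() (
    dir=$(bcache_backing_dir "$1") || exit 1
    echo 1 > "$dir/running"
)
```

For example, `bcache_force_run sdb2` writes to /sys/block/sdb/sdb2/bcache/running.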

Error Handling
--------------

Bcache tries to transparently handle IO errors to/from the cache device without
affecting normal operation; if it sees too many errors (the threshold is
configurable, and defaults to 0) it shuts down the cache device and switches all
the backing devices to passthrough mode.

 - For reads from the cache, if they error we just retry the read from the
   backing device.

 - For writethrough writes, if the write to the cache errors we just switch to
   invalidating the data at that LBA in the cache (i.e. the same thing we do for
   a write that bypasses the cache).

 - For writeback writes, we currently pass that error back up to the
   filesystem/userspace. This could be improved - we could retry it as a write
   that skips the cache so we don't have to error the write.

 - When we detach, we first try to flush any dirty data (if we were running in
   writeback mode). It currently doesn't do anything intelligent if it fails to
   read some of the dirty data, though.


Howto/cookbook
--------------

A) Starting a bcache with a missing caching device

If registering the backing device doesn't help because it's already registered,
you just need to force it to run without the cache::

        host:~# echo /dev/sdb1 > /sys/fs/bcache/register
        [  119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered

Next, try to register your caching device if it's present. However,
if it's absent, or registration fails for some reason, you can still
start your bcache without its cache, like so::

        host:/sys/block/sdb/sdb1/bcache# echo 1 > running

Note that this may cause data loss if you were running in writeback mode.
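
A cautious script might refuse to force-run a backing device that may still have dirty data on the missing cache. This sketch assumes the backing device's bcache directory has a `state` attribute that reads `dirty` in that situation; the function name and the `FORCE` variable are hypothetical.

```shell
# Hypothetical guard: force-run the backing device whose bcache sysfs
# directory is $1 (e.g. /sys/block/sdb/sdb1/bcache), but stop with exit
# status 2 if its state is "dirty" unless FORCE=1 is set.
bcache_safe_run() (
    dir="$1"
    state=$(cat "$dir/state") || exit 1
    if [ "$state" = "dirty" ] && [ "${FORCE:-0}" != 1 ]; then
        echo "backing device is dirty; set FORCE=1 to run anyway" >&2
        exit 2
    fi
    echo 1 > "$dir/running"
)
```

For example, `bcache_safe_run /sys/block/sdb/sdb1/bcache` does the same as the `echo 1 > running` above when the device is clean.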


B) Bcache does not find its cache::

        host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach
        [ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set
        [ 1933.478179] bcache: __cached_dev_store() Can't attach 0226553a-37cf-41d5-b3ce-8b1e944543a8
        [ 1933.478179] : cache set not found

In this case, the caching device was simply not registered at boot
or disappeared and came back, and needs to be (re-)registered::

        host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register
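
Re-registering and retrying the attach can be combined, waiting briefly for the cache set to appear under /sys/fs/bcache before writing to `attach`. A minimal sketch with a hypothetical function name and a `SYSFS_ROOT` override for testing; the default 10-second wait is an arbitrary choice.

```shell
# Hypothetical helper: (re-)register caching device $1, wait up to $4
# seconds (default 10) for cache set $2 to appear, then attach the backing
# device whose bcache sysfs directory is $3.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

bcache_reregister_attach() (
    cache_dev="$1"; cset="$2"; bdir="$3"; timeout="${4:-10}"; waited=0
    if ! echo "$cache_dev" > "$SYSFS_ROOT/fs/bcache/register"; then
        echo "registering $cache_dev failed (already registered?)" >&2
    fi
    while [ ! -d "$SYSFS_ROOT/fs/bcache/$cset" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "cache set $cset did not appear" >&2
            exit 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$cset" > "$bdir/attach"
)
```

For example: `bcache_reregister_attach /dev/sdh2 0226553a-37cf-41d5-b3ce-8b1e944543a8 /sys/block/md5/bcache`.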


C) Corrupt bcache crashes the kernel at device registration time:

This should never happen.  If it does happen, then you have found a bug!
Please report it to the bcache development list: linux-bcache@vger.kernel.org

Be sure to provide as much information as you can, including kernel dmesg
output if available, so that we may assist.


D) Recovering data without bcache:

If bcache is not available in the kernel, a filesystem on the backing
device is still available at an 8KiB offset. You can reach it through a loop
device of the backing device created with --offset 8K, or with whatever offset
you set with --data-offset when you originally formatted bcache with `bcache make`.

For example::

        losetup -o 8192 /dev/loop0 /dev/your_bcache_backing_dev

This should present your unmodified backing device data in /dev/loop0.
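
The offset arithmetic is easy to script. A minimal sketch, assuming --data-offset was given in 512-byte sectors (so the 8 KiB default corresponds to 16 sectors); the function names are hypothetical. Unlike the example above, it lets losetup pick a free loop device and maps it read-only, so recovery cannot modify the backing device.

```shell
# Hypothetical helper: convert a bcache data offset in 512-byte sectors
# (default 16, i.e. 8 KiB) into the byte offset losetup expects.
bcache_data_offset_bytes() (
    sectors="${1:-16}"
    echo $((sectors * 512))
)

# Map the backing device $1 at data offset $2 (sectors) to a free,
# read-only loop device and print its name (needs root and util-linux).
bcache_loop_backing() (
    offset=$(bcache_data_offset_bytes "${2:-16}")
    losetup --find --show --read-only --offset "$offset" "$1"
)
```

For example, `loopdev=$(bcache_loop_backing /dev/your_bcache_backing_dev)` stores the name of the new loop device.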

If your cache is in writethrough mode, then you can safely discard the
cache device without losing data.


E) Wiping a cache device

::

        host:~# wipefs -a /dev/sdh2
        16 bytes were erased at offset 0x1018 (bcache)
        they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81

After you boot back with bcache enabled, you recreate the cache and attach it::

        host:~# bcache make -C /dev/sdh2
        UUID:                   7be7e175-8f4c-4f99-94b2-9c904d227045
        Set UUID:               5bc072a8-ab17-446d-9744-e247949913c1
        version:                0
        nbuckets:               106874
        block_size:             1
        bucket_size:            1024
        nr_in_set:              1
        nr_this_dev:            0
        first_bucket:           1
        [  650.511912] bcache: run_cache_set() invalidating existing data
        [  650.549228] bcache: register_cache() registered cache device sdh2

Start the backing device with the missing cache::

        host:/sys/block/md5/bcache# echo 1 > running

Attach the new cache::

        host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach
        [  865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1


F) Remove or replace a caching device::

        host:/sys/block/sda/sda7/bcache# echo 1 > detach
        [  695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7

        host:~# wipefs -a /dev/nvme0n1p4
        wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy
        Ooops, it's disabled, but not unregistered, so it's still protected

We need to go and unregister it::

        host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0
        lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/
        host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop
        kernel: [  917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered

Now we can wipe it::

        host:~# wipefs -a /dev/nvme0n1p4
        /dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
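
The removal sequence above can be scripted as detach, stop, wait, wipe. A minimal sketch with hypothetical function names, plus `SYSFS_ROOT` and `WIPEFS` overrides for testing. It assumes the backing device's `state` attribute reads `no cache` once detaching (including the flush of any dirty data) has finished, and that the cache set's directory under /sys/fs/bcache disappears once it is unregistered; the timeouts are arbitrary.

```shell
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
WIPEFS="${WIPEFS:-wipefs}"

# Step 1: detach the backing device (bcache directory $1) and wait up to
# $2 seconds (default 300) for its state to read "no cache".
bcache_detach() (
    dir="$1"; timeout="${2:-300}"; waited=0
    echo 1 > "$dir/detach" || exit 1
    while [ "$(cat "$dir/state")" != "no cache" ]; do
        [ "$waited" -lt "$timeout" ] || exit 1
        sleep 1
        waited=$((waited + 1))
    done
)

# Step 2: unregister cache set $1 so the cache device is released.
bcache_stop_set() (
    echo 1 > "$SYSFS_ROOT/fs/bcache/$1/stop"
)

# Step 3: wait up to $2 seconds (default 30) for cache set $1 to vanish.
bcache_wait_gone() (
    timeout="${2:-30}"; waited=0
    while [ -d "$SYSFS_ROOT/fs/bcache/$1" ]; do
        [ "$waited" -lt "$timeout" ] || exit 1
        sleep 1
        waited=$((waited + 1))
    done
)

# All steps, then wipe the bcache superblock from the old cache device $3.
bcache_replace_cache() (
    bcache_detach "$1" &&
        bcache_stop_set "$2" &&
        bcache_wait_gone "$2" &&
        "$WIPEFS" -a "$3"
)
```

For example: `bcache_replace_cache /sys/block/sda/sda7/bcache b7ba27a1-2398-4649-8ae3-0959f57ba128 /dev/nvme0n1p4`. The sketch handles a single backing device; if others are attached to the same cache set, detach them first as well.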
263                                                   263 
264                                                   264 
265 G) dm-crypt and bcache                            265 G) dm-crypt and bcache
266                                                   266 
267 First setup bcache unencrypted and then instal    267 First setup bcache unencrypted and then install dmcrypt on top of
268 /dev/bcache<N> This will work faster than if y    268 /dev/bcache<N> This will work faster than if you dmcrypt both the backing
269 and caching devices and then install bcache on    269 and caching devices and then install bcache on top. [benchmarks?]
270                                                   270 
271                                                   271 
272 H) Stop/free a registered bcache to wipe and/o    272 H) Stop/free a registered bcache to wipe and/or recreate it
273                                                   273 
274 Suppose that you need to free up all bcache re    274 Suppose that you need to free up all bcache references so that you can
275 fdisk run and re-register a changed partition     275 fdisk run and re-register a changed partition table, which won't work
276 if there are any active backing or caching dev    276 if there are any active backing or caching devices left on it:
277                                                   277 
278 1) Is it present in /dev/bcache* ? (there are     278 1) Is it present in /dev/bcache* ? (there are times where it won't be)
279                                                   279 
280    If so, it's easy::                             280    If so, it's easy::
281                                                   281 
282         host:/sys/block/bcache0/bcache# echo 1    282         host:/sys/block/bcache0/bcache# echo 1 > stop
283                                                   283 
284 2) But if your backing device is gone, this wo    284 2) But if your backing device is gone, this won't work::
285                                                   285 
286         host:/sys/block/bcache0# cd bcache        286         host:/sys/block/bcache0# cd bcache
287         bash: cd: bcache: No such file or dire    287         bash: cd: bcache: No such file or directory
288                                                   288 
289    In this case, you may have to unregister th    289    In this case, you may have to unregister the dmcrypt block device that
290    references this bcache to free it up::         290    references this bcache to free it up::
291                                                   291 
        host:~# dmsetup remove oldds1
        bcache: bcache_device_free() bcache0 stopped
        bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered

   This causes the backing bcache to be removed from /sys/fs/bcache, after
   which it can be reused.  The same is true of any block device stacking
   in which bcache is a lower device.

3) In other cases, you can also look in /sys/fs/bcache/::

        host:/sys/fs/bcache# ls -l */{cache?,bdev?}
        lrwxrwxrwx 1 root root 0 Mar  5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/
        lrwxrwxrwx 1 root root 0 Mar  5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache0 -> ../../../devices/virtual/block/dm-4/bcache/
        lrwxrwxrwx 1 root root 0 Mar  5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/

   The device names show which UUID is relevant; cd into that directory
   and stop the cache set::

        host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop

   This frees up bcache's references and lets you reuse the partition for
   other purposes.
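
The lookup in step 3 can also be scripted. What follows is a minimal POSIX sh sketch and is not part of bcache-tools. The function names and the ``SYSFS_ROOT`` override (default ``/sys``, so the logic can be tried against a scratch directory) are hypothetical. The sketch assumes that each ``cacheN``/``bdevN`` symlink target ends in ``.../<device>/bcache``, as in the listing above. It also relies on ``readlink``, which is common on Linux but not required by POSIX:

```shell
#!/bin/sh
# Hypothetical helpers: find the cache set(s) that reference a block device,
# and stop a cache set. SYSFS_ROOT defaults to /sys; override it for testing.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the UUID of each cache set whose cacheN or bdevN symlink points at
# the device named $1 (for example sdl2 or dm-4).
bcache_csets_for_dev() {
    dev=$1
    for link in "$SYSFS_ROOT"/fs/bcache/*/cache[0-9]* \
                "$SYSFS_ROOT"/fs/bcache/*/bdev[0-9]*; do
        [ -L "$link" ] || continue          # also skips unexpanded globs
        case "$(readlink "$link")" in
            */"$dev"/bcache | */"$dev"/bcache/)
                basename "$(dirname "$link")" ;;
        esac
    done | sort -u
}

# Stop the cache set with UUID $1 (same as "echo 1 > stop" in its directory).
bcache_stop_cset() {
    echo 1 > "$SYSFS_ROOT/fs/bcache/$1/stop"
}
```

For the listing above, ``bcache_csets_for_dev sdl2`` would print ``5bc072a8-ab17-446d-9744-e247949913c1``. Passing that UUID to ``bcache_stop_cset`` then does what the manual ``echo 1 > stop`` does.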



Troubleshooting performance
---------------------------

Bcache has many configuration options and tunables. The defaults are intended
to be reasonable for typical desktop and server workloads, but they're not what
you want for getting the best possible numbers when benchmarking.

 - Backing device alignment

   The default metadata size in bcache is 8k.  If your backing device is
   RAID based, be sure to align the data offset to a multiple of your stripe
   width using ``bcache make --data-offset``. If you intend to expand your
   disk array in the future, multiply your RAID stripe size by a product of
   small primes, so that the offset stays aligned for every data-spindle
   count you may later grow to.

   For example, if you have a 64k stripe size, the following offset would
   provide alignment for many common RAID5 data-spindle counts::

        64k * 2*2*2*3*3*5*7 = 161280k

   That space is wasted, but for only 157.5MB you can grow your RAID 5
   volume to the following data-spindle counts without re-aligning::

        3,4,5,6,7,8,9,10,12,14,15,18,20,21 ...

 - Bad write performance

   If write performance is not what you expected, you probably want to be
   running in writeback mode, which isn't the default (not due to a lack of
   maturity, but simply because in writeback mode you'll lose data if something
   happens to your SSD)::

        # echo writeback > /sys/block/bcache0/bcache/cache_mode

 - Bad performance, or traffic not going to the SSD that you'd expect

   By default, bcache doesn't cache everything. It tries to skip sequential IO,
   because you really want to be caching the random IO; if you copy a 10
   gigabyte file, you probably don't want that pushing 10 gigabytes of randomly
   accessed data out of your cache.

   But if you want to benchmark reads from cache and you start out with fio
   writing an 8 gigabyte test file, you will want to disable that::

        # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

   To set it back to the default (4 MB), do::

        # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

 - Traffic's still going to the spindle/still getting cache misses

   In the real world, SSDs don't always keep up with disks, particularly with
   slower SSDs, many disks being cached by one SSD, or mostly sequential IO. So
   you want to avoid being bottlenecked by the SSD and having it slow everything
   down.

   To avoid that, bcache tracks latency to the cache device, and gradually
   throttles traffic if the latency exceeds a threshold (it does this by
   cranking down the sequential bypass).

   You can disable this if you need to by setting the thresholds to 0::

        # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
        # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us

   The default is 2000 us (2 milliseconds) for reads, and 20000 us (20
   milliseconds) for writes.

 - Still getting cache misses, of the same data

   One last issue that sometimes trips people up is actually an old bug, due to
   the way cache coherency is handled for cache misses. If a btree node is full,
   a cache miss won't be able to insert a key for the new data and the data
   won't be written to the cache.

   In practice this isn't an issue because as soon as a write comes along it'll
   cause the btree node to be split, and even a trickle of write traffic is
   enough to keep this from being noticeable (especially since bcache's btree
   nodes are huge and index large regions of the device). But when you're
   benchmarking, if you're trying to warm the cache by reading a bunch of data
   and there's no other traffic, that can be a problem.

   Solution: warm the cache by doing writes, or use the testing branch (there's
   a fix for the issue there).
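
The arithmetic in the "Backing device alignment" item can be checked with a short POSIX sh sketch. ``STRIPE_K`` and ``PRIMES`` are illustrative inputs taken from the 64k example. The script prints the offset both in KiB and in 512-byte sectors because this document does not say which unit ``--data-offset`` expects. Check your bcache-tools manual page before using either number:

```shell
#!/bin/sh
# Alignment arithmetic from "Backing device alignment". STRIPE_K (RAID stripe
# size in KiB) and PRIMES are example inputs, not values bcache requires.
STRIPE_K=64
PRIMES="2 2 2 3 3 5 7"

mult=1
for p in $PRIMES; do
    mult=$((mult * p))                 # 2*2*2*3*3*5*7 = 2520
done
offset_k=$((STRIPE_K * mult))          # 64k * 2520 = 161280k
offset_sectors=$((offset_k * 2))       # 1 KiB = two 512-byte sectors
echo "data offset: ${offset_k}k (${offset_sectors} sectors)"

# With N data spindles a full stripe is STRIPE_K * N, so the offset stays
# aligned exactly when N divides mult.
counts=""
n=3
while [ "$n" -le 21 ]; do
    if [ $((mult % n)) -eq 0 ]; then
        counts="$counts $n"
    fi
    n=$((n + 1))
done
echo "aligned data-spindle counts:$counts"
```

With the inputs shown, the script reports an offset of 161280k (322560 sectors). It also lists the data-spindle counts 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 18, 20 and 21, which match the list above.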
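
The benchmarking adjustments from the troubleshooting list can be collected in one place. This is a hedged sketch with several assumptions:

- The function names and ``SYSFS_ROOT`` are hypothetical.
- The attribute names are the ones quoted in this document.
- ``cache_mode`` is assumed to list every mode with the active one in brackets (for example ``writethrough [writeback] writearound none``); check it with ``cat`` on your system.

As noted above, writeback mode risks data loss if the SSD fails:

```shell
#!/bin/sh
# Hypothetical helpers for the benchmarking advice above. SYSFS_ROOT defaults
# to /sys; $1 is a bcache device such as bcache0, $2 a cache set UUID.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the active cache mode, ASSUMING cache_mode lists all modes with the
# active one in brackets, e.g. "writethrough [writeback] writearound none".
bcache_active_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$SYSFS_ROOT/block/$1/bcache/cache_mode"
}

bcache_bench_setup() {
    bdev=$SYSFS_ROOT/block/$1/bcache
    cset=$SYSFS_ROOT/fs/bcache/$2
    echo writeback > "$bdev/cache_mode"          # data is lost if the SSD dies
    echo 0 > "$bdev/sequential_cutoff"           # cache sequential IO as well
    echo 0 > "$cset/congested_read_threshold_us" # no latency-based bypass
    echo 0 > "$cset/congested_write_threshold_us"
}

# Restore the defaults quoted in this document (not the previous values).
bcache_bench_restore() {
    bdev=$SYSFS_ROOT/block/$1/bcache
    cset=$SYSFS_ROOT/fs/bcache/$2
    echo 4M > "$bdev/sequential_cutoff"
    echo 2000 > "$cset/congested_read_threshold_us"
    echo 20000 > "$cset/congested_write_threshold_us"
}
```

A benchmark run might look like this:

1. Call ``bcache_bench_setup bcache0 <cache set>``.
2. Run fio.
3. Call ``bcache_bench_restore bcache0 <cache set>``.

The restore step writes the documented defaults (4M, 2000 us and 20000 us) rather than whatever values were in place before. It also leaves ``cache_mode`` alone.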

Sysfs - backing device
----------------------

Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and
(if attached) /sys/fs/bcache/<cset-uuid>/bdev*

attach
  Echo the UUID of a cache set to this file to enable caching.

cache_mode
  One of writethrough, writeback, writearound or none.

clear_stats
  Writing to this file resets the running total stats (not the day/hour/5 minute
  decaying versions).

detach
  Write to this file to detach from a cache set. If there is dirty data in the
  cache, it will be flushed first.

dirty_data
  Amount of dirty data for this backing device in the cache. Continuously
  updated, unlike the cache set's version, but may be slightly off.

label
  Name of the underlying device.

readahead
  Size of readahead that should be performed.  Defaults to 0.  If set to e.g.
  1M, it will round cache miss reads up to that size, but without overlapping
  existing cache entries.

running
  1 if bcache is running (i.e. whether the /dev/bcache device exists, whether
  it's in passthrough mode or caching).

sequential_cutoff
  A sequential IO will bypass the cache once it passes this threshold; the
  most recent 128 IOs are tracked so sequential IO can be detected even when
  it isn't all done at once.

sequential_merge
  If nonzero, bcache keeps a list of the last 128 requests submitted to compare
  against all new requests to determine which new requests are sequential
  continuations of previous requests for the purpose of determining sequential
  cutoff. This is necessary if the sequential cutoff value is greater than the
  maximum acceptable sequential size for any single request.

state
  The backing device can be in one of four different states:

  no cache: Has never been attached to a cache set.

  clean: Part of a cache set, and there is no cached dirty data.

  dirty: Part of a cache set, and there is cached dirty data.

  inconsistent: The backing device was forcibly run by the user when there was
  dirty data cached but the cache set was unavailable; whatever data was on the
  backing device has likely been corrupted.

stop
  Write to this file to shut down the bcache device and close the backing
  device.

writeback_delay
  When dirty data is written to the cache and it previously did not contain
  any, waits some number of seconds before initiating writeback. Defaults to
  30.

writeback_percent
  If nonzero, bcache tries to keep around this percentage of the cache dirty by
  throttling background writeback and using a PD controller to smoothly adjust
  the rate.

writeback_rate
  Rate in sectors per second; if writeback_percent is nonzero, background
  writeback is throttled to this rate. Continuously adjusted by bcache but may
  also be set by the user.

writeback_running
  If off, writeback of dirty data will not take place at all. Dirty data will
  still be added to the cache until it is mostly full; only meant for
  benchmarking. Defaults to on.
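
To know when background writeback has drained, you can poll ``state`` until it reports ``clean``. For example, you might do this before taking the SSD out of service. Below is a minimal sh sketch:

- ``bcache_wait_clean`` and ``SYSFS_ROOT`` are hypothetical.
- The one-second polling interval and the 60-second default timeout are arbitrary choices.
- The state strings are the four listed above.

```shell
#!/bin/sh
# Hypothetical helper: wait until a backing device has no dirty data cached.
# $1 is a device under /sys/block (e.g. bcache0), $2 a timeout in seconds.
# Returns 0 when clean (or never attached), 1 on timeout, and 2 if state
# cannot be read or is reported inconsistent. SYSFS_ROOT defaults to /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

bcache_wait_clean() {
    state_file=$SYSFS_ROOT/block/$1/bcache/state
    remaining=${2:-60}
    while :; do
        state=$(cat "$state_file") || return 2
        case "$state" in
            clean | "no cache") return 0 ;;
            inconsistent) echo "$1: state is inconsistent" >&2; return 2 ;;
        esac
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

``no cache`` is treated as success because a device that was never attached has nothing to drain. If the timeout keeps expiring, check ``writeback_running`` and ``writeback_rate``.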

Sysfs - backing device stats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are directories with these numbers for a running total, as well as
versions that decay over the past day, hour and 5 minutes; they're also
aggregated in the cache set directory.

bypassed
  Amount of IO (both reads and writes) that has bypassed the cache.

cache_hits, cache_misses, cache_hit_ratio
  Hits and misses are counted per individual IO as bcache sees them; a
  partial hit is counted as a miss.

cache_bypass_hits, cache_bypass_misses
  Hits and misses for IO that is intended to skip the cache are still counted,
  but broken out here.

cache_miss_collisions
  Counts instances where data was going to be inserted into the cache from a
  cache miss, but raced with a write and data was already present (usually 0
  since the synchronization for cache misses was rewritten).

cache_readaheads
  Count of times readahead occurred.
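
A short sketch can compare the statistics windows side by side. It rests on two assumptions:

- The windows are the directories ``stats_total``, ``stats_day``, ``stats_hour`` and ``stats_five_minute`` under the device's bcache directory. Check with ``ls`` first.
- ``cache_hit_ratio`` is the integer value of hits * 100 / (hits + misses). ``bcache_ratio`` (a hypothetical helper, like ``SYSFS_ROOT``) re-derives the percentage from raw counters on that basis.

```shell
#!/bin/sh
# Hypothetical helpers for the statistics above. SYSFS_ROOT defaults to /sys;
# $1 is a device under /sys/block such as bcache0.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Integer hit percentage from raw counters (ASSUMED formula). A partial hit
# counts as a miss, so large, mostly cached IOs still lower this number.
bcache_ratio() {
    hits=$1 misses=$2
    total=$((hits + misses))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $((hits * 100 / total))
    fi
}

# Print "<window> <cache_hit_ratio>" for each statistics window found; the
# window directory names are ASSUMED, missing ones are skipped.
bcache_hit_ratios() {
    base=$SYSFS_ROOT/block/$1/bcache
    for w in stats_total stats_day stats_hour stats_five_minute; do
        f=$base/$w/cache_hit_ratio
        [ -r "$f" ] || continue
        printf '%s %s\n' "$w" "$(cat "$f")"
    done
}
```

Comparing ``stats_five_minute`` with ``stats_total`` is a quick way to see whether a tuning change from the troubleshooting section had any effect.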

Sysfs - cache set
~~~~~~~~~~~~~~~~~

Available at /sys/fs/bcache/<cset-uuid>

average_key_size
  Average data per key in the btree.

bdev<0..n>
  Symlink to each of the attached backing devices.

block_size
  Block size of the cache devices.

btree_cache_size
  Amount of memory currently used by the btree cache.

bucket_size
  Size of buckets.

cache<0..n>
  Symlink to each of the cache devices comprising this cache set.

cache_available_percent
  Percentage of the cache device which doesn't contain dirty data, and could
  potentially be used for writeback.  This doesn't mean this space isn't used
  for clean cached data; the unused statistic (in priority_stats) is typically
  much lower.

clear_stats
  Clears the statistics associated with this cache.

dirty_data
  Amount of dirty data in the cache (updated when garbage collection runs).

flash_vol_create
  Echoing a size to this file (in human readable units, k/M/G) creates a thinly
  provisioned volume backed by the cache set.

io_error_halflife, io_error_limit
  These determine how many errors we accept before disabling the cache.
  Each error is decayed by the half life (in # ios).  If the decaying count
  reaches io_error_limit, dirty data is written out and the cache is disabled.

journal_delay_ms
  Journal writes will delay for up to this many milliseconds, unless a cache
  flush happens sooner. Defaults to 100.

root_usage_percent
  Percentage of the root btree node in use.  If this gets too high the node
  will split, increasing the tree depth.

stop
  Write to this file to shut down the cache set; waits until all attached
  backing devices have been shut down.

tree_depth
  Depth of the btree (a single-node btree has depth 0).

unregister
  Detaches all backing devices and closes the cache devices; if dirty data is
  present it will disable writeback caching and wait for it to be flushed.
573                                                   576 
574 Sysfs - cache set internal                        577 Sysfs - cache set internal
575 ~~~~~~~~~~~~~~~~~~~~~~~~~~                        578 ~~~~~~~~~~~~~~~~~~~~~~~~~~
576                                                   579 
577 This directory also exposes timings for a numb    580 This directory also exposes timings for a number of internal operations, with
578 separate files for average duration, average f    581 separate files for average duration, average frequency, last occurrence and max
579 duration: garbage collection, btree read, btre    582 duration: garbage collection, btree read, btree node sorts and btree splits.
580                                                   583 
581 active_journal_entries                            584 active_journal_entries
582   Number of journal entries that are newer tha    585   Number of journal entries that are newer than the index.
583                                                   586 
584 btree_nodes                                       587 btree_nodes
585   Total nodes in the btree.                       588   Total nodes in the btree.
586                                                   589 
587 btree_used_percent                                590 btree_used_percent
588   Average fraction of btree in use.               591   Average fraction of btree in use.
589                                                   592 
590 bset_tree_stats                                   593 bset_tree_stats
591   Statistics about the auxiliary search trees     594   Statistics about the auxiliary search trees
592                                                   595 
593 btree_cache_max_chain                             596 btree_cache_max_chain
594   Longest chain in the btree node cache's hash    597   Longest chain in the btree node cache's hash table
595                                                   598 
596 cache_read_races                                  599 cache_read_races
597   Counts instances where while data was being     600   Counts instances where while data was being read from the cache, the bucket
598   was reused and invalidated - i.e. where the     601   was reused and invalidated - i.e. where the pointer was stale after the read
599   completed. When this occurs the data is rere    602   completed. When this occurs the data is reread from the backing device.
600                                                   603 
601 trigger_gc                                        604 trigger_gc
602   Writing to this file forces garbage collecti    605   Writing to this file forces garbage collection to run.
603                                                   606 
604 Sysfs - Cache device                              607 Sysfs - Cache device
605 ~~~~~~~~~~~~~~~~~~~~                              608 ~~~~~~~~~~~~~~~~~~~~
606                                                   609 
607 Available at /sys/block/<cdev>/bcache             610 Available at /sys/block/<cdev>/bcache
608                                                   611 
609 block_size                                        612 block_size
610   Minimum granularity of writes - should match    613   Minimum granularity of writes - should match hardware sector size.
611                                                   614 
612 btree_written                                     615 btree_written
613   Sum of all btree writes, in (kilo/mega/giga)    616   Sum of all btree writes, in (kilo/mega/giga) bytes
614                                                   617 
615 bucket_size                                       618 bucket_size
616   Size of buckets                                 619   Size of buckets
617                                                   620 
618 cache_replacement_policy                          621 cache_replacement_policy
619   One of either lru, fifo or random.              622   One of either lru, fifo or random.
620                                                   623 
621 discard                                           624 discard
622   Boolean; if on a discard/TRIM will be issued    625   Boolean; if on a discard/TRIM will be issued to each bucket before it is
623   reused. Defaults to off, since SATA TRIM is     626   reused. Defaults to off, since SATA TRIM is an unqueued command (and thus
624   slow).                                          627   slow).
625                                                   628 
626 freelist_percent                                  629 freelist_percent
627   Size of the freelist as a percentage of nbuc    630   Size of the freelist as a percentage of nbuckets. Can be written to to
628   increase the number of buckets kept on the f    631   increase the number of buckets kept on the freelist, which lets you
629   artificially reduce the size of the cache at    632   artificially reduce the size of the cache at runtime. Mostly for testing
630   purposes (i.e. testing how different size ca    633   purposes (i.e. testing how different size caches affect your hit rate), but
631   since buckets are discarded when they move o    634   since buckets are discarded when they move on to the freelist will also make
632   the SSD's garbage collection easier by effec    635   the SSD's garbage collection easier by effectively giving it more reserved
633   space.                                          636   space.
634                                                   637 
635 io_errors                                         638 io_errors
636   Number of errors that have occurred, decayed    639   Number of errors that have occurred, decayed by io_error_halflife.
637                                                   640 
638 metadata_written                                  641 metadata_written
639   Sum of all non data writes (btree writes and    642   Sum of all non data writes (btree writes and all other metadata).
640                                                   643 
641 nbuckets                                          644 nbuckets
642   Total buckets in this cache                     645   Total buckets in this cache
643                                                   646 
644 priority_stats                                    647 priority_stats
645   Statistics about how recently data in the ca    648   Statistics about how recently data in the cache has been accessed.
646   This can reveal your working set size.  Unus    649   This can reveal your working set size.  Unused is the percentage of
647   the cache that doesn't contain any data.  Me    650   the cache that doesn't contain any data.  Metadata is bcache's
648   metadata overhead.  Average is the average p    651   metadata overhead.  Average is the average priority of cache buckets.
649   Next is a list of quantiles with the priorit    652   Next is a list of quantiles with the priority threshold of each.
650                                                   653 
651 written                                           654 written
652   Sum of all data that has been written to the    655   Sum of all data that has been written to the cache; comparison with
653   btree_written gives the amount of write infl    656   btree_written gives the amount of write inflation in bcache.
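
The comparison between ``written`` and ``btree_written`` can be reduced to a single percentage. This hedged sh/awk sketch assumes both attributes are printed in a human-readable form: a number followed by a 1024-based unit such as ``k``, ``M`` or ``G``. Because those values are rounded, the result is only approximate. The function names are hypothetical, and the argument is the cache device's bcache directory, e.g. /sys/block/<cdev>/bcache:

```shell
#!/bin/sh
# Hypothetical helpers: approximate btree write overhead on a cache device.
# $1 is the device's bcache directory, e.g. /sys/block/<cdev>/bcache.

# Convert a human-readable size such as "1.2G" or "512.0k" to bytes, ASSUMING
# 1024-based units k, M, G, T, P; a bare number is taken as bytes.
bcache_h_to_bytes() {
    printf '%s\n' "$1" | awk '{
        n = $0; u = substr(n, length(n), 1); m = 1
        if (u ~ /[kMGTP]/) {
            n = substr(n, 1, length(n) - 1)
            m = 1024 ^ index("kMGTP", u)
        }
        printf "%.0f\n", n * m
    }'
}

# Print btree_written as a percentage of written, with one decimal place.
bcache_btree_overhead() {
    w=$(bcache_h_to_bytes "$(cat "$1/written")")
    b=$(bcache_h_to_bytes "$(cat "$1/btree_written")")
    awk -v w="$w" -v b="$b" 'BEGIN {
        if (w > 0) printf "%.1f\n", 100 * b / w
        else print "n/a"
    }'
}
```

For example, if ``written`` reads ``2.0G`` and ``btree_written`` reads ``512.0M``, ``bcache_btree_overhead`` prints ``25.0``. Substituting ``metadata_written`` for ``btree_written`` would include all metadata, not only btree writes.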
                                                      
