1 ======================================== 2 zram: Compressed RAM-based block devices 3 ======================================== 4 5 Introduction 6 ============ 7 8 The zram module creates RAM-based block device 9 (<id> = 0, 1, ...). Pages written to these dis 10 in memory itself. These disks allow very fast 11 good amounts of memory savings. Some of the us 12 use as swap disks, various caches under /var a 13 14 Statistics for individual zram devices are exp 15 /sys/block/zram<id>/ 16 17 Usage 18 ===== 19 20 There are several ways to configure and manage 21 22 a) using zram and zram_control sysfs attribute 23 b) using zramctl utility, provided by util-lin 24 25 In this document we will describe only 'manual 26 IOW, zram and zram_control sysfs attributes. 27 28 In order to get a better idea about zramctl pl 29 documentation, zramctl man-page or `zramctl -- 30 that zram maintainers do not develop/maintain 31 you have any questions please contact util-lin 32 33 Following shows a typical sequence of steps fo 34 35 WARNING 36 ======= 37 38 For the sake of simplicity we skip error check 39 examples below. However, it is your sole respo 40 41 zram sysfs attributes always return negative v 42 The list of possible return codes: 43 44 ======== ==================================== 45 -EBUSY an attempt to modify an attribute th 46 the device has been initialised. Ple 47 -ENOMEM zram was not able to allocate enough 48 needs. 49 -EINVAL invalid input has been provided. 50 ======== ==================================== 51 52 If you use 'echo', the returned value is set b 53 and, in general case, something like:: 54 55 echo 3 > /sys/block/zram0/max_comp_str 56 if [ $? -ne 0 ]; then 57 handle_error 58 fi 59 60 should suffice. 61 62 1) Load Module 63 ============== 64 65 :: 66 67 modprobe zram num_devices=4 68 69 This creates 4 devices: /dev/zram{0,1,2,3} 70 71 num_devices parameter is optional and tells zr 72 pre-created. Default: 1. 73 74 2) Set max number of compression streams 75 ======================================== 76 77 Regardless of the value passed to this attribu 78 allocate multiple compression streams - one pe 79 allowing several concurrent compression operat 80 allocated compression streams goes down when s 81 become offline. There is no single-compression 82 unless you are running a UP system or have onl 83 84 To find out how many streams are currently ava 85 86 cat /sys/block/zram0/max_comp_streams 87 88 3) Select compression algorithm 89 =============================== 90 91 Using comp_algorithm device attribute one can 92 currently selected (shown in square brackets) 93 or change the selected compression algorithm ( 94 there is no way to change compression algorith 95 96 Examples:: 97 98 #show supported compression algorithms 99 cat /sys/block/zram0/comp_algorithm 100 lzo [lz4] 101 102 #select lzo compression algorithm 103 echo lzo > /sys/block/zram0/comp_algor 104 105 For the time being, the `comp_algorithm` conte 106 algorithms that are supported by zram. 107 108 4) Set compression algorithm parameters: Optio 109 ============================================== 110 111 Compression algorithms may support specific pa 112 tweaked for particular dataset. ZRAM has an `a 113 attribute which provides a per-algorithm param 114 115 For example, several compression algorithms su 116 In addition, certain compression algorithms su 117 which significantly change algorithms' charact 118 compression algorithm to use external pre-trai 119 path to the `dict` along with other parameters 120 121 #pass path to pre-trained zstd diction 122 echo "algo=zstd dict=/etc/dictioary" > 123 124 #same, but using algorithm priority 125 echo "priority=1 dict=/etc/dictioary" 126 /sys/block/zram0/algorithm_par 127 128 #pass path to pre-trained zstd diction 129 echo "algo=zstd level=8 dict=/etc/dict 130 /sys/block/zram0/algorithm_par 131 132 Parameters are algorithm specific: not all alg 133 dictionaries, not all algorithms support `leve 134 algorithms `level` controls the compression le 135 better the compression ratio, it even can take 136 algorithms), for other algorithms `level` is a 137 the value the lower the compression ratio). 138 139 5) Set Disksize 140 =============== 141 142 Set disk size by writing the value to sysfs no 143 The value can be either in bytes or you can us 144 Examples:: 145 146 # Initialize /dev/zram0 with 50MB disk 147 echo $((50*1024*1024)) > /sys/block/zr 148 149 # Using mem suffixes 150 echo 256K > /sys/block/zram0/disksize 151 echo 512M > /sys/block/zram0/disksize 152 echo 1G > /sys/block/zram0/disksize 153 154 Note: 155 There is little point creating a zram of great 156 since we expect a 2:1 compression ratio. Note 157 size of the disk when not in use so a huge zra 158 159 6) Set memory limit: Optional 160 ============================= 161 162 Set memory limit by writing the value to sysfs 163 The value can be either in bytes or you can us 164 In addition, you could change the value in run 165 Examples:: 166 167 # limit /dev/zram0 with 50MB memory 168 echo $((50*1024*1024)) > /sys/block/zr 169 170 # Using mem suffixes 171 echo 256K > /sys/block/zram0/mem_limit 172 echo 512M > /sys/block/zram0/mem_limit 173 echo 1G > /sys/block/zram0/mem_limit 174 175 # To disable memory limit 176 echo 0 > /sys/block/zram0/mem_limit 177 178 7) Activate 179 =========== 180 181 :: 182 183 mkswap /dev/zram0 184 swapon /dev/zram0 185 186 mkfs.ext4 /dev/zram1 187 mount /dev/zram1 /tmp 188 189 8) Add/remove zram devices 190 ========================== 191 192 zram provides a control interface, which enabl 193 addition and removal. 194 195 In order to add a new /dev/zramX device, perfo 196 attribute. This will return either the new dev 197 can use /dev/zram<id>) or an error code. 198 199 Example:: 200 201 cat /sys/class/zram-control/hot_add 202 1 203 204 To remove the existing /dev/zramX device (wher 205 execute:: 206 207 echo X > /sys/class/zram-control/hot_r 208 209 9) Stats 210 ======== 211 212 Per-device statistics are exported as various 213 214 A brief description of exported device attribu 215 please read Documentation/ABI/testing/sysfs-bl 216 217 ====================== ====== ============== 218 Name access desc 219 ====================== ====== ============== 220 disksize RW show and set t 221 initstate RO shows the init 222 reset WO trigger device 223 mem_used_max WO reset the `mem 224 mem_limit WO specifies the 225 use to store t 226 writeback_limit WO specifies the 227 can write out 228 writeback_limit_enable RW show and set w 229 max_comp_streams RW the number of 230 operations 231 comp_algorithm RW show and chang 232 algorithm_params WO setup compress 233 compact WO trigger memory 234 debug_stat RO this file is u 235 backing_dev RW set up backend 236 idle WO mark allocated 237 ====================== ====== ============== 238 239 240 User space is advised to use the following fil 241 242 File /sys/block/zram<id>/stat 243 244 Represents block layer statistics. Read Docume 245 details. 246 247 File /sys/block/zram<id>/io_stat 248 249 The stat file represents device's I/O statisti 250 layer and, thus, not available in zram<id>/sta 251 single line of text and contains the following 252 whitespace: 253 254 ============= ============================ 255 failed_reads The number of failed reads 256 failed_writes The number of failed writes 257 invalid_io The number of non-page-size- 258 notify_free Depending on device usage sc 259 260 a) the number of pages freed 261 notifications 262 b) the number of pages freed 263 REQ_OP_DISCARD requests s 264 sent to a swap block devi 265 which implies that this d 266 267 The latter ones are sent by 268 discard option, whenever som 269 discarded. 270 ============= ============================ 271 272 File /sys/block/zram<id>/mm_stat 273 274 The mm_stat file represents the device's mm st 275 line of text and contains the following stats 276 277 ================ ============================ 278 orig_data_size uncompressed size of data st 279 Unit: bytes 280 compr_data_size compressed size of data stor 281 mem_used_total the amount of memory allocat 282 includes allocator fragmenta 283 allocated for this disk. So, 284 can be calculated using comp 285 Unit: bytes 286 mem_limit the maximum amount of memory 287 the compressed data 288 mem_used_max the maximum amount of memory 289 store the data 290 same_pages the number of same element f 291 No memory is allocated for s 292 pages_compacted the number of pages freed du 293 huge_pages the number of incompressible 294 huge_pages_since the number of incompressible 295 ================ ============================ 296 297 File /sys/block/zram<id>/bd_stat 298 299 The bd_stat file represents a device's backing 300 a single line of text and contains the followi 301 302 ============== ============================== 303 bd_count size of data written in backin 304 Unit: 4K bytes 305 bd_reads the number of reads from backi 306 Unit: 4K bytes 307 bd_writes the number of writes to backin 308 Unit: 4K bytes 309 ============== ============================== 310 311 10) Deactivate 312 ============== 313 314 :: 315 316 swapoff /dev/zram0 317 umount /dev/zram1 318 319 11) Reset 320 ========= 321 322 Write any positive value to 'reset' sy 323 324 echo 1 > /sys/block/zram0/rese 325 echo 1 > /sys/block/zram1/rese 326 327 This frees all the memory allocated fo 328 resets the disksize to zero. You must 329 before reusing the device. 330 331 Optional Feature 332 ================ 333 334 writeback 335 --------- 336 337 With CONFIG_ZRAM_WRITEBACK, zram can write idl 338 to backing storage rather than keeping it in m 339 To use the feature, admin should set up backin 340 341 echo /dev/sda5 > /sys/block/zramX/back 342 343 before disksize setting. It supports only part 344 If admin wants to use incompressible page writ 345 346 echo huge > /sys/block/zramX/writeback 347 348 To use idle page writeback, first, user need t 349 as idle:: 350 351 echo all > /sys/block/zramX/idle 352 353 From now on, any pages on zram are idle pages. 354 will be removed until someone requests access 355 IOW, unless there is access request, those pag 356 Additionally, when CONFIG_ZRAM_TRACK_ENTRY_ACT 357 marked as idle based on how long (in seconds) 358 last accessed:: 359 360 echo 86400 > /sys/block/zramX/idle 361 362 In this example all pages which haven't been a 363 seconds (one day) will be marked idle. 364 365 Admin can request writeback of those idle page 366 367 echo idle > /sys/block/zramX/writeback 368 369 With the command, zram will writeback idle pag 370 371 Additionally, if a user choose to writeback on 372 this can be accomplished with:: 373 374 echo huge_idle > /sys/block/zramX/writ 375 376 If a user chooses to writeback only incompress 377 algorithms can compress) this can be accomplis 378 379 echo incompressible > /sys/block/zramX 380 381 If an admin wants to write a specific page in 382 they could write a page index into the interfa 383 384 echo "page_index=1251" > /sys/block/zr 385 386 If there are lots of write IO with flash devic 387 flash wearout problem so that admin needs to d 388 to guarantee storage health for entire product 389 390 To overcome the concern, zram supports "writeb 391 The "writeback_limit_enable"'s default value i 392 any writeback. IOW, if admin wants to apply wr 393 enable writeback_limit_enable via:: 394 395 $ echo 1 > /sys/block/zramX/writeback_ 396 397 Once writeback_limit_enable is set, zram doesn 398 until admin sets the budget via /sys/block/zra 399 400 (If admin doesn't enable writeback_limit_enabl 401 assigned via /sys/block/zramX/writeback_limit 402 403 If admin wants to limit writeback as per-day 4 404 like below:: 405 406 $ MB_SHIFT=20 407 $ 4K_SHIFT=12 408 $ echo $((400<<MB_SHIFT>>4K_SHIFT)) > 409 /sys/block/zram0/writeback_lim 410 $ echo 1 > /sys/block/zram0/writeback_ 411 412 If admins want to allow further write again on 413 they could do it like below:: 414 415 $ echo $((400<<MB_SHIFT>>4K_SHIFT)) > 416 /sys/block/zram0/writeback_lim 417 418 If an admin wants to see the remaining writeba 419 420 $ cat /sys/block/zramX/writeback_limit 421 422 If an admin wants to disable writeback limit, 423 424 $ echo 0 > /sys/block/zramX/writeback_ 425 426 The writeback_limit count will reset whenever 427 system reboot, echo 1 > /sys/block/zramX/reset 428 writeback happened until you reset the zram to 429 budget in next setting is user's job. 430 431 If admin wants to measure writeback count in a 432 know it via /sys/block/zram0/bd_stat's 3rd col 433 434 recompression 435 ------------- 436 437 With CONFIG_ZRAM_MULTI_COMP, zram can recompre 438 (secondary) compression algorithms. The basic 439 compression algorithm can provide better compr 440 (potentially) slower compression/decompression 441 algorithm can, for example, be more successful 442 that default algorithm failed to compress). An 443 recompression - pages that are cold and sit in 444 using more effective algorithm and, hence, red 445 446 With CONFIG_ZRAM_MULTI_COMP, zram supports up 447 one primary and up to 3 secondary ones. Primar 448 in "3) Select compression algorithm", secondar 449 using recomp_algorithm device attribute. 450 451 Example::: 452 453 #show supported recompression algorith 454 cat /sys/block/zramX/recomp_algorithm 455 #1: lzo lzo-rle lz4 lz4hc [zstd] 456 #2: lzo lzo-rle lz4 [lz4hc] zstd 457 458 Alternative compression algorithms are sorted 459 above, zstd is used as the first alternative a 460 of 1, while lz4hc is configured as a compressi 461 Alternative compression algorithm's priority i 462 configuration::: 463 464 #select zstd recompression algorithm, 465 echo "algo=zstd priority=1" > /sys/blo 466 467 #select deflate recompression algorith 468 echo "algo=deflate priority=2" > /sys/ 469 470 Another device attribute that CONFIG_ZRAM_MULT 471 which controls recompression. 472 473 Examples::: 474 475 #IDLE pages recompression is activated 476 echo "type=idle" > /sys/block/zramX/re 477 478 #HUGE pages recompression is activated 479 echo "type=huge" > /sys/block/zram0/re 480 481 #HUGE_IDLE pages recompression is acti 482 echo "type=huge_idle" > /sys/block/zra 483 484 The number of idle pages can be significant, s 485 threshold (in bytes) to the recompress knob: z 486 of equal or greater size::: 487 488 #recompress all pages larger than 3000 489 echo "threshold=3000" > /sys/block/zra 490 491 #recompress idle pages larger than 200 492 echo "type=idle threshold=2000" > /sys 493 494 It is also possible to limit the number of pag 495 attempt to recompress::: 496 497 echo "type=huge_idle max_pages=42" > / 498 499 Recompression of idle pages requires memory tr 500 501 During re-compression for every page, that mat 502 ZRAM iterates the list of registered alternati 503 order of their priorities. ZRAM stops either w 504 successful (re-compressed object is smaller in 505 and matches re-compression criteria (e.g. size 506 no secondary algorithms left to try. If none o 507 successfully re-compressed the page such a pag 508 so ZRAM will not attempt to re-compress it in 509 510 This re-compression behaviour, when it iterate 511 registered compression algorithms, increases o 512 algorithm that successfully compresses a parti 513 it is convenient (and sometimes even necessary 514 only one particular algorithm so that it will 515 This can be achieved by providing a `algo` or 516 517 #use zstd algorithm only (if registere 518 echo "type=huge algo=zstd" > /sys/bloc 519 520 #use zstd algorithm only (if zstd was 521 echo "type=huge priority=1" > /sys/blo 522 523 memory tracking 524 =============== 525 526 With CONFIG_ZRAM_MEMORY_TRACKING, user can kno 527 zram block. It could be useful to catch cold o 528 pages of the process with*pagemap. 529 530 If you enable the feature, you could see block 531 /sys/kernel/debug/zram/zram0/block_state". The 532 533 300 75.033841 .wh... 534 301 63.806904 s..... 535 302 63.806919 ..hi.. 536 303 62.801919 ....r. 537 304 146.781902 ..hi.n 538 539 First column 540 zram's block index. 541 Second column 542 access time since the system was boote 543 Third column 544 state of the block: 545 546 s: 547 same page 548 w: 549 written page to backing store 550 h: 551 huge page 552 i: 553 idle page 554 r: 555 recompressed page (secondary c 556 n: 557 none (including secondary) of 558 559 First line of above example says 300th block i 560 and the block's state is huge so it is written 561 storage. It's a debugging feature so anyone sh 562 properly. 563 564 Nitin Gupta 565 ngupta@vflare.org
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.