===============================
Documentation for /proc/sys/vm/
===============================

kernel version 2.6.29

Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>

Copyright (c) 2008         Peter W. Morreale <pmorreale@novell.com>

For general info and legal blurb, please look in index.rst.

------------------------------------------------------------------------------

This file contains the documentation for the sysctl files in
/proc/sys/vm and is valid for Linux kernel version 2.6.29.

The files in this directory can be used to tune the operation
of the virtual memory (VM) subsystem of the Linux kernel and
the writeout of dirty data to disk.

Default values and initialization routines for most of these
files can be found in mm/swap.c.

Currently, these files are in /proc/sys/vm:

- admin_reserve_kbytes
- compact_memory
- compaction_proactiveness
- compact_unevictable_allowed
- dirty_background_bytes
- dirty_background_ratio
- dirty_bytes
- dirty_expire_centisecs
- dirty_ratio
- dirtytime_expire_seconds
- dirty_writeback_centisecs
- drop_caches
- enable_soft_offline
- extfrag_threshold
- highmem_is_dirtyable
- hugetlb_optimize_vmemmap
- hugetlb_shm_group
- laptop_mode
- legacy_va_layout
- lowmem_reserve_ratio
- max_map_count
- mem_profiling            (only if CONFIG_MEM_ALLOC_PROFILING=y)
- memory_failure_early_kill
- memory_failure_recovery
- min_free_kbytes
- min_slab_ratio
- min_unmapped_ratio
- mmap_min_addr
- mmap_rnd_bits
- mmap_rnd_compat_bits
- nr_hugepages
- nr_hugepages_mempolicy
- nr_overcommit_hugepages
- nr_trim_pages            (only if CONFIG_MMU=n)
- numa_zonelist_order
- oom_dump_tasks
- oom_kill_allocating_task
- overcommit_kbytes
- overcommit_memory
- overcommit_ratio
- page-cluster
- page_lock_unfairness
- panic_on_oom
- percpu_pagelist_high_fraction
- stat_interval
- stat_refresh
- numa_stat
- swappiness
- unprivileged_userfaultfd
- user_reserve_kbytes
- vfs_cache_pressure
- watermark_boost_factor
- watermark_scale_factor
- zone_reclaim_mode


admin_reserve_kbytes
====================

The amount of free memory in the system that should be reserved for users
with the capability cap_sys_admin.

admin_reserve_kbytes defaults to min(3% of free pages, 8MB).

That should provide enough for the admin to log in and kill a process,
if necessary, under the default overcommit 'guess' mode.

Systems running under overcommit 'never' should increase this to account
for the full Virtual Memory Size of the programs used to recover.
Otherwise, root may not be able to log in to recover the system.

How do you calculate a minimum useful reserve?

sshd or login + bash (or some other shell) + top (or ps, kill, etc.)

For overcommit 'guess', we can sum their resident set sizes (RSS).
On x86_64 this is about 8MB.

For overcommit 'never', we can take the maximum of their virtual sizes (VSZ)
and add the sum of their RSS.
On x86_64 this is about 128MB.

Changing this takes effect whenever an application requests memory.


compact_memory
==============

Available only when CONFIG_COMPACTION is set.  When 1 is written to the file,
all zones are compacted such that free memory is available in contiguous
blocks where possible.  This can be important, for example, in the allocation
of huge pages, although processes will also directly compact memory as
required.
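For example, a full compaction pass can be requested manually and its effect
observed in the free-list statistics (a minimal sketch; it assumes a kernel
built with CONFIG_COMPACTION)::

	# request compaction of all zones on all nodes
	echo 1 > /proc/sys/vm/compact_memory

	# /proc/buddyinfo lists free blocks per order and can be compared
	# before and after to see whether higher orders became available
	cat /proc/buddyinfo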
compaction_proactiveness
========================

This tunable takes a value in the range [0, 100] with a default value of
20.  This tunable determines how aggressively compaction is done in the
background.  Writing a non-zero value to this tunable will immediately
trigger the proactive compaction.  Setting it to 0 disables proactive
compaction.

Note that compaction has a non-trivial system-wide impact as pages
belonging to different processes are moved around, which could also lead
to latency spikes in unsuspecting applications.  The kernel employs
various heuristics to avoid wasting CPU cycles if it detects that
proactive compaction is not being effective.

Be careful when setting it to extreme values like 100, as that may
cause excessive background compaction activity.

compact_unevictable_allowed
===========================

Available only when CONFIG_COMPACTION is set.  When set to 1, compaction is
allowed to examine the unevictable LRU (mlocked pages) for pages to compact.
This should be used on systems where stalls for minor page faults are an
acceptable trade for large contiguous free memory.  Set to 0 to prevent
compaction from moving pages that are unevictable.  The default value is 1.
On CONFIG_PREEMPT_RT the default value is 0, in order to avoid a page fault,
due to compaction, which would block the task from becoming active until the
fault is resolved.


dirty_background_bytes
======================

Contains the amount of dirty memory at which the background kernel
flusher threads will start writeback.

Note:
  dirty_background_bytes is the counterpart of dirty_background_ratio.  Only
  one of them may be specified at a time.  When one sysctl is written it is
  immediately taken into account to evaluate the dirty memory limits and the
  other appears as 0 when read.


dirty_background_ratio
======================

Contains, as a percentage of total available memory that contains free pages
and reclaimable pages, the number of pages at which the background kernel
flusher threads will start writing out dirty data.

The total available memory is not equal to total system memory.


dirty_bytes
===========

Contains the amount of dirty memory at which a process generating disk writes
will itself start writeback.

Note: dirty_bytes is the counterpart of dirty_ratio.  Only one of them may be
specified at a time.  When one sysctl is written it is immediately taken into
account to evaluate the dirty memory limits and the other appears as 0 when
read.

Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
value lower than this limit will be ignored and the old configuration will be
retained.


dirty_expire_centisecs
======================

This tunable is used to define when dirty data is old enough to be eligible
for writeout by the kernel flusher threads.  It is expressed in 100'ths
of a second.  Data which has been dirty in-memory for longer than this
interval will be written out next time a flusher thread wakes up.


dirty_ratio
===========

Contains, as a percentage of total available memory that contains free pages
and reclaimable pages, the number of pages at which a process which is
generating disk writes will itself start writing out dirty data.

The total available memory is not equal to total system memory.


dirtytime_expire_seconds
========================

When a lazytime inode is constantly having its pages dirtied, the inode with
an updated timestamp will never get a chance to be written out.  And, if the
only thing that has happened on the file system is a dirtytime inode caused
by an atime update, a worker will be scheduled to make sure that inode
eventually gets pushed out to disk.  This tunable is used to define when a
dirty inode is old enough to be eligible for writeback by the kernel flusher
threads.  It is also used as the interval at which the dirtytime writeback
thread wakes up.


dirty_writeback_centisecs
=========================

The kernel flusher threads will periodically wake up and write `old` data
out to disk.  This tunable expresses the interval between those wakeups, in
100'ths of a second.

Setting this to zero disables periodic writeback altogether.
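The byte/ratio counterparts described above can be exercised as follows
(a small sketch; the values are illustrative only, not recommendations)::

	# switch from a ratio-based to a byte-based dirty threshold
	echo $((512 * 1024 * 1024)) > /proc/sys/vm/dirty_bytes
	cat /proc/sys/vm/dirty_ratio		# now reads 0

	# switching back re-zeroes dirty_bytes
	echo 20 > /proc/sys/vm/dirty_ratio
	cat /proc/sys/vm/dirty_bytes		# now reads 0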
drop_caches
===========

Writing to this will cause the kernel to drop clean caches, as well as
reclaimable slab objects like dentries and inodes.  Once dropped, their
memory becomes free.

To free pagecache::

	echo 1 > /proc/sys/vm/drop_caches

To free reclaimable slab objects (includes dentries and inodes)::

	echo 2 > /proc/sys/vm/drop_caches

To free slab objects and pagecache::

	echo 3 > /proc/sys/vm/drop_caches

This is a non-destructive operation and will not free any dirty objects.
To increase the number of objects freed by this operation, the user may run
`sync` prior to writing to /proc/sys/vm/drop_caches.  This will minimize the
number of dirty objects on the system and create more candidates to be
dropped.

This file is not a means to control the growth of the various kernel caches
(inodes, dentries, pagecache, etc...).  These objects are automatically
reclaimed by the kernel when memory is needed elsewhere on the system.

Use of this file can cause performance problems.  Since it discards cached
objects, it may cost a significant amount of I/O and CPU to recreate the
dropped objects, especially if they were under heavy use.  Because of this,
use outside of a testing or debugging environment is not recommended.

You may see informational messages in your kernel log when this file is
used::

	cat (1234): drop_caches: 3

These are informational only.  They do not mean that anything is wrong
with your system.  To disable them, echo 4 (bit 2) into drop_caches.

enable_soft_offline
===================
Correctable memory errors are very common on servers.  Soft-offline is the
kernel's solution for memory pages having (excessive) corrected memory errors.

For different types of page, soft-offline has different behaviors / costs.

- For a raw error page, soft-offline migrates the in-use page's content to
  a new raw page.

- For a page that is part of a transparent hugepage, soft-offline splits the
  transparent hugepage into raw pages, then migrates only the raw error page.
  As a result, the user is transparently backed by 1 less hugepage, impacting
  memory access performance.

- For a page that is part of a HugeTLB hugepage, soft-offline first migrates
  the entire HugeTLB hugepage, during which a free hugepage will be consumed
  as the migration target.  Then the original hugepage is dissolved into raw
  pages without compensation, reducing the capacity of the HugeTLB pool by 1.

It is the user's call to choose between reliability (staying away from fragile
physical memory) vs performance / capacity implications in the transparent and
HugeTLB cases.

For all architectures, enable_soft_offline controls whether to soft offline
memory pages.  When set to 1, the kernel attempts to soft offline the pages
whenever it thinks needed.  When set to 0, the kernel returns EOPNOTSUPP to
the request to soft offline the pages.  Its default value is 1.

It is worth mentioning that after setting enable_soft_offline to 0, the
following requests to soft offline pages will not be performed:

- Request to soft offline pages from RAS Correctable Errors Collector.

- On ARM, the request to soft offline pages from the GHES driver.

- On PARISC, the request to soft offline pages from the Page Deallocation
  Table.

extfrag_threshold
=================

This parameter affects whether the kernel will compact memory or direct
reclaim to satisfy a high-order allocation.  The extfrag/extfrag_index file in
debugfs shows what the fragmentation index for each order is in each zone in
the system.  Values tending towards 0 imply allocations would fail due to lack
of memory, values towards 1000 imply failures are due to fragmentation and -1
implies that the allocation will succeed as long as watermarks are met.

The kernel will not compact memory in a zone if the
fragmentation index is <= extfrag_threshold.  The default value is 500.


highmem_is_dirtyable
====================

Available only for systems with CONFIG_HIGHMEM enabled (32b systems).

This parameter controls whether the high memory is considered for dirty
writers throttling.  This is not the case by default, which means that
only the amount of memory directly visible/usable by the kernel can
be dirtied.  As a result, on systems with a large amount of memory and
lowmem basically depleted, writers might be throttled too early and
streaming writes can get very slow.

Changing the value to non-zero would allow more memory to be dirtied
and thus allow writers to write more data which can be flushed to the
storage more effectively.  Note this also comes with a risk of premature
OOM killing because some writers (e.g. direct block device writes) can
only use the low memory and they can fill it up with dirty data without
any throttling.
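On a 32-bit highmem machine this could look like the following sketch
(HighTotal/HighFree appear in /proc/meminfo only when CONFIG_HIGHMEM is
enabled)::

	grep -i hightotal /proc/meminfo		# how much highmem is present
	echo 1 > /proc/sys/vm/highmem_is_dirtyable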
hugetlb_shm_group
=================

hugetlb_shm_group contains the group id that is allowed to create SysV
shared memory segments using hugetlb pages.


laptop_mode
===========

laptop_mode is a knob that controls "laptop mode".  All the things that are
controlled by this knob are discussed in
Documentation/admin-guide/laptops/laptop-mode.rst.


legacy_va_layout
================

If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel
will use the legacy (2.4) layout for all processes.


lowmem_reserve_ratio
====================

For some specialised workloads on highmem machines it is dangerous for
the kernel to allow process memory to be allocated from the "lowmem"
zone.  This is because that memory could then be pinned via the mlock()
system call, or by unavailability of swapspace.

And on large highmem machines this lack of reclaimable lowmem memory
can be fatal.

So the Linux page allocator has a mechanism which prevents allocations
which *could* use highmem from using too much lowmem.  This means that
a certain amount of lowmem is defended from the possibility of being
captured into pinned user memory.

(The same argument applies to the old 16 megabyte ISA DMA region.  This
mechanism will also defend that region from allocations which could use
highmem or lowmem).

The `lowmem_reserve_ratio` tunable determines how aggressive the kernel is
in defending these lower zones.

If you have a machine which uses highmem or ISA DMA and your
applications are using mlock(), or if you are running with no swap, then
you probably should change the lowmem_reserve_ratio setting.

The lowmem_reserve_ratio is an array.  You can see it by reading this file::

	% cat /proc/sys/vm/lowmem_reserve_ratio
	256     256     32

But these values are not used directly.  The kernel calculates the number of
protection pages for each zone from them.  These are shown as the array of
protection pages in /proc/zoneinfo like the following (this is an example of
an x86-64 box).  Each zone has an array of protection pages like this::

  Node 0, zone      DMA
    pages free     1355
          min      3
          low      3
          high     4
	:
	:
      numa_other   0
          protection: (0, 2004, 2004, 2004)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    pagesets
      cpu: 0 pcp: 0
          :

These protections are added to the watermark when judging whether this zone
should be used for page allocation or should be reclaimed.

In this example, if normal pages (index=2) are required of this DMA zone and
watermark[WMARK_HIGH] is used as the watermark, the kernel judges this zone
should not be used because pages_free(1355) is smaller than watermark +
protection[2] (4 + 2004 = 2008).  If this protection value is 0, this zone
would be used for a normal page requirement.  If the requirement is for the
DMA zone (index=0), protection[0] (=0) is used.

zone[i]'s protection[j] is calculated by the following expression::

  (i < j):
    zone[i]->protection[j]
    = (total sums of managed_pages from zone[i+1] to zone[j] on the node)
      / lowmem_reserve_ratio[i];
  (i = j):
     (should not be protected. = 0;
  (i > j):
     (not necessary, but looks 0)

The default values of lowmem_reserve_ratio[i] are

    === ====================================
    256 (if zone[i] means DMA or DMA32 zone)
    32  (others)
    === ====================================

As the expression above shows, they are the reciprocals of the ratios:
256 means 1/256.  The number of protection pages thus becomes about 0.39% of
the total managed pages of the higher zones on the node.

If you would like to protect more pages, smaller values are effective.
The minimum value is 1 (1/1 -> 100%).  A value less than 1 completely
disables protection of the pages.


max_map_count:
==============

This file contains the maximum number of memory map areas a process
may have.  Memory map areas are used as a side-effect of calling
malloc, directly by mmap, mprotect, and madvise, and also when loading
shared libraries.

While most applications need less than a thousand maps, certain
programs, particularly malloc debuggers, may consume lots of them,
e.g., up to one or two maps per allocation.

The default value is 65530.
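For example, to check how close a process is to the limit and to raise it
(a sketch; the pid 1234 and the new value are placeholders)::

	wc -l /proc/1234/maps			# one line per memory map area
	cat /proc/sys/vm/max_map_count
	sysctl -w vm.max_map_count=262144	# raise the limit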
mem_profiling
==============

Enable memory profiling (when CONFIG_MEM_ALLOC_PROFILING=y).

1: Enable memory profiling.

0: Disable memory profiling.

Enabling memory profiling introduces a small performance penalty for all
memory allocations.

The default value depends on CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.


memory_failure_early_kill:
==========================

Control how to kill processes when an uncorrected memory error (typically
a 2-bit error in a memory module) is detected in the background by hardware
and cannot be handled by the kernel.  In some cases (like the page
still having a valid copy on disk) the kernel will handle the failure
transparently without affecting any applications.  But if there is
no other up-to-date copy of the data, it will kill to prevent any data
corruptions from propagating.

1: Kill all processes that have the corrupted and not reloadable page mapped
as soon as the corruption is detected.  Note this is not supported
for a few types of pages, like kernel internally allocated data or
the swap cache, but works for the majority of user pages.

0: Only unmap the corrupted page from all processes and only kill a process
who tries to access it.

The kill is done using a catchable SIGBUS with BUS_MCEERR_AO, so processes can
handle this if they want to.

This is only active on architectures/platforms with advanced machine
check handling and depends on the hardware capabilities.

Applications can override this setting individually with the PR_MCE_KILL
prctl.


memory_failure_recovery
=======================

Enable memory failure recovery (when supported by the platform).

1: Attempt recovery.

0: Always panic on a memory failure.


min_free_kbytes
===============

This is used to force the Linux VM to keep a minimum number
of kilobytes free.  The VM uses this number to compute a
watermark[WMARK_MIN] value for each lowmem zone in the system.
Each lowmem zone gets a number of reserved free pages based
proportionally on its size.

Some minimal amount of memory is needed to satisfy PF_MEMALLOC
allocations; if you set this to lower than 1024KB, your system will
become subtly broken, and prone to deadlock under high loads.

Setting this too high will OOM your machine instantly.


min_slab_ratio
==============

This is available only on NUMA kernels.

A percentage of the total pages in each zone.  On Zone reclaim
(fallback from the local zone occurs) slabs will be reclaimed if more
than this percentage of pages in a zone are reclaimable slab pages.
This ensures that the slab growth stays under control even in NUMA
systems that rarely perform global reclaim.

The default is 5 percent.

Note that slab reclaim is triggered in a per zone / node fashion.
The process of reclaiming slab memory is currently not node specific
and may not be fast.


min_unmapped_ratio
==================

This is available only on NUMA kernels.

This is a percentage of the total pages in each zone.  Zone reclaim will
only occur if more than this percentage of pages are in a state that
zone_reclaim_mode allows to be reclaimed.

If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
against all file-backed unmapped pages including swapcache pages and tmpfs
files.  Otherwise, only unmapped pages backed by normal files but not tmpfs
files and similar are considered.

The default is 1 percent.


mmap_min_addr
=============

This file indicates the amount of address space which a user process will
be restricted from mmapping.  Since kernel null dereference bugs could
accidentally operate based on the information in the first couple of pages
of memory, userspace processes should not be allowed to write to them.  By
default this value is set to 0 and no protections will be enforced by the
security module.  Setting this value to something like 64k will allow the
vast majority of applications to work correctly and provide defense in depth
against future potential kernel bugs.
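For example, to enforce the commonly used 64k floor (a sketch; the exact
value is a policy decision, and legacy software that must map low addresses,
such as vm86-based emulators, may need a smaller value)::

	sysctl -w vm.mmap_min_addr=65536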
mmap_rnd_bits
=============

This value can be used to select the number of bits to use to
determine the random offset to the base address of vma regions
resulting from mmap allocations on architectures which support
tuning address space randomization.  This value will be bounded
by the architecture's minimum and maximum supported values.

This value can be changed after boot using the
/proc/sys/vm/mmap_rnd_bits tunable.


mmap_rnd_compat_bits
====================

This value can be used to select the number of bits to use to
determine the random offset to the base address of vma regions
resulting from mmap allocations for applications run in
compatibility mode on architectures which support tuning address
space randomization.  This value will be bounded by the
architecture's minimum and maximum supported values.

This value can be changed after boot using the
/proc/sys/vm/mmap_rnd_compat_bits tunable.


nr_hugepages
============

Change the minimum size of the hugepage pool.

See Documentation/admin-guide/mm/hugetlbpage.rst


hugetlb_optimize_vmemmap
========================

This knob is not available when the size of 'struct page' (a structure defined
in include/linux/mm_types.h) is not a power of two (an unusual system
configuration could result in this).

Enable (set to 1) or disable (set to 0) HugeTLB Vmemmap Optimization (HVO).

Once enabled, the vmemmap pages of subsequent allocations of HugeTLB pages
from the buddy allocator will be optimized (7 pages per 2MB HugeTLB page and
4095 pages per 1GB HugeTLB page), whereas already allocated HugeTLB pages will
not be optimized.  When those optimized HugeTLB pages are freed from the
HugeTLB pool to the buddy allocator, the vmemmap pages representing that range
need to be remapped again and the vmemmap pages discarded earlier need to be
allocated again.  If your use case is that HugeTLB pages are allocated 'on the
fly' (e.g. never explicitly allocating HugeTLB pages with 'nr_hugepages' but
only setting 'nr_overcommit_hugepages', so that those overcommitted HugeTLB
pages are allocated 'on the fly') instead of being pulled from the HugeTLB
pool, you should weigh the benefit of memory savings against the extra
overhead (~2x slower than before) of allocating or freeing HugeTLB pages
between the HugeTLB pool and the buddy allocator.  Another behavior to note is
that if the system is under heavy memory pressure, it could prevent the user
from freeing HugeTLB pages from the HugeTLB pool to the buddy allocator, since
the allocation of vmemmap pages could fail; you have to retry later if your
system encounters this situation.

Once disabled, the vmemmap pages of subsequent allocations of HugeTLB pages
from the buddy allocator will not be optimized, meaning the extra overhead at
allocation time from the buddy allocator disappears, whereas already optimized
HugeTLB pages will not be affected.  If you want to make sure there are no
optimized HugeTLB pages, you can set "nr_hugepages" to 0 first and then
disable this.  Note that writing 0 to nr_hugepages will make any "in use"
HugeTLB pages become surplus pages.  So, those surplus pages are still
optimized until they are no longer in use.  You would need to wait for those
surplus pages to be released before there are no optimized pages in the
system.


nr_hugepages_mempolicy
======================

Change the size of the hugepage pool at run-time on a specific
set of NUMA nodes.

See Documentation/admin-guide/mm/hugetlbpage.rst


nr_overcommit_hugepages
=======================

Change the maximum size of the hugepage pool.  The maximum is
nr_hugepages + nr_overcommit_hugepages.

See Documentation/admin-guide/mm/hugetlbpage.rst
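A short sketch of managing the pool sizes described above (the values are
illustrative only)::

	echo 128 > /proc/sys/vm/nr_hugepages		# persistent pool of 128 huge pages
	echo 64 > /proc/sys/vm/nr_overcommit_hugepages	# allow up to 64 surplus pages
	grep -i hugepages /proc/meminfo			# HugePages_Total/Free/Rsvd/Surp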
nr_trim_pages
=============

This is available only on NOMMU kernels.

This value adjusts the excess page trimming behaviour of power-of-2 aligned
NOMMU mmap allocations.

A value of 0 disables trimming of allocations entirely, while a value of 1
trims excess pages aggressively.  Any value >= 1 acts as the watermark where
trimming of allocations is initiated.

The default value is 1.

See Documentation/admin-guide/mm/nommu-mmap.rst for more information.


numa_zonelist_order
===================

This sysctl is only for NUMA and it is deprecated.  Anything but
Node order will fail!

'where the memory is allocated from' is controlled by zonelists.

(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for a simple explanation;
you may be able to read ZONE_DMA as ZONE_DMA32 in this description.)

In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as follows::

	ZONE_NORMAL -> ZONE_DMA

This means that a memory allocation request for GFP_KERNEL will
get memory from ZONE_DMA only when ZONE_NORMAL is not available.

In the NUMA case, you can think of the following two types of order.
Assume a 2-node NUMA system; below is the zonelist of Node(0)'s GFP_KERNEL::

  (A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
  (B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA

Type(A) offers the best locality for processes on Node(0), but ZONE_DMA
will be used before ZONE_NORMAL is exhausted.  This increases the possibility
of out-of-memory (OOM) of ZONE_DMA because ZONE_DMA tends to be small.

Type(B) cannot offer the best locality but is more robust against OOM of
the DMA zone.

Type(A) is called "Node" order.  Type(B) is "Zone" order.

"Node order" orders the zonelists by node, then by zone within each node.
Specify "[Nn]ode" for node order.

"Zone order" orders the zonelists by zone type, then by node within each
zone.  Specify "[Zz]one" for zone order.

Specify "[Dd]efault" to request automatic configuration.

On 32-bit, the Normal zone needs to be preserved for allocations accessible
by the kernel, so "zone" order will be selected.

On 64-bit, devices that require DMA32/DMA are relatively rare, so "node"
order will be selected.

Default order is recommended unless this is causing problems for your
system/application.


oom_dump_tasks
==============

Enables a system-wide task dump (excluding kernel threads) to be produced
when the kernel performs an OOM-killing and includes such information as
pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
score, and name.  This is helpful to determine why the OOM killer was
invoked, to identify the rogue task that caused it, and to determine why
the OOM killer chose the task it did to kill.

If this is set to zero, this information is suppressed.  On very
large systems with thousands of tasks it may not be feasible to dump
the memory state information for each one.  Such systems should not
be forced to incur a performance penalty in OOM conditions when the
information may not be desired.

If this is set to non-zero, this information is shown whenever the
OOM killer actually kills a memory-hogging task.

The default value is 1 (enabled).


oom_kill_allocating_task
========================

This enables or disables killing the OOM-triggering task in
out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire
tasklist and select a task based on heuristics to kill.  This normally
selects a rogue memory-hogging task that frees up a large amount of
memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that
triggered the out-of-memory condition.  This avoids the expensive
tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value
is used in oom_kill_allocating_task.

The default value is 0.


overcommit_kbytes
=================

When overcommit_memory is set to 2, the committed address space is not
permitted to exceed swap plus this amount of physical RAM.  See below.

Note: overcommit_kbytes is the counterpart of overcommit_ratio.  Only one
of them may be specified at a time.  Setting one disables the other (which
then appears as 0 when read).
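As with the dirty_* pair above, only one of overcommit_kbytes and
overcommit_ratio is in effect at a time; the resulting limit is reported as
CommitLimit in /proc/meminfo.  A small sketch (the value is illustrative
only)::

	echo $((4 * 1024 * 1024)) > /proc/sys/vm/overcommit_kbytes	# 4 GiB
	cat /proc/sys/vm/overcommit_ratio	# now reads 0
	grep -i commitlimit /proc/meminfo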
overcommit_memory
=================

This value contains a flag that enables memory overcommitment.

When this flag is 0, the kernel compares the userspace memory request
size against total memory plus swap and rejects obvious overcommits.

When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.

When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.
Note that user_reserve_kbytes affects this policy.

This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
and don't use much of it.

The default value is 0.

See Documentation/mm/overcommit-accounting.rst and
mm/util.c::__vm_enough_memory() for more information.


overcommit_ratio
================

When overcommit_memory is set to 2, the committed address
space is not permitted to exceed swap plus this percentage
of physical RAM.  See above.


page-cluster
============

page-cluster controls the number of pages up to which consecutive pages
are read in from swap in a single attempt.  This is the swap counterpart
to page cache readahead.
The mentioned consecutivity is not in terms of virtual/physical addresses,
but consecutive on swap space - that means they were swapped out together.

It is a logarithmic value - setting it to zero means "1 page", setting
it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
Zero disables swap readahead completely.

The default value is three (eight pages at a time).  There may be some
small benefits in tuning this to a different value if your workload is
swap-intensive.

Lower values mean lower latencies for initial faults, but at the same time
extra faults and I/O delays for following faults if they would have been
part of that consecutive pages readahead.


page_lock_unfairness
====================

This value determines the number of times that the page lock can be
stolen from under a waiter.  After the lock has been stolen the number of
times specified in this file (default is 5), the "fair lock handoff"
semantics will apply, and the waiter will only be woken up if the lock can
be taken.

panic_on_oom
============

This enables or disables the panic on out-of-memory feature.

If this is set to 0, the kernel will kill some rogue process,
via the oom_killer.  Usually, the oom_killer can kill rogue processes and
the system will survive.

If this is set to 1, the kernel panics when out-of-memory happens.
However, if a process limits its allocations to certain nodes using
mempolicy/cpusets, and those nodes become exhausted, one process
may be killed by the oom-killer.  No panic occurs in this case,
because other nodes' memory may be free and the system as a whole
may not be in a fatal state yet.

If this is set to 2, the kernel panics compulsorily even in the
above-mentioned case.  Even if the OOM happens under a memory cgroup,
the whole system panics.

The default value is 0.

1 and 2 are for failover of clustering.  Please select either
according to your policy of failover.

panic_on_oom=2 combined with kdump gives you a very strong tool to
investigate why the OOM happens.  You can get a snapshot.


percpu_pagelist_high_fraction
=============================

This is the fraction of pages in each zone that can be stored on
per-cpu page lists.  It is an upper boundary that is divided depending
on the number of online CPUs.  The minimum value for this is 8, which means
that we do not allow more than 1/8th of the pages in each zone to be stored
on per-cpu page lists.  This entry only changes the value of hot per-cpu
page lists.  A user can specify a number like 100 to allocate 1/100th of
each zone between the per-cpu lists.

The batch value of each per-cpu page list remains the same regardless of
the value of the high fraction, so allocation latencies are unaffected.

The initial value is zero.  The kernel uses this value to set the high
pcp->high mark based on the low watermark for the zone and the number of
local online CPUs.  If the user writes '0' to this sysctl, it will revert to
this default behavior.
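The effect can be inspected in /proc/zoneinfo, which reports the per-cpu
pageset sizes for each zone (a sketch; the exact layout of /proc/zoneinfo
varies between kernel versions)::

	echo 8 > /proc/sys/vm/percpu_pagelist_high_fraction
	grep -A6 pagesets /proc/zoneinfo	# inspect per-cpu count/high/batch
	echo 0 > /proc/sys/vm/percpu_pagelist_high_fraction	# back to the default heuristic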
stat_interval
=============

The time interval between which vm statistics are updated.  The default
is 1 second.


stat_refresh
============

Any read or write (by root only) flushes all the per-cpu vm statistics
into their global totals, for more accurate reports when testing,
e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo

As a side-effect, it also checks for negative totals (elsewhere reported
as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
(At time of writing, a few stats are known sometimes to be found negative,
with no ill effects: errors and warnings on these stats are suppressed.)


numa_stat
=========

This interface allows runtime configuration of numa statistics.

When page allocation performance becomes a bottleneck and you can tolerate
some possible tool breakage and decreased numa counter precision, you can
do::

	echo 0 > /proc/sys/vm/numa_stat

When page allocation performance is not a bottleneck and you want all
tooling to work, you can do::

	echo 1 > /proc/sys/vm/numa_stat


swappiness
==========

This control is used to define the rough relative IO cost of swapping
and filesystem paging, as a value between 0 and 200.  At 100, the VM
assumes equal IO cost and will thus apply memory pressure to the page
cache and swap-backed pages equally; lower values signify more
expensive swap IO, higher values indicate cheaper.

Keep in mind that filesystem IO patterns under memory pressure tend to
be more efficient than swap's random IO.  An optimal value will require
experimentation and will also be workload-dependent.

The default value is 60.

For in-memory swap, like zram or zswap, as well as hybrid setups that
have swap on faster devices than the filesystem, values beyond 100 can
be considered.  For example, if the random IO against the swap device
is on average 2x faster than IO from the filesystem, swappiness should
be 133 (x + 2x = 200, 2x = 133.33).

At 0, the kernel will not initiate swap until the amount of free and
file-backed pages is less than the high watermark in a zone.
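Applying the example above for a fast swap device (a sketch; a persistent
setting would normally go into /etc/sysctl.d/ instead)::

	sysctl -w vm.swappiness=133	# swap IO assumed ~2x cheaper than filesystem IO
	sysctl -w vm.swappiness=60	# back to the default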
unprivileged_userfaultfd
========================

This flag controls the mode in which unprivileged users can use the
userfaultfd system calls.  Set this to 0 to restrict unprivileged users
to handling page faults in user mode only.  In this case, users without
SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
succeed.  Prohibiting use of userfaultfd for handling faults from kernel
mode may make certain vulnerabilities more difficult to exploit.

Set this to 1 to allow unprivileged users to use the userfaultfd system
calls without any restrictions.

The default value is 0.

Another way to control permissions for userfaultfd is to use
/dev/userfaultfd instead of userfaultfd(2).  See
Documentation/admin-guide/mm/userfaultfd.rst.

user_reserve_kbytes
===================

When overcommit_memory is set to 2, "never overcommit" mode, reserve
min(3% of current process size, user_reserve_kbytes) of free memory.
This is intended to prevent a user from starting a single memory hogging
process, such that they cannot recover (kill the hog).

user_reserve_kbytes defaults to min(3% of the current process size, 128MB).

If this is reduced to zero, then the user will be allowed to allocate
all free memory with a single process, minus admin_reserve_kbytes.
Any subsequent attempts to execute a command will result in
"fork: Cannot allocate memory".

Changing this takes effect whenever an application requests memory.


vfs_cache_pressure
==================

This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches.  When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions.  Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative
performance impact.  Reclaim code needs to take various locks to find freeable
directory and inode objects.  With vfs_cache_pressure=1000, it will look for
ten times more freeable objects than there are.


watermark_boost_factor
======================

This factor controls the level of reclaim when memory is being fragmented.
It defines the percentage of the high watermark of a zone that will be
reclaimed if pages of different mobility are being mixed within pageblocks.
The intent is that compaction has less work to do in the future and to
increase the success rate of future high-order allocations such as SLUB
allocations, THP and hugetlbfs pages.

To make it sensible with respect to the watermark_scale_factor
parameter, the unit is in fractions of 10,000.  The default value of
15,000 means that up to 150% of the high watermark will be reclaimed in the
event of a pageblock being mixed due to fragmentation.  The level of reclaim
is determined by the number of fragmentation events that occurred in the
recent past.  If this value is smaller than a pageblock then a pageblock's
worth of pages will be reclaimed (e.g. 2MB on 64-bit x86).  A boost factor
of 0 will disable the feature.


watermark_scale_factor
======================

This factor controls the aggressiveness of kswapd.  It defines the
amount of memory left in a node/system before kswapd is woken up and
how much memory needs to be free before kswapd goes back to sleep.

The unit is in fractions of 10,000.  The default value of 10 means the
distances between watermarks are 0.1% of the available memory in the
node/system.  The maximum value is 3000, or 30% of memory.

A high rate of threads entering direct reclaim (allocstall) or kswapd
going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
that the number of free pages kswapd maintains for latency reasons is
too small for the allocation bursts occurring in the system.  This knob
can then be used to tune kswapd aggressiveness accordingly.


zone_reclaim_mode
=================

Zone_reclaim_mode allows someone to set more or less aggressive approaches to
reclaim memory when a zone runs out of memory.  If it is set to zero then no
zone reclaim occurs.  Allocations will be satisfied from other zones / nodes
in the system.

The value is a bitmask, built by OR'ing together the following:

=	===================================
1	Zone reclaim on
2	Zone reclaim writes dirty pages out
4	Zone reclaim swaps pages
=	===================================

zone_reclaim_mode is disabled by default.  For file servers or workloads
that benefit from having their data cached, zone_reclaim_mode should be
left disabled as the caching effect is likely to be more important than
data locality.

Consider enabling one or more zone_reclaim mode bits if it's known that the
workload is partitioned such that each partition fits within a NUMA node
and that accessing remote memory would cause a measurable performance
reduction.  The page allocator will then take additional actions before
allocating off-node pages.

Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes.  Zone
reclaim will write out dirty pages if a zone fills up and so effectively
throttle the process.  This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes
anymore, but it preserves the memory on other nodes so that the performance
of other processes running on other nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.
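For example, on a workload partitioned per NUMA node (a sketch; the bit
values are those from the table above)::

	echo 1 > /proc/sys/vm/zone_reclaim_mode		# reclaim on, no writeback/swap
	echo 0 > /proc/sys/vm/zone_reclaim_mode		# disabled (the default)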