===============================
Documentation for /proc/sys/vm/
===============================

kernel version 2.6.29

Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>

Copyright (c) 2008         Peter W. Morreale <pmorreale@novell.com>

For general info and legal blurb, please look in index.rst.

------------------------------------------------------------------------------

This file contains the documentation for the sysctl files in
/proc/sys/vm and is valid for Linux kernel version 2.6.29.

The files in this directory can be used to tune the operation
of the virtual memory (VM) subsystem of the Linux kernel and
the writeout of dirty data to disk.

Default values and initialization routines for most of these
files can be found in mm/swap.c.

Currently, these files are in /proc/sys/vm:

- admin_reserve_kbytes
- compact_memory
- compaction_proactiveness
- compact_unevictable_allowed
- dirty_background_bytes
- dirty_background_ratio
- dirty_bytes
- dirty_expire_centisecs
- dirty_ratio
- dirtytime_expire_seconds
- dirty_writeback_centisecs
- drop_caches
- enable_soft_offline
- extfrag_threshold
- highmem_is_dirtyable
- hugetlb_shm_group
- laptop_mode
- legacy_va_layout
- lowmem_reserve_ratio
- max_map_count
- mem_profiling         (only if CONFIG_MEM_ALLOC_PROFILING=y)
- memory_failure_early_kill
- memory_failure_recovery
- min_free_kbytes
- min_slab_ratio
- min_unmapped_ratio
- mmap_min_addr
- mmap_rnd_bits
- mmap_rnd_compat_bits
- nr_hugepages
- nr_hugepages_mempolicy
- nr_overcommit_hugepages
- nr_trim_pages         (only if CONFIG_MMU=n)
- numa_zonelist_order
- oom_dump_tasks
- oom_kill_allocating_task
- overcommit_kbytes
- overcommit_memory
- overcommit_ratio
- page-cluster
- page_lock_unfairness
- panic_on_oom
- percpu_pagelist_high_fraction
- stat_interval
- stat_refresh
- numa_stat
- swappiness
- unprivileged_userfaultfd
- user_reserve_kbytes
- vfs_cache_pressure
- watermark_boost_factor
- watermark_scale_factor
- zone_reclaim_mode


admin_reserve_kbytes
====================

The amount of free memory in the system that should be reserved for users
with the capability cap_sys_admin.

admin_reserve_kbytes defaults to min(3% of free pages, 8MB)

That should provide enough for the admin to log in and kill a process,
if necessary, under the default overcommit 'guess' mode.

Systems running under overcommit 'never' should increase this to account
for the full Virtual Memory Size of programs used to recover. Otherwise,
root may not be able to log in to recover the system.

How do you calculate a minimum useful reserve?

sshd or login + bash (or some other shell) + top (or ps, kill, etc.)

For overcommit 'guess', we can sum resident set sizes (RSS).
On x86_64 this is about 8MB.

For overcommit 'never', we can take the max of their virtual sizes (VSZ)
and add the sum of their RSS.
On x86_64 this is about 128MB.

Changing this takes effect whenever an application requests memory.

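
A minimal sketch of that calculation for overcommit 'never', assuming sshd,
bash and top are the recovery tools (the tool list and the final value are
only examples; ps reports VSZ/RSS in kilobytes, the same unit this file
uses)::

        # max(VSZ) of the recovery tools plus the sum of their RSS, in kilobytes
        ps -C sshd,bash,top -o vsz=,rss= |
                awk '{ if ($1 > v) v = $1; r += $2 } END { print v + r }'

        # reserve 128MB for recovery (example value)
        echo 131072 > /proc/sys/vm/admin_reserve_kbytes
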

compact_memory
==============

Available only when CONFIG_COMPACTION is set. When 1 is written to the file,
all zones are compacted such that free memory is available in contiguous
blocks where possible. This can be important for example in the allocation of
huge pages although processes will also directly compact memory as required.

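
For example, to request a one-off compaction of all zones::

        echo 1 > /proc/sys/vm/compact_memory
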

compaction_proactiveness
========================

This tunable takes a value in the range [0, 100] with a default value of
20. This tunable determines how aggressively compaction is done in the
background. Write of a non zero value to this tunable will immediately
trigger the proactive compaction. Setting it to 0 disables proactive compaction.

Note that compaction has a non-trivial system-wide impact as pages
belonging to different processes are moved around, which could also lead
to latency spikes in unsuspecting applications. The kernel employs
various heuristics to avoid wasting CPU cycles if it detects that
proactive compaction is not being effective.

Be careful when setting it to extreme values like 100, as that may
cause excessive background compaction activity.


compact_unevictable_allowed
===========================

Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
allowed to examine the unevictable lru (mlocked pages) for pages to compact.
This should be used on systems where stalls for minor page faults are an
acceptable trade for large contiguous free memory. Set to 0 to prevent
compaction from moving pages that are unevictable. Default value is 1.
On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
to compaction, which would block the task from becoming active until the fault
is resolved.


dirty_background_bytes
======================

Contains the amount of dirty memory at which the background kernel
flusher threads will start writeback.

Note:
  dirty_background_bytes is the counterpart of dirty_background_ratio. Only
  one of them may be specified at a time. When one sysctl is written it is
  immediately taken into account to evaluate the dirty memory limits and the
  other appears as 0 when read.

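
The mutual exclusion between the two sysctls can be observed directly; for
example (the byte value is illustrative)::

        echo 268435456 > /proc/sys/vm/dirty_background_bytes   # 256MB
        cat /proc/sys/vm/dirty_background_ratio                # now reads 0
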

dirty_background_ratio
======================

Contains, as a percentage of total available memory that contains free pages
and reclaimable pages, the number of pages at which the background kernel
flusher threads will start writing out dirty data.

The total available memory is not equal to total system memory.


dirty_bytes
===========

Contains the amount of dirty memory at which a process generating disk writes
will itself start writeback.

Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be
specified at a time. When one sysctl is written it is immediately taken into
account to evaluate the dirty memory limits and the other appears as 0 when
read.

Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
value lower than this limit will be ignored and the old configuration will be
retained.


dirty_expire_centisecs
======================

This tunable is used to define when dirty data is old enough to be eligible
for writeout by the kernel flusher threads.  It is expressed in 100'ths
of a second.  Data which has been dirty in-memory for longer than this
interval will be written out next time a flusher thread wakes up.


dirty_ratio
===========

Contains, as a percentage of total available memory that contains free pages
and reclaimable pages, the number of pages at which a process which is
generating disk writes will itself start writing out dirty data.

The total available memory is not equal to total system memory.


dirtytime_expire_seconds
========================

When a lazytime inode is constantly having its pages dirtied, the inode with
an updated timestamp will never get a chance to be written out.  And, if the
only thing that has happened on the file system is a dirtytime inode caused
by an atime update, a worker will be scheduled to make sure that inode
eventually gets pushed out to disk.  This tunable is used to define when a dirty
inode is old enough to be eligible for writeback by the kernel flusher threads.
And, it is also used as the interval to wake up the dirtytime_writeback thread.


dirty_writeback_centisecs
=========================

The kernel flusher threads will periodically wake up and write `old` data
out to disk.  This tunable expresses the interval between those wakeups, in
100'ths of a second.

Setting this to zero disables periodic writeback altogether.


drop_caches
===========

Writing to this will cause the kernel to drop clean caches, as well as
reclaimable slab objects like dentries and inodes.  Once dropped, their
memory becomes free.

To free pagecache::

        echo 1 > /proc/sys/vm/drop_caches

To free reclaimable slab objects (includes dentries and inodes)::

        echo 2 > /proc/sys/vm/drop_caches

To free slab objects and pagecache::

        echo 3 > /proc/sys/vm/drop_caches

This is a non-destructive operation and will not free any dirty objects.
To increase the number of objects freed by this operation, the user may run
`sync` prior to writing to /proc/sys/vm/drop_caches.  This will minimize the
number of dirty objects on the system and create more candidates to be
dropped.

This file is not a means to control the growth of the various kernel caches
(inodes, dentries, pagecache, etc...)  These objects are automatically
reclaimed by the kernel when memory is needed elsewhere on the system.

Use of this file can cause performance problems.  Since it discards cached
objects, it may cost a significant amount of I/O and CPU to recreate the
dropped objects, especially if they were under heavy use.  Because of this,
use outside of a testing or debugging environment is not recommended.

You may see informational messages in your kernel log when this file is
used::

        cat (1234): drop_caches: 3

These are informational only.  They do not mean that anything is wrong
with your system.  To disable them, echo 4 (bit 2) into drop_caches.


enable_soft_offline
===================

Correctable memory errors are very common on servers. Soft-offline is the
kernel's solution for memory pages having (excessive) corrected memory errors.

For different types of page, soft-offline has different behaviors / costs.

- For a raw error page, soft-offline migrates the in-use page's content to
  a new raw page.

- For a page that is part of a transparent hugepage, soft-offline splits the
  transparent hugepage into raw pages, then migrates only the raw error page.
  As a result, the user is transparently backed by 1 less hugepage, impacting
  memory access performance.

- For a page that is part of a HugeTLB hugepage, soft-offline first migrates
  the entire HugeTLB hugepage, during which a free hugepage will be consumed
  as migration target.  Then the original hugepage is dissolved into raw
  pages without compensation, reducing the capacity of the HugeTLB pool by 1.

It is the user's call to choose between reliability (staying away from fragile
physical memory) vs performance / capacity implications in the transparent and
HugeTLB cases.

For all architectures, enable_soft_offline controls whether to soft offline
memory pages.  When set to 1, the kernel attempts to soft offline the pages
whenever it thinks it is needed.  When set to 0, the kernel returns EOPNOTSUPP
to the request to soft offline the pages.  Its default value is 1.

It is worth mentioning that after setting enable_soft_offline to 0, the
following requests to soft offline pages will not be performed:

- Request to soft offline pages from RAS Correctable Errors Collector.

- On ARM, the request to soft offline pages from the GHES driver.

- On PARISC, the request to soft offline pages from the Page Deallocation Table.


extfrag_threshold
=================

This parameter affects whether the kernel will compact memory or direct
reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in
debugfs shows what the fragmentation index for each order is in each zone in
the system. Values tending towards 0 imply allocations would fail due to lack
of memory, values towards 1000 imply failures are due to fragmentation and -1
implies that the allocation will succeed as long as watermarks are met.

The kernel will not compact memory in a zone if the
fragmentation index is <= extfrag_threshold. The default value is 500.

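
For example, the per-order fragmentation index that is compared against this
threshold can be inspected through debugfs (assuming CONFIG_COMPACTION is set
and debugfs is mounted at its usual location)::

        cat /sys/kernel/debug/extfrag/extfrag_index
        echo 500 > /proc/sys/vm/extfrag_threshold    # example: keep the default
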

highmem_is_dirtyable
====================

Available only for systems with CONFIG_HIGHMEM enabled (32b systems).

This parameter controls whether the high memory is considered for dirty
writers throttling.  This is not the case by default which means that
only the amount of memory directly visible/usable by the kernel can
be dirtied. As a result, on systems with a large amount of memory and
lowmem basically depleted writers might be throttled too early and
streaming writes can get very slow.

Changing the value to non zero would allow more memory to be dirtied
and thus allow writers to write more data which can be flushed to the
storage more effectively. Note this also comes with a risk of pre-mature
OOM killer because some writers (e.g. direct block device writes) can
only use the low memory and they can fill it up with dirty data without
any throttling.


hugetlb_shm_group
=================

hugetlb_shm_group contains the group id that is allowed to create SysV
shared memory segments using hugetlb pages.


laptop_mode
===========

laptop_mode is a knob that controls "laptop mode". All the things that are
controlled by this knob are discussed in
Documentation/admin-guide/laptops/laptop-mode.rst.


legacy_va_layout
================

If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel
will use the legacy (2.4) layout for all processes.


lowmem_reserve_ratio
====================

For some specialised workloads on highmem machines it is dangerous for
the kernel to allow process memory to be allocated from the "lowmem"
zone.  This is because that memory could then be pinned via the mlock()
system call, or by unavailability of swapspace.

And on large highmem machines this lack of reclaimable lowmem memory
can be fatal.

So the Linux page allocator has a mechanism which prevents allocations
which *could* use highmem from using too much lowmem.  This means that
a certain amount of lowmem is defended from the possibility of being
captured into pinned user memory.

(The same argument applies to the old 16 megabyte ISA DMA region.  This
mechanism will also defend that region from allocations which could use
highmem or lowmem).

The `lowmem_reserve_ratio` tunable determines how aggressive the kernel is
in defending these lower zones.

If you have a machine which uses highmem or ISA DMA and your
applications are using mlock(), or if you are running with no swap then
you probably should change the lowmem_reserve_ratio setting.

The lowmem_reserve_ratio is an array. You can see them by reading this file::

        % cat /proc/sys/vm/lowmem_reserve_ratio
        256     256     32

But, these values are not used directly. The kernel calculates # of protection
pages for each zone from them. These are shown as an array of protection pages
in /proc/zoneinfo like the following. (This is an example of an x86-64 box.)
Each zone has an array of protection pages like this::

  Node 0, zone      DMA
    pages free     1355
          min      3
          low      3
          high     4
        :
        :
      numa_other   0
          protection: (0, 2004, 2004, 2004)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    pagesets
      cpu: 0 pcp: 0
          :

These protections are added to the score used to judge whether this zone
should be used for page allocation or should be reclaimed.

In this example, if normal pages (index=2) are required of this DMA zone and
watermark[WMARK_HIGH] is used for the watermark, the kernel judges this zone
should not be used because pages_free(1355) is smaller than watermark +
protection[2] (4 + 2004 = 2008). If this protection value is 0, this zone would
be used for normal page requirements. If the requirement is the DMA
zone(index=0), protection[0] (=0) is used.

zone[i]'s protection[j] is calculated by the following expression::

  (i < j):
    zone[i]->protection[j]
    = (total sums of managed_pages from zone[i+1] to zone[j] on the node)
      / lowmem_reserve_ratio[i];
  (i = j):
     (should not be protected. = 0;
  (i > j):
     (not necessary, but looks 0)

The default values of lowmem_reserve_ratio[i] are

    === ====================================
    256 (if zone[i] means DMA or DMA32 zone)
    32  (others)
    === ====================================

As in the expression above, they are reciprocals of the ratio.
256 means 1/256. The # of protection pages becomes about "0.39%" of the total
managed pages of higher zones on the node.

If you would like to protect more pages, smaller values are effective.
The minimum value is 1 (1/1 -> 100%). A value less than 1 completely
disables protection of the pages.

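
As a worked instance of the expression above, consider the x86-64 example: the
DMA zone's protection[2] shown in /proc/zoneinfo is 2004 pages, which would
correspond to the higher zones on that node holding roughly 513,024 managed
pages in total (a figure back-calculated here purely for illustration)::

  zone[0=DMA]->protection[2=NORMAL]
    = (managed_pages of zone[1] + zone[2] on the node) / lowmem_reserve_ratio[0]
    = 513024 / 256
    = 2004
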

max_map_count:
==============

This file contains the maximum number of memory map areas a process
may have. Memory map areas are used as a side-effect of calling
malloc, directly by mmap, mprotect, and madvise, and also when loading
shared libraries.

While most applications need less than a thousand maps, certain
programs, particularly malloc debuggers, may consume lots of them,
e.g., up to one or two maps per allocation.

The default value is 65530.

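
For example, to get a rough per-process mapping count and raise the limit
(the new value below is only an example)::

        wc -l < /proc/$$/maps                        # approximate map count of this shell
        echo 262144 > /proc/sys/vm/max_map_count     # example: raise the limit
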

mem_profiling
==============

Enable memory profiling (when CONFIG_MEM_ALLOC_PROFILING=y)

1: Enable memory profiling.

0: Disable memory profiling.

Enabling memory profiling introduces a small performance overhead for all
memory allocations.

The default value depends on CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.


memory_failure_early_kill:
==========================

Control how to kill processes when an uncorrected memory error (typically
a 2bit error in a memory module) is detected in the background by hardware
that cannot be handled by the kernel. In some cases (like the page
still having a valid copy on disk) the kernel will handle the failure
transparently without affecting any applications. But if there is
no other up-to-date copy of the data it will kill to prevent any data
corruptions from propagating.

1: Kill all processes that have the corrupted and not reloadable page mapped
as soon as the corruption is detected.  Note this is not supported
for a few types of pages, like kernel internally allocated data or
the swap cache, but works for the majority of user pages.

0: Only unmap the corrupted page from all processes and only kill a process
who tries to access it.

The kill is done using a catchable SIGBUS with BUS_MCEERR_AO, so processes can
handle this if they want to.

This is only active on architectures/platforms with advanced machine
check handling and depends on the hardware capabilities.

Applications can override this setting individually with the PR_MCE_KILL prctl.


memory_failure_recovery
=======================

Enable memory failure recovery (when supported by the platform).

1: Attempt recovery.

0: Always panic on a memory failure.


min_free_kbytes
===============

This is used to force the Linux VM to keep a minimum number
of kilobytes free.  The VM uses this number to compute a
watermark[WMARK_MIN] value for each lowmem zone in the system.
Each lowmem zone gets a number of reserved free pages based
proportionally on its size.

Some minimal amount of memory is needed to satisfy PF_MEMALLOC
allocations; if you set this to lower than 1024KB, your system will
become subtly broken, and prone to deadlock under high loads.

Setting this too high will OOM your machine instantly.

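
For example, to check the current reserve and the per-zone min watermarks
derived from it (the value written below is purely illustrative)::

        cat /proc/sys/vm/min_free_kbytes
        grep -E 'Node|min ' /proc/zoneinfo            # per-zone watermark[WMARK_MIN]
        echo 67584 > /proc/sys/vm/min_free_kbytes     # example value, in kilobytes
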

min_slab_ratio
==============

This is available only on NUMA kernels.

A percentage of the total pages in each zone.  On Zone reclaim
(fallback from the local zone occurs) slabs will be reclaimed if more
than this percentage of pages in a zone are reclaimable slab pages.
This insures that the slab growth stays under control even in NUMA
systems that rarely perform global reclaim.

The default is 5 percent.

Note that slab reclaim is triggered in a per zone / node fashion.
The process of reclaiming slab memory is currently not node specific
and may not be fast.


min_unmapped_ratio
==================

This is available only on NUMA kernels.

This is a percentage of the total pages in each zone. Zone reclaim will
only occur if more than this percentage of pages are in a state that
zone_reclaim_mode allows to be reclaimed.

If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
against all file-backed unmapped pages including swapcache pages and tmpfs
files. Otherwise, only unmapped pages backed by normal files but not tmpfs
files and similar are considered.

The default is 1 percent.


mmap_min_addr
=============

This file indicates the amount of address space which a user process will
be restricted from mmapping.  Since kernel null dereference bugs could
accidentally operate based on the information in the first couple of pages
of memory, userspace processes should not be allowed to write to them.  By
default this value is set to 0 and no protections will be enforced by the
security module.  Setting this value to something like 64k will allow the
vast majority of applications to work correctly and provide defense in depth
against future potential kernel bugs.


mmap_rnd_bits
=============

This value can be used to select the number of bits to use to
determine the random offset to the base address of vma regions
resulting from mmap allocations on architectures which support
tuning address space randomization.  This value will be bounded
by the architecture's minimum and maximum supported values.

This value can be changed after boot using the
/proc/sys/vm/mmap_rnd_bits tunable.


mmap_rnd_compat_bits
====================

This value can be used to select the number of bits to use to
determine the random offset to the base address of vma regions
resulting from mmap allocations for applications run in
compatibility mode on architectures which support tuning address
space randomization.  This value will be bounded by the
architecture's minimum and maximum supported values.

This value can be changed after boot using the
/proc/sys/vm/mmap_rnd_compat_bits tunable.


nr_hugepages
============

Change the minimum size of the hugepage pool.

See Documentation/admin-guide/mm/hugetlbpage.rst

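
For example, to grow the default-size hugepage pool and verify the result
(2048 pages is an arbitrary example)::

        echo 2048 > /proc/sys/vm/nr_hugepages
        grep HugePages_ /proc/meminfo
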

hugetlb_optimize_vmemmap
========================

This knob is not available when the size of 'struct page' (a structure defined
in include/linux/mm_types.h) is not a power of two (an unusual system config
could result in this).

Enable (set to 1) or disable (set to 0) HugeTLB Vmemmap Optimization (HVO).

Once enabled, the vmemmap pages of subsequent allocations of HugeTLB pages from
the buddy allocator will be optimized (7 pages per 2MB HugeTLB page and 4095
pages per 1GB HugeTLB page), whereas already allocated HugeTLB pages will not be
optimized.  When those optimized HugeTLB pages are freed from the HugeTLB pool
to the buddy allocator, the vmemmap pages representing that range need to be
remapped again and the vmemmap pages discarded earlier need to be reallocated
again.  If your use case is that HugeTLB pages are allocated 'on the fly' (e.g.
never explicitly allocating HugeTLB pages with 'nr_hugepages' but only setting
'nr_overcommit_hugepages', so those overcommitted HugeTLB pages are allocated
'on the fly') instead of being pulled from the HugeTLB pool, you should weigh
the benefit of memory savings against the extra overhead (~2x slower than
before) of allocating or freeing HugeTLB pages between the HugeTLB pool and the
buddy allocator.  Another behavior to note is that if the system is under heavy
memory pressure, it could prevent the user from freeing HugeTLB pages from the
HugeTLB pool to the buddy allocator since the allocation of vmemmap pages could
fail; you have to retry later if your system encounters this situation.

Once disabled, the vmemmap pages of subsequent allocations of HugeTLB pages from
the buddy allocator will not be optimized, meaning the extra overhead at
allocation time from the buddy allocator disappears, whereas already optimized
HugeTLB pages will not be affected.  If you want to make sure there are no
optimized HugeTLB pages, you can set "nr_hugepages" to 0 first and then disable
this.  Note that writing 0 to nr_hugepages will make any "in use" HugeTLB pages
become surplus pages.  So, those surplus pages are still optimized until they
are no longer in use.  You would need to wait for those surplus pages to be
released before there are no optimized pages in the system.


nr_hugepages_mempolicy
======================

Change the size of the hugepage pool at run-time on a specific
set of NUMA nodes.

See Documentation/admin-guide/mm/hugetlbpage.rst


nr_overcommit_hugepages
=======================

Change the maximum size of the hugepage pool. The maximum is
nr_hugepages + nr_overcommit_hugepages.

See Documentation/admin-guide/mm/hugetlbpage.rst


nr_trim_pages
=============

This is available only on NOMMU kernels.

This value adjusts the excess page trimming behaviour of power-of-2 aligned
NOMMU mmap allocations.

A value of 0 disables trimming of allocations entirely, while a value of 1
trims excess pages aggressively. Any value >= 1 acts as the watermark where
trimming of allocations is initiated.

The default value is 1.

See Documentation/admin-guide/mm/nommu-mmap.rst for more information.


numa_zonelist_order
===================

This sysctl is only for NUMA and it is deprecated. Anything but
Node order will fail!

'where the memory is allocated from' is controlled by zonelists.

(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for simplicity.
you may be able to read ZONE_DMA as ZONE_DMA32 in this description.)

In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as following.
ZONE_NORMAL -> ZONE_DMA
This means that a memory allocation request for GFP_KERNEL will
get memory from ZONE_DMA only when ZONE_NORMAL is not available.

In the NUMA case, you can think of following 2 types of order.
Assume 2 node NUMA and below is the zonelist of Node(0)'s GFP_KERNEL::

  (A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
  (B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA

Type(A) offers the best locality for processes on Node(0), but ZONE_DMA
will be used before ZONE_NORMAL exhaustion. This increases possibility of
out-of-memory(OOM) of ZONE_DMA because ZONE_DMA tends to be small.

Type(B) cannot offer the best locality but is more robust against OOM of
the DMA zone.

Type(A) is called "Node" order. Type (B) is "Zone" order.

"Node order" orders the zonelists by node, then by zone within each node.
Specify "[Nn]ode" for node order.

"Zone Order" orders the zonelists by zone type, then by node within each
zone.  Specify "[Zz]one" for zone order.

Specify "[Dd]efault" to request automatic configuration.

On 32-bit, the Normal zone needs to be preserved for allocations accessible
by the kernel, so "zone" order will be selected.

On 64-bit, devices that require DMA32/DMA are relatively rare, so "node"
order will be selected.

Default order is recommended unless this is causing problems for your
system/application.


oom_dump_tasks
==============

Enables a system-wide task dump (excluding kernel threads) to be produced
when the kernel performs an OOM-killing and includes such information as
pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
score, and name.  This is helpful to determine why the OOM killer was
invoked, to identify the rogue task that caused it, and to determine why
the OOM killer chose the task it did to kill.

If this is set to zero, this information is suppressed.  On very
large systems with thousands of tasks it may not be feasible to dump
the memory state information for each one.  Such systems should not
be forced to incur a performance penalty in OOM conditions when the
information may not be desired.

If this is set to non-zero, this information is shown whenever the
OOM killer actually kills a memory-hogging task.

The default value is 1 (enabled).


oom_kill_allocating_task
========================

This enables or disables killing the OOM-triggering task in
out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire
tasklist and select a task based on heuristics to kill.  This normally
selects a rogue memory-hogging task that frees up a large amount of
memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that
triggered the out-of-memory condition.  This avoids the expensive
tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value
is used in oom_kill_allocating_task.

The default value is 0.


overcommit_kbytes
=================

When overcommit_memory is set to 2, the committed address space is not
permitted to exceed swap plus this amount of physical RAM. See below.

Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one
of them may be specified at a time. Setting one disables the other (which
then appears as 0 when read).


overcommit_memory
=================

This value contains a flag that enables memory overcommitment.

When this flag is 0, the kernel compares the userspace memory request
size against total memory plus swap and rejects obvious overcommits.

When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.

When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.
Note that user_reserve_kbytes affects this policy.

This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
and don't use much of it.

The default value is 0.

See Documentation/mm/overcommit-accounting.rst and
mm/util.c::__vm_enough_memory() for more information.

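
For example, to switch to the "never overcommit" policy and observe the
resulting commit limit (the ratio is illustrative; see overcommit_ratio
below)::

        echo 2 > /proc/sys/vm/overcommit_memory
        echo 80 > /proc/sys/vm/overcommit_ratio              # example percentage
        grep -E 'CommitLimit|Committed_AS' /proc/meminfo
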

overcommit_ratio
================

When overcommit_memory is set to 2, the committed address
space is not permitted to exceed swap plus this percentage
of physical RAM.  See above.


page-cluster
============

page-cluster controls the number of pages up to which consecutive pages
are read in from swap in a single attempt. This is the swap counterpart
to page cache readahead.
The mentioned consecutivity is not in terms of virtual/physical addresses,
but consecutive on swap space - that means they were swapped out together.

It is a logarithmic value - setting it to zero means "1 page", setting
it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
Zero disables swap readahead completely.

The default value is three (eight pages at a time).  There may be some
small benefits in tuning this to a different value if your workload is
swap-intensive.

Lower values mean lower latencies for initial faults, but at the same time
extra faults and I/O delays for following faults if they would have been part
of the consecutive pages that readahead would have brought in.

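
Because the value is a base-2 logarithm, a setting of n reads in 2^n pages per
attempt; for example::

        echo 0 > /proc/sys/vm/page-cluster   # 1 page per swap-in, readahead disabled
        echo 3 > /proc/sys/vm/page-cluster   # default: 2^3 = 8 pages per swap-in
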

page_lock_unfairness
====================

This value determines the number of times that the page lock can be
stolen from under a waiter. After the lock is stolen the number of times
specified in this file (default is 5), the "fair lock handoff" semantics
will apply, and the waiter will only be awakened if the lock can be taken.


panic_on_oom
============

This enables or disables the panic on out-of-memory feature.

If this is set to 0, the kernel will kill some rogue process via the
oom_killer.  Usually, the oom_killer can kill rogue processes and the
system will survive.

If this is set to 1, the kernel panics when out-of-memory happens.
However, if a process limits its usable nodes by mempolicy/cpusets,
and those nodes reach memory exhaustion status, one process
may be killed by the oom-killer. No panic occurs in this case,
because other nodes' memory may be free and the system as a whole
may not be in a fatal state yet.

If this is set to 2, the kernel panics compulsorily even in the
above-mentioned case. Even if the oom happens under a memory cgroup, the
whole system panics.

The default value is 0.

1 and 2 are for failover of clustering. Please select either
according to your policy of failover.

panic_on_oom=2+kdump gives you a very strong tool to investigate
why the oom happens. You can get a snapshot.


percpu_pagelist_high_fraction
=============================

This is the fraction of pages in each zone that can be stored on
per-cpu page lists. It is an upper boundary that is divided depending
on the number of online CPUs. The min value for this is 8 which means
that we do not allow more than 1/8th of pages in each zone to be stored
on per-cpu page lists. This entry only changes the value of hot per-cpu
page lists. A user can specify a number like 100 to allocate 1/100th of
each zone between per-cpu lists.

The batch value of each per-cpu page list remains the same regardless of
the value of the high fraction so allocation latencies are unaffected.

The initial value is zero. The kernel uses this value to set the high
pcp->high mark based on the low watermark for the zone and the number of local
online CPUs.  If the user writes '0' to this sysctl, it will revert to
this default behavior.


stat_interval
=============

The time interval between which vm statistics are updated.  The default
is 1 second.


stat_refresh
============

Any read or write (by root only) flushes all the per-cpu vm statistics
into their global totals, for more accurate reports when testing
e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo

As a side-effect, it also checks for negative totals (elsewhere reported
as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
(At time of writing, a few stats are known sometimes to be found negative,
with no ill effects: errors and warnings on these stats are suppressed.)

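
The example from the text, as a single command line (reading stat_refresh
first forces the flush, so the /proc/meminfo read that follows sees fully
folded counters)::

        cat /proc/sys/vm/stat_refresh /proc/meminfo
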

numa_stat
=========

This interface allows runtime configuration of numa statistics.

When page allocation performance becomes a bottleneck and you can tolerate
some possible tool breakage and decreased numa counter precision, you can
do::

        echo 0 > /proc/sys/vm/numa_stat

When page allocation performance is not a bottleneck and you want all
tooling to work, you can do::

        echo 1 > /proc/sys/vm/numa_stat


swappiness
==========

This control is used to define the rough relative IO cost of swapping
and filesystem paging, as a value between 0 and 200. At 100, the VM
assumes equal IO cost and will thus apply memory pressure to the page
cache and swap-backed pages equally; lower values signify more
expensive swap IO, higher values indicate cheaper swap IO.

Keep in mind that filesystem IO patterns under memory pressure tend to
be more efficient than swap's random IO. An optimal value will require
experimentation and will also be workload-dependent.

The default value is 60.

For in-memory swap, like zram or zswap, as well as hybrid setups that
have swap on faster devices than the filesystem, values beyond 100 can
be considered. For example, if the random IO against the swap device
is on average 2x faster than IO from the filesystem, swappiness should
be 133 (x + 2x = 200, 2x = 133.33).

At 0, the kernel will not initiate swap until the amount of free and
file-backed pages is less than the high watermark in a zone.

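
Continuing the example above, when swap IO is assumed to be twice as cheap as
filesystem IO, the value follows from x + 2x = 200, so 2x ≈ 133, and it could
be applied with::

        echo 133 > /proc/sys/vm/swappiness
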

unprivileged_userfaultfd
========================

This flag controls the mode in which unprivileged users can use the
userfaultfd system calls. Set this to 0 to restrict unprivileged users
to handle page faults in user mode only. In this case, users without
CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
succeed. Prohibiting use of userfaultfd for handling faults from kernel
mode may make certain vulnerabilities more difficult to exploit.

Set this to 1 to allow unprivileged users to use the userfaultfd system
calls without any restrictions.

The default value is 0.

Another way to control permissions for userfaultfd is to use
/dev/userfaultfd instead of userfaultfd(2). See
Documentation/admin-guide/mm/userfaultfd.rst.


user_reserve_kbytes
===================

When overcommit_memory is set to 2, "never overcommit" mode, reserve
min(3% of current process size, user_reserve_kbytes) of free memory.
This is intended to prevent a user from starting a single memory hogging
process, such that they cannot recover (kill the hog).

user_reserve_kbytes defaults to min(3% of the current process size, 128MB).

If this is reduced to zero, then the user will be allowed to allocate
all free memory with a single process, minus admin_reserve_kbytes.
Any subsequent attempts to execute a command will result in
"fork: Cannot allocate memory".

Changing this takes effect whenever an application requests memory.


vfs_cache_pressure
==================

This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative
performance impact. Reclaim code needs to take various locks to find freeable
directory and inode objects. With vfs_cache_pressure=1000, it will look for
ten times more freeable objects than there are.


watermark_boost_factor
======================

This factor controls the level of reclaim when memory is being fragmented.
It defines the percentage of the high watermark of a zone that will be
reclaimed if pages of different mobility are being mixed within pageblocks.
The intent is that compaction has less work to do in the future and to
increase the success rate of future high-order allocations such as SLUB
allocations, THP and hugetlbfs pages.

To make it sensible with respect to the watermark_scale_factor
parameter, the unit is in fractions of 10,000. The default value of
15,000 means that up to 150% of the high watermark will be reclaimed in the
event of a pageblock being mixed due to fragmentation. The level of reclaim
is determined by the number of fragmentation events that occurred in the
recent past. If this value is smaller than a pageblock then a pageblock's
worth of pages will be reclaimed (e.g.  2MB on 64-bit x86). A boost factor
of 0 will disable the feature.


watermark_scale_factor
======================

This factor controls the aggressiveness of kswapd. It defines the
amount of memory left in a node/system before kswapd is woken up and
how much memory needs to be free before kswapd goes back to sleep.

The unit is in fractions of 10,000. The default value of 10 means the
distances between watermarks are 0.1% of the available memory in the
node/system. The maximum value is 3000, or 30% of memory.

A high rate of threads entering direct reclaim (allocstall) or kswapd
going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
that the number of free pages kswapd maintains for latency reasons is
too small for the allocation bursts occurring in the system. This knob
can then be used to tune kswapd aggressiveness accordingly.

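
One possible way to check for the symptoms described above and to react (the
factor chosen is only an example)::

        grep -E 'allocstall|kswapd_low_wmark_hit_quickly' /proc/vmstat
        echo 100 > /proc/sys/vm/watermark_scale_factor   # example: 1% gaps between watermarks
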

zone_reclaim_mode
=================

Zone_reclaim_mode allows someone to set more or less aggressive approaches to
reclaim memory when a zone runs out of memory. If it is set to zero then no
zone reclaim occurs. Allocations will be satisfied from other zones / nodes
in the system.

This value is an OR'ed combination of

=       ===================================
1       Zone reclaim on
2       Zone reclaim writes dirty pages out
4       Zone reclaim swaps pages
=       ===================================

zone_reclaim_mode is disabled by default.  For file servers or workloads
that benefit from having their data cached, zone_reclaim_mode should be
left disabled as the caching effect is likely to be more important than
data locality.

Consider enabling one or more zone_reclaim mode bits if it's known that the
workload is partitioned such that each partition fits within a NUMA node
and that accessing remote memory would cause a measurable performance
reduction.  The page allocator will take additional actions before
allocating off node pages.

Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
reclaim will write out dirty pages if a zone fills up and so effectively
throttle the process. This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes
anymore but it preserves the memory on other nodes so that the performance
of other processes running on other nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.

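
For example, to enable zone reclaim together with write-out of dirty pages
(bits 1 and 2), OR the values and write the result::

        echo 3 > /proc/sys/vm/zone_reclaim_mode
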
                                                      
