TOMOYO Linux Cross Reference
Linux/Documentation/admin-guide/mm/multigen_lru.rst


Differences between /Documentation/admin-guide/mm/multigen_lru.rst (Version linux-6.12-rc7) and /Documentation/admin-guide/mm/multigen_lru.rst (Version linux-6.6.60)


.. SPDX-License-Identifier: GPL-2.0

=============
Multi-Gen LRU
=============
The multi-gen LRU is an alternative LRU implementation that optimizes
page reclaim and improves performance under memory pressure. Page
reclaim decides the kernel's caching policy and ability to overcommit
memory. It directly impacts the kswapd CPU usage and RAM efficiency.

Quick start
===========
Build the kernel with the following configuration options.

* ``CONFIG_LRU_GEN=y``
* ``CONFIG_LRU_GEN_ENABLED=y``

All set!

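To check whether a kernel was built with these options, one approach is to search its build configuration. The sketch below assumes the configuration is available as a plain-text file; ``/boot/config-$(uname -r)`` is a common distribution convention rather than something this document specifies, and the function name ``check_lru_gen_config`` is hypothetical.

::

    # Sketch: report whether a kernel config file sets both options.
    # Prints "ok", or prints "missing: <option>=y" and returns 1.
    check_lru_gen_config() {
        cfg=$1
        for opt in CONFIG_LRU_GEN CONFIG_LRU_GEN_ENABLED; do
            if ! grep -qx "${opt}=y" "$cfg"; then
                echo "missing: ${opt}=y"
                return 1
            fi
        done
        echo "ok"
    }

    # Usage (the path is an assumption):
    #   check_lru_gen_config "/boot/config-$(uname -r)"

Kernels built with ``CONFIG_IKCONFIG_PROC`` expose a gzipped copy at ``/proc/config.gz`` instead; decompress it to a file first.
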
Runtime options
===============
``/sys/kernel/mm/lru_gen/`` contains stable ABIs described in the
following subsections.

Kill switch
-----------
``enabled`` accepts different values to enable or disable the
following components. Its default value depends on
``CONFIG_LRU_GEN_ENABLED``. All the components should be enabled
unless some of them have unforeseen side effects. Writing to
``enabled`` has no effect when a component is not supported by the
hardware, and valid values will be accepted even when the main switch
is off.

====== ===============================================================
Values Components
====== ===============================================================
0x0001 The main switch for the multi-gen LRU.
0x0002 Clearing the accessed bit in leaf page table entries in large
       batches, when the MMU sets it (e.g., on x86). This behavior can
       theoretically worsen lock contention (mmap_lock). If it is
       disabled, the multi-gen LRU will suffer a minor performance
       degradation for workloads that contiguously map hot pages,
       whose accessed bits can be otherwise cleared by fewer larger
       batches.
0x0004 Clearing the accessed bit in non-leaf page table entries as
       well, when the MMU sets it (e.g., on x86). This behavior was not
       verified on x86 varieties other than Intel and AMD. If it is
       disabled, the multi-gen LRU will suffer a negligible
       performance degradation.
[yYnN] Apply to all the components above.
====== ===============================================================

E.g.,
::

    echo y >/sys/kernel/mm/lru_gen/enabled
    cat /sys/kernel/mm/lru_gen/enabled
    0x0007
    echo 5 >/sys/kernel/mm/lru_gen/enabled
    cat /sys/kernel/mm/lru_gen/enabled
    0x0005

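Since the values in the table are bit flags, a value read back from ``enabled`` can be decoded with shell arithmetic. The helper below is a hypothetical sketch; its labels paraphrase the table and are not strings the kernel prints.

::

    # Sketch: list the components turned on by a value read from
    # /sys/kernel/mm/lru_gen/enabled (e.g., 0x0005).
    # Bit meanings follow the table; labels are informal paraphrases.
    decode_lru_gen_enabled() {
        val=$(( $1 ))    # arithmetic expansion accepts hex like 0x0005
        [ $(( val & 0x0001 )) -ne 0 ] && echo "main switch"
        [ $(( val & 0x0002 )) -ne 0 ] && echo "leaf PTE accessed-bit clearing in batches"
        [ $(( val & 0x0004 )) -ne 0 ] && echo "non-leaf PTE accessed-bit clearing"
        return 0
    }

    # Usage:
    #   decode_lru_gen_enabled "$(cat /sys/kernel/mm/lru_gen/enabled)"

For the ``0x0005`` value in the example, this prints the main switch and the non-leaf entry, since bit ``0x0002`` is clear.
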
Thrashing prevention
--------------------
Personal computers are more sensitive to thrashing because it can
cause janks (lags when rendering UI) and negatively impact user
experience. The multi-gen LRU offers thrashing prevention to the
majority of laptop and desktop users who do not have ``oomd``.

Users can write ``N`` to ``min_ttl_ms`` to prevent the working set of
``N`` milliseconds from getting evicted. The OOM killer is triggered
if this working set cannot be kept in memory. In other words, this
option works as an adjustable pressure relief valve, and when open, it
terminates applications that are hopefully not being used.

Based on the average human-detectable lag (~100ms), ``N=1000`` usually
eliminates intolerable janks due to thrashing. Larger values like
``N=3000`` make janks less noticeable at the risk of premature OOM
kills.

The default value ``0`` means disabled.

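Applying the ``N=1000`` suggestion amounts to writing the number to ``/sys/kernel/mm/lru_gen/min_ttl_ms``, typically as root. The sketch below adds a basic integer check first. The function name is hypothetical, and so is the ``LRU_GEN_DIR`` variable, which only exists so the function can be exercised against a scratch directory; it is not a kernel interface.

::

    # Sketch: write N to min_ttl_ms after checking it is a
    # non-negative integer. LRU_GEN_DIR is a hypothetical override
    # that defaults to the real sysfs directory.
    set_min_ttl_ms() {
        case $1 in
            ''|*[!0-9]*)
                echo "not a non-negative integer: $1" >&2
                return 1
                ;;
        esac
        dir=${LRU_GEN_DIR:-/sys/kernel/mm/lru_gen}
        echo "$1" >"$dir/min_ttl_ms"
    }

    # Usage: protect the working set of the last 1000 ms from eviction.
    #   set_min_ttl_ms 1000
    # Writing 0 restores the default (disabled).
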
Experimental features
=====================
``/sys/kernel/debug/lru_gen`` accepts commands described in the
following subsections. Multiple command lines are supported, as is
concatenation with delimiters ``,`` and ``;``.

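Since commands can be concatenated, several of them can be issued with a single write. The hypothetical helper below joins its arguments with ``;``; the ``+`` and ``-`` commands themselves are described in the following subsections, and the values in the usage comment are placeholders.

::

    # Sketch: join lru_gen commands with the ";" delimiter so that
    # they can be written to /sys/kernel/debug/lru_gen in one go.
    join_lru_gen_cmds() {
        out=
        for cmd in "$@"; do
            if [ -z "$out" ]; then
                out=$cmd
            else
                out="$out;$cmd"
            fi
        done
        printf '%s\n' "$out"
    }

    # Usage (as root, debugfs mounted; values are placeholders):
    #   join_lru_gen_cmds '+ 1 0 7' '+ 1 1 7' >/sys/kernel/debug/lru_gen
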
``/sys/kernel/debug/lru_gen_full`` provides additional stats for
debugging. ``CONFIG_LRU_GEN_STATS=y`` keeps historical stats from
evicted generations in this file.

Working set estimation
----------------------
Working set estimation measures how much memory an application needs
in a given time interval, and it is usually done with little impact on
the performance of the application. E.g., data centers want to
optimize job scheduling (bin packing) to improve memory utilization.
When a new job comes in, the job scheduler needs to find out whether
each server it manages can allocate a certain amount of memory for
this new job before it can pick a candidate. To do so, the job
scheduler needs to estimate the working sets of the existing jobs.

When it is read, ``lru_gen`` returns a histogram of numbers of pages
accessed over different time intervals for each memcg and node.
``MAX_NR_GENS`` decides the number of bins for each histogram. The
histograms are noncumulative.
::

    memcg  memcg_id  memcg_path
       node  node_id
           min_gen_nr  age_in_ms  nr_anon_pages  nr_file_pages
           ...
           max_gen_nr  age_in_ms  nr_anon_pages  nr_file_pages

Each bin contains an estimated number of pages that have been accessed
within ``age_in_ms``. E.g., ``min_gen_nr`` contains the coldest pages
and ``max_gen_nr`` contains the hottest pages, since ``age_in_ms`` of
the former is the largest and that of the latter is the smallest.

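Since the histograms are noncumulative, a rough estimate of how much memory has not been accessed within an interval is the sum of the bins whose ``age_in_ms`` is at least that interval. The awk-based sketch below computes that sum per memcg and node from ``lru_gen`` (not ``lru_gen_full``) output on stdin. It assumes the four-column bin layout shown above and counts anon and file pages together; the function name is hypothetical.

::

    # Sketch: estimate cold pages per memcg and node.
    # $1: threshold in milliseconds; bins with age_in_ms >= $1 count
    #     as cold.
    # Output lines: memcg_id node_id cold_pages (unordered)
    cold_pages() {
        awk -v min_age_ms="$1" '
            $1 == "memcg" { memcg = $2; next }
            $1 == "node"  { node = $2; cold[memcg " " node] += 0; next }
            NF >= 4 && $1 ~ /^[0-9]+$/ && $2 + 0 >= min_age_ms + 0 {
                cold[memcg " " node] += $3 + $4
            }
            END { for (k in cold) print k, cold[k] }
        '
    }

    # Usage (one-minute threshold, largest first):
    #   cold_pages 60000 </sys/kernel/debug/lru_gen | sort -k3,3nr
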
Users can write the following command to ``lru_gen`` to create a new
generation ``max_gen_nr+1``:

    ``+ memcg_id node_id max_gen_nr [can_swap [force_scan]]``

``can_swap`` defaults to the swap setting and, if it is set to ``1``,
it forces the scan of anon pages when swap is off, and vice versa.
``force_scan`` defaults to ``1`` and, if it is set to ``0``, it
employs heuristics to reduce the overhead, which is likely to reduce
the coverage as well.

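Before creating a generation, a script needs the current ``max_gen_nr`` of the target memcg and node. The hypothetical sketch below reads ``lru_gen`` output on stdin, takes the largest generation number listed under the given ``memcg_id`` and ``node_id``, and prints the matching ``+`` command, leaving ``can_swap`` and ``force_scan`` at their defaults.

::

    # Sketch: print "+ memcg_id node_id max_gen_nr" for one memcg/node.
    # Reads lru_gen output on stdin; exits with status 1 if the
    # memcg/node pair is not found.
    aging_cmd() {
        awk -v want_memcg="$1" -v want_node="$2" '
            $1 == "memcg" { memcg = $2; node = ""; next }
            $1 == "node"  { node = $2; next }
            memcg == want_memcg && node == want_node && $1 ~ /^[0-9]+$/ {
                if (!found || $1 + 0 > max + 0) { max = $1; found = 1 }
            }
            END {
                if (!found) exit 1
                print "+", want_memcg, want_node, max
            }
        '
    }

    # Usage (as root; IDs are placeholders):
    #   cmd=$(aging_cmd 1 0 </sys/kernel/debug/lru_gen) &&
    #       echo "$cmd" >/sys/kernel/debug/lru_gen
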
A typical use case is that a job scheduler runs this command at a
certain time interval to create new generations, and it ranks the
servers it manages based on the sizes of their cold pages defined by
this time interval.

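The ranking step can then be a plain numeric sort. The sketch below assumes the scheduler has already gathered one ``host cold_pages`` line per server, a hypothetical format chosen for illustration, and prints host names from the most to the least cold memory.

::

    # Sketch: rank servers by estimated cold pages, largest first.
    # Input lines: "host cold_pages" (hypothetical format).
    # Output: host names, one per line.
    rank_servers() {
        sort -k2,2nr | awk '{ print $1 }'
    }

    # Usage:
    #   printf 'a 10\nb 300\n' | rank_servers    # prints b, then a
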
Proactive reclaim
-----------------
Proactive reclaim induces page reclaim when there is no memory
pressure. It usually targets cold pages only. E.g., when a new job
comes in, the job scheduler wants to proactively reclaim cold pages on
the server it selected, to improve the chance of successfully landing
this new job.

Users can write the following command to ``lru_gen`` to evict
generations less than or equal to ``min_gen_nr``.

    ``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``

``min_gen_nr`` should be less than ``max_gen_nr-1``, since
``max_gen_nr`` and ``max_gen_nr-1`` are not fully aged (equivalent to
the active list) and therefore cannot be evicted. ``swappiness``
overrides the default value in ``/proc/sys/vm/swappiness``.
``nr_to_reclaim`` limits the number of pages to evict.

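Because the two youngest generations cannot be evicted, a script can validate ``min_gen_nr`` before issuing the ``-`` command. The hypothetical helper below takes ``max_gen_nr`` (as read from ``lru_gen``) only for that check, and appends ``swappiness`` and ``nr_to_reclaim`` only when given, preserving the nesting of the optional arguments.

::

    # Sketch: build "- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]".
    # Args: memcg_id node_id min_gen_nr max_gen_nr [swappiness [nr_to_reclaim]]
    # max_gen_nr is used only for validation and is not sent to the kernel.
    eviction_cmd() {
        memcg_id=$1 node_id=$2 min_gen=$3 max_gen=$4
        if [ "$min_gen" -ge $(( max_gen - 1 )) ]; then
            echo "min_gen_nr must be less than max_gen_nr-1" >&2
            return 1
        fi
        cmd="- $memcg_id $node_id $min_gen"
        if [ -n "${5-}" ]; then
            cmd="$cmd $5"
            [ -n "${6-}" ] && cmd="$cmd $6"
        fi
        printf '%s\n' "$cmd"
    }

    # Usage (as root; values are placeholders):
    #   cmd=$(eviction_cmd 1 0 4 7) && echo "$cmd" >/sys/kernel/debug/lru_gen
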
A typical use case is that a job scheduler runs this command before it
tries to land a new job on a server. If it fails to materialize enough
cold pages because of the overestimation, it retries on the next
server according to the ranking result obtained from the working set
estimation step. This less forceful approach limits the impact on the
existing jobs.

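The retry policy can be sketched as a loop over servers in ranked order. Here ``try_reclaim`` stands for a hypothetical, caller-supplied command that runs proactive reclaim on one server and succeeds only if enough cold pages were freed; the sketch encodes just the "try the next server on failure" logic.

::

    # Sketch: return the first server (in ranked order) on which the
    # caller-supplied reclaim command succeeds.
    # $1: name of that command; it receives a server name and must
    #     exit 0 when reclaim met the target.
    # Remaining args: servers, best-ranked first.
    pick_server() {
        reclaim_fn=$1
        shift
        for server in "$@"; do
            if "$reclaim_fn" "$server"; then
                printf '%s\n' "$server"
                return 0
            fi
        done
        return 1
    }

    # Usage (try_reclaim is hypothetical):
    #   pick_server try_reclaim host-a host-b host-c
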
