================
Memory Balancing
================

Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>

Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
well as for non-__GFP_IO allocations.

The first reason why a caller may avoid reclaim is that the caller cannot
sleep because it holds a spinlock or is in interrupt context. The second
may be that the caller is willing to fail the allocation without incurring
the overhead of page reclaim. This may happen for opportunistic high-order
allocation requests that have order-0 fallback options. In such cases,
the caller may also wish to avoid waking kswapd.

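For the second case, a hedged sketch of such an opportunistic caller is
shown below; the flag combination and the helper name are illustrative
assumptions, not code taken from this document::

  #include <linux/gfp.h>

  /*
   * Sketch only: try a high-order allocation without entering direct
   * reclaim and without waking kswapd; fall back to a plain order-0
   * GFP_KERNEL allocation if the opportunistic attempt fails.
   */
  static struct page *opportunistic_alloc(unsigned int order)
  {
          gfp_t gfp = (GFP_KERNEL | __GFP_NOWARN) &
                      ~(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM);
          struct page *page;

          page = alloc_pages(gfp, order);     /* may fail fast, no reclaim */
          if (page)
                  return page;

          return alloc_pages(GFP_KERNEL, 0);  /* order-0 fallback */
  }
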
Non-__GFP_IO allocation requests are made to prevent file system deadlocks.

In the absence of non-sleepable allocation requests, it seems detrimental
to be doing balancing. Page reclamation could then be kicked off lazily,
that is, only when needed (i.e., when a zone's free memory drops to 0),
instead of being a proactive process.

That being said, the kernel should try to fulfill requests for direct
mapped pages from the direct mapped pool, instead of falling back on
the dma pool, so as to keep the dma pool filled for dma requests (atomic
or not). A similar argument applies to highmem and direct mapped pages.
OTOH, if there are a lot of free dma pages, it is preferable to satisfy
regular memory requests by allocating one from the dma pool, instead
of incurring the overhead of regular zone balancing.

In 2.2, memory balancing/page reclamation would kick off only when the
_total_ number of free pages fell below 1/64th of total memory. With the
right ratio of dma and regular memory, it is quite possible that balancing
would not be done even when the dma zone was completely empty. 2.2 has
been running on production machines of varying memory sizes, and seems to
be doing fine even in the presence of this problem. In 2.3, due to
HIGHMEM, this problem is aggravated.

In 2.3, zone balancing can be done in one of two ways: depending on the
zone size (and possibly the size of lower class zones), we can decide
at init time how many free pages we should aim for while balancing any
zone. The good part is that, while balancing, we do not need to look at
the sizes of lower class zones; the bad part is that we might balance too
frequently because we ignore possibly lower usage in the lower class
zones. Also, with a slight change in the allocation routine, it is
possible to reduce the memclass() macro to be a simple equality.

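A rough, self-contained model of this first approach (the struct, field
names, and the 1/64 ratio are illustrative assumptions, not the actual 2.3
data structures) might look like::

  /* Sketch of the first approach: a per-zone target fixed at init time. */
  struct zone_model {
          unsigned long size;             /* total pages in this zone */
          unsigned long free;             /* currently free pages */
          unsigned long balance_target;   /* chosen once at init */
  };

  static void zone_model_init(struct zone_model *z, unsigned long size)
  {
          z->size = size;
          z->free = size;
          z->balance_target = size / 64;  /* example ratio only */
  }

  /* Balancing never looks at lower class zones, only at this zone. */
  static int zone_low_on_free(const struct zone_model *z)
  {
          return z->free < z->balance_target;
  }
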
Another possible solution is to balance only when the free memory
of a zone _and_ all its lower class zones falls below 1/64th of the
total memory in the zone and its lower class zones. This fixes the 2.2
balancing problem, and stays as close to 2.2 behavior as possible. Also,
the balancing algorithm works the same way on the various architectures,
which have different numbers and types of zones. If we wanted to get
fancy, we could assign different weights to free pages in different
zones in the future.

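The cumulative check can be modeled with the self-contained sketch below,
assuming zones are indexed from the lowest class (e.g. dma) upward; the
array layout and names are assumptions made for illustration::

  #define NR_ZONE_CLASSES 3                             /* e.g. dma, regular, highmem */

  static unsigned long class_size[NR_ZONE_CLASSES];     /* total pages per zone */
  static unsigned long class_free[NR_ZONE_CLASSES];     /* free pages per zone  */

  /*
   * Balance zone 'idx' only when the free memory of that zone plus all
   * of its lower class zones falls below 1/64th of their combined size.
   */
  static int class_needs_balancing(int idx)
  {
          unsigned long size = 0, free = 0;
          int i;

          for (i = 0; i <= idx; i++) {
                  size += class_size[i];
                  free += class_free[i];
          }
          return free < size / 64;
  }
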
Note that if the size of the regular zone is huge compared to the dma
zone, it becomes less significant to consider the free dma pages while
deciding whether to balance the regular zone. The first solution then
becomes more attractive.

The appended patch implements the second solution. It also "fixes" two
problems: first, kswapd is woken up as in 2.2 on low memory conditions
for non-sleepable allocations. Second, the HIGHMEM zone is also balanced,
so as to give a fighting chance for replace_with_highmem() to get a
HIGHMEM page, as well as to ensure that HIGHMEM allocations do not
fall back into the regular zone. This also makes sure that HIGHMEM pages
are not leaked (for example, in situations where a HIGHMEM page is in
the swapcache but is not being used by anyone).

kswapd also needs to know about the zones it should balance. kswapd is
primarily needed in a situation where balancing cannot be done,
probably because all allocation requests are coming from intr context
and all process contexts are sleeping. For 2.3, kswapd does not really
need to balance the highmem zone, since intr context does not request
highmem pages. kswapd looks at the zone_wake_kswapd field in the zone
structure to decide whether a zone needs balancing.

Page stealing from process memory and shm is done if stealing the page would
alleviate memory pressure on any zone in the page's node that has fallen below
its watermark.

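A minimal, self-contained model of that decision, assuming a per-node array
of zones with free-page counts and watermarks (the names are illustrative),
could be::

  struct zone_state {
          unsigned long free_pages;       /* free pages in this zone */
          unsigned long watermark;        /* zone's low-memory watermark */
  };

  /*
   * Steal the page only if some zone in the page's node has fallen below
   * its watermark, so that freeing the page actually relieves pressure.
   */
  static int worth_stealing(const struct zone_state *zones, int nr_zones)
  {
          int i;

          for (i = 0; i < nr_zones; i++)
                  if (zones[i].free_pages < zones[i].watermark)
                          return 1;
          return 0;
  }
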
watermark[WMARK_MIN/WMARK_LOW/WMARK_HIGH]/low_on_memory/zone_wake_kswapd:
These are per-zone fields, used to determine when a zone needs to be
balanced. When the number of free pages falls below watermark[WMARK_MIN],
the hysteretic field low_on_memory gets set. This stays set till the number
of free pages reaches watermark[WMARK_HIGH]. When low_on_memory is set, page
allocation requests will try to free some pages in the zone (provided
GFP_WAIT is set in the request). Orthogonal to this is the decision to
poke kswapd to free some zone pages. That decision is not hysteresis
based, and is made when the number of free pages falls below
watermark[WMARK_LOW], in which case zone_wake_kswapd is also set.

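These two decisions can be illustrated with the self-contained sketch below;
the field names follow the text above, while the struct layout and the
update helper are assumptions made for illustration::

  enum { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

  struct zone_marks {
          unsigned long free_pages;
          unsigned long watermark[NR_WMARK];
          int low_on_memory;      /* hysteretic: set at MIN, cleared at HIGH */
          int zone_wake_kswapd;   /* no hysteresis: tracks WMARK_LOW */
  };

  static void zone_update_state(struct zone_marks *z)
  {
          if (z->free_pages < z->watermark[WMARK_MIN])
                  z->low_on_memory = 1;   /* allocators start freeing pages */
          else if (z->free_pages >= z->watermark[WMARK_HIGH])
                  z->low_on_memory = 0;   /* stays set until HIGH is reached */

          /* Poke kswapd whenever free pages drop below the LOW mark. */
          z->zone_wake_kswapd = z->free_pages < z->watermark[WMARK_LOW];
  }
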
(Good) Ideas that I have heard:

1. Dynamic experience should influence balancing: number of failed requests
   for a zone can be tracked and fed into the balancing scheme (jalvo@mbay.net)
2. Implement a replace_with_highmem()-like replace_with_regular() to preserve
   dma pages. (lkd@tantalophile.demon.co.uk)
