=================
Concepts overview
=================

The memory management in Linux is a complex sy
years and included more and more functionality
systems from MMU-less microcontrollers to supe
management for systems without an MMU is calle
definitely deserves a dedicated document, whic
eventually written. Yet, although some of the
here we assume that an MMU is available and a
address to a physical address.

.. contents:: :local:

Virtual Memory Primer
=====================

The physical memory in a computer system is a
even for systems that support memory hotplug t
the amount of memory that can be installed. Th
necessarily contiguous; it might be accessible
address ranges. Besides, different CPU archite
different implementations of the same architec
of how these address ranges are defined.

All this makes dealing directly with physical
to avoid this complexity a concept of virtual

The virtual memory abstracts the details of ph
application software, allows to keep only need
physical memory (demand paging) and provides a
protection and controlled sharing of data betw

With virtual memory, each and every memory acc
address. When the CPU decodes an instruction t
writes) from (or to) the system memory, it tra
address encoded in that instruction to a `phys
memory controller can understand.

The physical system memory is divided into pag
size of each page is architecture specific. So
selection of the page size from several suppor
selection is performed at the kernel build tim
appropriate kernel configuration option.

Each physical memory page can be mapped as one
pages. These mappings are described by page ta
translation from a virtual address used by pro
memory address.
The page tables are organized

The tables at the lowest level of the hierarch
addresses of actual pages used by the software
levels contain physical addresses of the pages
levels. The pointer to the top level page tabl
register. When the CPU performs the address tr
register to access the top level page table. T
virtual address are used to index an entry in
table. That entry is then used to access the n
hierarchy with the next bits of the virtual ad
that level page table. The lowest bits in the
the offset inside the actual page.

Huge Pages
==========

The address translation requires several memor
accesses are slow relative to CPU speed. To
processor cycles on the address translation, C
such translations called Translation Lookaside
TLB). Usually the TLB is a pretty scarce resource an
large memory working set will experience perfo
TLB misses.

Many modern CPU architectures allow mapping of
directly by the higher levels in the page tabl
it is possible to map 2M and even 1G pages usi
and the third level page tables. In Linux such
`huge`. Usage of huge pages significantly redu
improves TLB hit-rate and thus improves overal

There are two mechanisms in Linux that enable
memory with the huge pages. The first one is `
hugetlbfs. It is a pseudo filesystem that uses
store. For the files created in this filesyste
the memory and mapped using huge pages. The hu
Documentation/admin-guide/mm/hugetlbpage.rst.

Another, more recent, mechanism that enables u
called `Transparent HugePages`, or THP. Unlike
requires users and/or system administrators to
the system memory should and can be mapped by
manages such mappings transparently to the use
name. See Documentation/admin-guide/mm/transhu
about THP.

Zones
=====

Often hardware poses restrictions on how diffe
ranges can be accessed. In some cases, devices
all the addressable memory.
In other cases, th
memory exceeds the maximal addressable size of
special actions are required to access portion
groups memory pages into `zones` according to
usage. For example, ZONE_DMA will contain memo
devices for DMA, ZONE_HIGHMEM will contain mem
permanently mapped into kernel's address space
contain normally addressed pages.

The actual layout of the memory zones is hardw
architectures define all zones, and requiremen
for different platforms.

Nodes
=====

Many multi-processor machines are NUMA - Non-U
systems. In such systems the memory is arrange
different access latency depending on the "dis
processor. Each bank is referred to as a `node
constructs an independent memory management su
own set of zones, lists of free and used pages
counters. You can find more details about NUMA
Documentation/mm/numa.rst and in
Documentation/admin-guide/mm/numa_memory_polic

Page cache
==========

The physical memory is volatile and the common
into the memory is to read it from files. When
data is put into the `page cache` to avoid exp
the subsequent reads. Similarly, when one writ
is placed in the page cache and eventually get
storage device. The written pages are marked a
decides to reuse them for other purposes, it m
the file contents on the device with the updat

Anonymous Memory
================

The `anonymous memory` or `anonymous mappings`
is not backed by a filesystem. Such mappings a
for program's stack and heap or by explicit ca
call. Usually, the anonymous mappings only def
that the program is allowed to access. The rea
in creation of a page table entry that referen
page filled with zeroes. When the program perf
physical page will be allocated to hold the wr
will be marked dirty and if the kernel decides
the dirty page will be swapped out.

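
The behaviour described above for anonymous mappings can be observed
from user space. The following is a minimal Python sketch, not kernel
code. It assumes that ``mmap.mmap(-1, length)`` from the Python standard
library is an acceptable way to request an anonymous mapping, and the
function name ``anonymous_mapping_demo`` is made up for illustration. It
only shows the visible effects (zero-filled reads, data preserved after
a write); the page table updates and the page allocation happen inside
the kernel and cannot be seen from the script.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the running system


def anonymous_mapping_demo(num_pages=4):
    """Create an anonymous mapping, read it, then write to one page.

    Returns a tuple (all_zero_before_write, bytes_read_back).
    The function name and return shape are illustrative assumptions.
    """
    length = num_pages * PAGE_SIZE
    # fileno=-1 requests memory that is not backed by any file.
    with mmap.mmap(-1, length) as region:
        # Reads from untouched anonymous memory observe only zeroes.
        all_zero = region[:length] == bytes(length)
        # Writing into the second page stores real data in the mapping.
        region[PAGE_SIZE:PAGE_SIZE + 5] = b"hello"
        read_back = bytes(region[PAGE_SIZE:PAGE_SIZE + 5])
    return all_zero, read_back


if __name__ == "__main__":
    print(anonymous_mapping_demo())
```

The mapping is released when the ``with`` block exits, so the sketch
leaves nothing behind.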

Reclaim
=======

Throughout the system lifetime, a physical pag
different types of data. It can be kernel inte
DMA'able buffers for device drivers use, data
memory allocated by user space processes etc.

Depending on the page usage it is treated diff
memory management. The pages that can be freed
because they cache the data available elsewher
hard disk, or because they can be swapped out,
disk, are called `reclaimable`. The most notab
reclaimable pages are page cache and anonymous

In most cases, the pages holding internal kern
buffers cannot be repurposed, and they remain
their user. Such pages are called `unreclaimab
circumstances, even pages occupied with kernel
reclaimed. For instance, in-memory caches of f
be re-read from the storage device and therefo
discard them from the main memory when system
pressure.

The process of freeing the reclaimable physica
repurposing them is called (surprise!) `reclai
pages either asynchronously or synchronously,
of the system. When the system is not loaded,
and allocation requests will be satisfied imme
pages supply. As the load increases, the amoun
down and when it reaches a certain threshold (
allocation request will awaken the ``kswapd``
asynchronously scan memory pages and either ju
they contain is available elsewhere, or evict
device (remember those dirty pages?). As memor
more and reaches another threshold - min water
will trigger `direct reclaim`. In this case al
until enough memory pages are reclaimed to sat

Compaction
==========

As the system runs, tasks allocate and free th
fragmented.
Although with virtual memory it is
scattered physical pages as virtually contiguo
necessary to allocate large physically contigu
need may arise, for instance, when a device dr
buffer for DMA, or when THP allocates a huge p
addresses the fragmentation issue. This mechan
from the lower part of a memory zone to free p
of the zone. When a compaction scan is finishe
together at the beginning of the zone and allo
physically contiguous areas become possible.

Like reclaim, the compaction may happen asynch
daemon or synchronously as a result of a memor

OOM killer
==========

It is possible that on a loaded machine memory
kernel will be unable to reclaim enough memory
order to save the rest of the system, it invok

The `OOM killer` selects a task to sacrifice f
system health. The selected task is killed in
enough memory will be freed to continue normal
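
To see how the kernel currently rates a process as a candidate for the
OOM killer, one can read the per-process files under ``/proc``. The
sketch below is an illustration only. The ``oom_score`` and
``oom_score_adj`` procfs files are not mentioned in the text above, and
the helper name ``read_oom_value`` is hypothetical. The helper returns
``None`` when the file cannot be read, for example on a non-Linux
system.

```python
from pathlib import Path


def read_oom_value(pid="self", name="oom_score"):
    """Return an integer from /proc/<pid>/<name>, or None if unavailable.

    The helper name and its defaults are assumptions made for this sketch.
    """
    path = Path("/proc") / str(pid) / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, no permission, or unexpected (non-integer) content.
        return None


if __name__ == "__main__":
    print("oom_score:", read_oom_value())
    print("oom_score_adj:", read_oom_value(name="oom_score_adj"))
```

A higher ``oom_score`` means the task is a more likely candidate for
being killed when memory is exhausted.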