========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher order page allocation which is very likely to
fail under memory pressure. On the other hand, if we just use single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is" i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocated memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``.
Here is a sample of stat output::

    # cat /sys/kernel/debug/zsmalloc/zram0/classes

    class   size   10%   20%   30%   40%   50%   60%   70%   80%   90%   99%  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
       ...
       ...
       30    512     0    12     4     1     0     1     0     0     1     0   414           3464      3346         433                 1        14
       31    528     2     7     2     2     1     0     1     0     0     2   117           4154      3793         536                 4        44
       32    544     6     3     4     1     2     1     0     0     0     1   260           4170      3965         556                 2        26
       ...
       ...


class
        index
size
        object size zspage stores
10%
        the number of zspages with usage ratio less than 10% (see below)
20%
        the number of zspages with usage ratio between 10% and 20%
30%
        the number of zspages with usage ratio between 20% and 30%
40%
        the number of zspages with usage ratio between 30% and 40%
50%
        the number of zspages with usage ratio between 40% and 50%
60%
        the number of zspages with usage ratio between 50% and 60%
70%
        the number of zspages with usage ratio between 60% and 70%
80%
        the number of zspages with usage ratio between 70% and 80%
90%
        the number of zspages with usage ratio between 80% and 90%
99%
        the number of zspages with usage ratio between 90% and 99%
100%
        the number of zspages with usage ratio 100%
obj_allocated
        the number of objects allocated
obj_used
        the number of objects allocated to the user
pages_used
        the number of pages allocated for the class
pages_per_zspage
        the number of 0-order pages to make a zspage
freeable
        the approximate number of pages class compaction can free

Each zspage maintains an inuse counter which keeps track of the number of
objects stored in the zspage. The inuse counter determines the zspage's
"fullness group" which is calculated as the ratio of the "inuse" objects to
the total number of objects the zspage can hold (objs_per_zspage). The
closer the inuse counter is to objs_per_zspage, the better.

Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

    class   size   10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
    ...
       94   1536     0  ....     0              0         0           0                 3         0
      100   1632     0  ....     0              0         0           0                 2         0
    ...


Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.
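
The chain-size comparison above can be reproduced with a few lines of
arithmetic. The following is a minimal sketch, not the kernel code: it assumes
a PAGE_SIZE of 4096 bytes and assumes that the preferred chain is simply the
one that leaves the fewest wasted bytes. The helper names
(``chain_waste_table``, ``best_chain_size``, ``pages_needed``) are hypothetical
and exist only for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, matching the numbers in the tables


def chain_waste_table(class_size, max_chain):
    """Return (pages, objects, wasted_bytes, used_percent) per chain length."""
    rows = []
    for pages in range(1, max_chain + 1):
        zspage_bytes = pages * PAGE_SIZE
        objects = zspage_bytes // class_size      # whole objects that fit
        wasted = zspage_bytes % class_size        # bytes left over at the tail
        used_pct = (zspage_bytes - wasted) * 100 // zspage_bytes
        rows.append((pages, objects, wasted, used_pct))
    return rows


def best_chain_size(class_size, max_chain):
    """Pick the chain length with the fewest wasted bytes (first one on ties)."""
    return min(chain_waste_table(class_size, max_chain), key=lambda r: r[2])[0]


def pages_needed(num_objects, class_size, pages_per_zspage):
    """Physical pages consumed when storing num_objects in whole zspages."""
    objs_per_zspage = (pages_per_zspage * PAGE_SIZE) // class_size
    zspages = -(-num_objects // objs_per_zspage)  # ceiling division
    return zspages * pages_per_zspage


if __name__ == "__main__":
    for row in chain_waste_table(1568, 5):
        print("%2d pages: %2d objects, %4d wasted bytes, %2d%% used" % row)
    print("class 1632, 2-page zspages:", pages_needed(13, 1632, 2), "pages")
    print("class 1568, 5-page zspages:", pages_needed(13, 1568, 5), "pages")
```

Under these assumptions, ``chain_waste_table(1568, 5)`` yields the same wasted
bytes and used% values as the table above, and ``pages_needed`` gives 6 pages
for 13 objects in the 1632-byte class versus 5 pages in a 1568-byte class with
5-page zspages. With ``max_chain`` limited to 4, the sketch picks a 2-page
chain holding 5 objects for the 1568-byte class, which are the same
characteristics as class #100 and is consistent with the merge described above.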

As the zspage chain size for class #96 increases, its key characteristics
such as pages per-zspage and objects per-zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`::

    class   size   10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

    ...
      202   3264     0  ....     0              0         0           0                 4         0
      254   4096     0  ....     0              0         0           0                 1         0
    ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the zspage chain size also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For a zspage chain size of 8, the huge class watermark becomes 3632 bytes::

    class   size   10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

    ...
      202   3264     0  ....     0              0         0           0                 4         0
      211   3408     0  ....     0              0         0           0                 5         0
      217   3504     0  ....     0              0         0           0                 6         0
      222   3584     0  ....     0              0         0           0                 7         0
      225   3632     0  ....     0              0         0           0                 8         0
      254   4096     0  ....     0              0         0           0                 1         0
    ...

For a zspage chain size of 16, the huge class watermark becomes 3840 bytes::

    class   size   10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

    ...
      202   3264     0  ....     0              0         0           0                 4         0
      206   3328     0  ....     0              0         0           0                13         0
      207   3344     0  ....     0              0         0           0                 9         0
      208   3360     0  ....     0              0         0           0                14         0
      211   3408     0  ....     0              0         0           0                 5         0
      212   3424     0  ....     0              0         0           0                16         0
      214   3456     0  ....     0              0         0           0                11         0
      217   3504     0  ....     0              0         0           0                 6         0
      219   3536     0  ....     0              0         0           0                13         0
      222   3584     0  ....     0              0         0           0                 7         0
      223   3600     0  ....     0              0         0           0                15         0
      225   3632     0  ....     0              0         0           0                 8         0
      228   3680     0  ....     0              0         0           0                 9         0
      230   3712     0  ....     0              0         0           0                10         0
      232   3744     0  ....     0              0         0           0                11         0
      234   3776     0  ....     0              0         0           0                12         0
      235   3792     0  ....     0              0         0           0                13         0
      236   3808     0  ....     0              0         0           0                14         0
      238   3840     0  ....     0              0         0           0                15         0
      254   4096     0  ....     0              0         0           0                 1         0
    ...

Overall, the combined effect of the zspage chain size on the zsmalloc pool
configuration::

    pages per zspage   number of size classes (clusters)   huge size class watermark
           4                          69                              3264
           5                          86                              3408
           6                          93                              3504
           7                         112                              3584
           8                         123                              3632
           9                         140                              3680
          10                         143                              3712
          11                         159                              3744
          12                         164                              3776
          13                         180                              3792
          14                         183                              3808
          15                         188                              3840
          16                         191                              3840


A synthetic test
----------------

zram used as storage for build artifacts (Linux kernel compilation).

* `CONFIG_ZSMALLOC_CHAIN_SIZE=4`

  zsmalloc classes stats::

    class   size   10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

    ...
    Total           13  ....    51         413836    412973      159955                            3

  zram mm_stat::

    1691783168 628083717 655175680        0 655175680       60        0    34048    34049


* `CONFIG_ZSMALLOC_CHAIN_SIZE=8`

  zsmalloc classes stats::

    class   size   10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

    ...
    Total           18  ....    87         414852    412978      156666                            0

  zram mm_stat::

    1691803648 627793930 641703936        0 641703936       60        0    33591    33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666. At the same time, the maximum zsmalloc pool memory usage went down
from 655175680 to 641703936 bytes.

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages.
In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).

Functions
=========

.. kernel-doc:: mm/zsmalloc.c