==================
HugeTLB Controller
==================

HugeTLB controller can be created by first mounting the cgroup filesystem::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup

With the above step, the initial or the parent HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.

New groups can be created under the parent group /sys/fs/cgroup::

  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks

The above steps create a new group g1 and move the current shell
process (bash) into it.

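The group creation steps above can be wrapped in a small helper. This is a
minimal sketch rather than part of the controller interface: the function name
``hugetlb_cg_enter`` and its arguments are illustrative assumptions, and it
presumes the hierarchy is already mounted as shown (root privileges are needed
on a real system).

```shell
# Create a child HugeTLB cgroup and move a task into it.
# Hypothetical helper: the name and arguments are illustrative only.
#   $1  mount point of the hugetlb hierarchy (e.g. /sys/fs/cgroup)
#   $2  name of the child group to create
#   $3  PID to move (defaults to the current shell)
hugetlb_cg_enter() {
    cgroot=$1
    group=$2
    pid=${3:-$$}

    # mkdir -p keeps the call idempotent if the group already exists.
    mkdir -p "$cgroot/$group" || return 1

    # Writing a PID to the tasks file moves that task into the group.
    echo "$pid" > "$cgroot/$group/tasks" || return 1

    # Confirm that the tasks file now lists the PID.
    grep -qx "$pid" "$cgroot/$group/tasks"
}
```

For example, ``hugetlb_cg_enter /sys/fs/cgroup g1`` repeats the manual steps
for the current shell and returns non-zero if any of them fails.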

Brief summary of control files::

 hugetlb.<hugepagesize>.rsvd.limit_in_bytes            # set/show limit of "hugepagesize" hugetlb reservations
 hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes        # show max "hugepagesize" hugetlb reservations and no-reserve faults
 hugetlb.<hugepagesize>.rsvd.usage_in_bytes            # show current reservations and no-reserve faults for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.rsvd.failcnt                   # show the number of allocation failures due to the HugeTLB reservation limit
 hugetlb.<hugepagesize>.limit_in_bytes                 # set/show limit of "hugepagesize" hugetlb faults
 hugetlb.<hugepagesize>.max_usage_in_bytes             # show max "hugepagesize" hugetlb usage recorded
 hugetlb.<hugepagesize>.usage_in_bytes                 # show current usage for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.failcnt                        # show the number of allocation failures due to the HugeTLB usage limit
 hugetlb.<hugepagesize>.numa_stat

For a system supporting three hugepage sizes (64k, 32M and 1G), the control
files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.numa_stat
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.numa_stat
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.numa_stat
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt


1. Page fault accounting

::

  hugetlb.<hugepagesize>.limit_in_bytes
  hugetlb.<hugepagesize>.max_usage_in_bytes
  hugetlb.<hugepagesize>.usage_in_bytes
  hugetlb.<hugepagesize>.failcnt

The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per
control group and enforces the limit during page fault.
Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that the application will get a SIGBUS signal if it tries to fault in HugeTLB
pages beyond its limit. Therefore the application needs to know exactly how many
HugeTLB pages it uses beforehand, and the sysadmin needs to make sure that
there are enough available on the machine for all the users to avoid processes
getting SIGBUS.


2. Reservation accounting

::

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows users to limit the HugeTLB reservations per control
group and enforces the controller limit at reservation time and at the fault of
HugeTLB memory for which no reservation exists. Since reservation limits are
enforced at reservation time (on mmap or shmget), reservation limits never cause
the application to get a SIGBUS signal if the memory was reserved beforehand.
For MAP_NORESERVE allocations, the reservation limit behaves the same as the fault
limit, enforcing memory usage at fault time and causing the application to
receive a SIGBUS if it crosses its limit.

Reservation limits are superior to the page fault limits described above, since
reservation limits are enforced at reservation time (on mmap or shmget), and
never cause the application to get a SIGBUS signal if the memory was reserved
beforehand. This allows for easier fallback to alternatives such as
non-HugeTLB memory. In the case of page fault accounting, it's very
hard to avoid processes getting SIGBUS since the sysadmin needs to know precisely
the HugeTLB usage of all the tasks in the system and make sure there are enough
pages to satisfy all requests. Avoiding tasks getting SIGBUS on overcommitted
systems is practically impossible with page fault accounting.


3. Caveats with shared memory


For shared HugeTLB memory, both HugeTLB reservations and page faults are charged
to the first task that causes the memory to be reserved or faulted, and all
subsequent uses of this reserved or faulted memory are done without charging.

Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
This is usually when the HugeTLB file is deleted, and not when the task that
caused the reservation or fault has exited.


4. Caveats with HugeTLB cgroup offline

When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- The fault charges are charged to the parent HugeTLB cgroup (reparented),
- the reservation charges remain on the offline HugeTLB cgroup.

This means that if a HugeTLB cgroup gets offlined while there are still HugeTLB
reservations charged to it, that cgroup persists as a zombie until all HugeTLB
reservations are uncharged. HugeTLB reservations behave in this manner to match
the memory controller whose cgroups also persist as zombies until all charged
memory is uncharged.
Also, the tracking of HugeTLB reservations is a bit more
complex compared to the tracking of HugeTLB faults, so it is significantly
harder to reparent reservations at offline time.
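
Because reservation charges stay on an offlined cgroup, it can be useful to
look for outstanding reservations before removing a group. The following is a
minimal sketch under stated assumptions: the function name
``hugetlb_rsvd_pending`` and its argument are hypothetical, and it only reads
the ``rsvd.usage_in_bytes`` files listed earlier. It does not prevent the
cgroup from becoming a zombie, it only reports the condition.

```shell
# Report reservation charges that would remain on a HugeTLB cgroup after
# it goes offline.
# Hypothetical helper: the name and argument are illustrative only.
#   $1  path of the cgroup directory (e.g. /sys/fs/cgroup/g1)
# Prints one "<file> <bytes>" line per non-zero counter and returns 1 if
# any reservation is still charged, 0 otherwise.
hugetlb_rsvd_pending() {
    cgdir=$1
    pending=0

    for f in "$cgdir"/hugetlb.*.rsvd.usage_in_bytes; do
        # Skip the unexpanded glob when no matching file exists.
        [ -e "$f" ] || continue
        bytes=$(cat "$f")
        if [ "$bytes" -ne 0 ]; then
            echo "$(basename "$f") $bytes"
            pending=1
        fi
    done

    return "$pending"
}
```

A possible use is ``hugetlb_rsvd_pending /sys/fs/cgroup/g1 && rmdir
/sys/fs/cgroup/g1``, which removes the group only when no reservation is
charged to it at the time of the check.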