======================================
NO_HZ: Reducing Scheduling-Clock Ticks
======================================


This document describes Kconfig options and bo
reduce the number of scheduling-clock interrup
efficiency and reducing OS jitter. Reducing O
some types of computationally intensive high-p
applications and for real-time applications.

There are three main ways of managing scheduli
(also known as "scheduling-clock ticks" or sim

    1.  Never omit scheduling-clock ticks (CON
        CONFIG_NO_HZ=n for older kernels). Yo
        want to choose this option.

    2.  Omit scheduling-clock ticks on idle CP
        CONFIG_NO_HZ=y for older kernels). Th
        approach, and should be the default.

    3.  Omit scheduling-clock ticks on CPUs th
        have only one runnable task (CONFIG_NO
        are running realtime applications or c
        workloads, you will normally -not- wan

These three cases are described in the followi
by a fourth section on RCU-specific considerati
discussing testing, and a sixth and final sect


Never Omit Scheduling-Clock Ticks
=================================

Very old versions of Linux from the 1990s and
are incapable of omitting scheduling-clock tic
there are some situations where this old-schoo
right approach, for example, in heavy workload
that use short bursts of CPU, where there are
periods, but where these idle periods are also
hundreds of microseconds). For these types of
clock interrupts will normally be delivered an
will frequently be multiple runnable tasks per
attempting to turn off the scheduling clock in
other than increasing the overhead of switchin
transitioning between user and kernel executio

This mode of operation can be selected using C
CONFIG_NO_HZ=n for older kernels).

However, if you are instead running a light wo
periods, failing to omit scheduling-clock inte
excessive power consumption. This is especial
devices, where it results in extremely short b
are running light workloads, you should theref
section.

In addition, if you are running either a real-
workload with short iterations, the scheduling
degrade your application's performance. If thi
you should read the following two sections.


Omit Scheduling-Clock Ticks For Idle CPUs
=========================================

If a CPU is idle, there is little point in sen
interrupt. After all, the primary purpose of
is to force a busy CPU to shift its attention
and an idle CPU has no duties to shift its att

An idle CPU that is not receiving scheduling-c
be "dyntick-idle", "in dyntick-idle mode", "in
tickless". The remainder of this document wil

The CONFIG_NO_HZ_IDLE=y Kconfig option causes
scheduling-clock interrupts to idle CPUs, whic
both to battery-powered devices and to highly
A battery-powered device running a CONFIG_HZ_P
drain its battery very quickly, easily 2-3 tim
same device running a CONFIG_NO_HZ_IDLE=y kern
1,500 OS instances might find that half of its
unnecessary scheduling-clock interrupts. In t
is strong motivation to avoid sending scheduli
idle CPUs. That said, dyntick-idle mode is no

    1.  It increases the number of instruction
        to and from the idle loop.

    2.  On many architectures, dyntick-idle mo
        number of expensive clock-reprogrammin

Therefore, systems with aggressive real-time r
run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO
in order to avoid degrading from-idle transiti

There is also a boot parameter "nohz=" that ca
dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kerne
By default, CONFIG_NO_HZ_IDLE=y kernels boot w
dyntick-idle mode.
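One rough way to see whether idle CPUs have stopped taking ticks is to
compare per-CPU timer-interrupt counts taken some interval apart. The
following is a minimal sketch, not a supported interface. It assumes an
x86-style /proc/interrupts layout in which the "LOC:" row counts local
timer interrupts (which include the scheduling-clock tick). The helper
names and the two sample snapshots are hypothetical and exist only for
illustration.

```python
def timer_counts(interrupts_text):
    """Return {cpu_number: count} taken from the "LOC:" row.

    Assumes /proc/interrupts-style text: a header row naming the CPUs
    (CPU0, CPU1, ...) followed by one row per interrupt source.
    """
    lines = interrupts_text.splitlines()
    cpus = [int(name[3:]) for name in lines[0].split()
            if name.startswith("CPU")]
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "LOC:":
            counts = [int(value) for value in fields[1:1 + len(cpus)]]
            return dict(zip(cpus, counts))
    return {}


def tick_deltas(before_text, after_text):
    """Per-CPU growth of the local timer count between two snapshots."""
    before = timer_counts(before_text)
    after = timer_counts(after_text)
    return {cpu: after[cpu] - before[cpu] for cpu in before if cpu in after}


# Hypothetical snapshots: CPU0 is busy, CPU1 is idle in dyntick-idle mode.
SAMPLE_BEFORE = """\
           CPU0       CPU1
  0:         10          0   IO-APIC   2-edge      timer
LOC:       1000       2000   Local timer interrupts
"""
SAMPLE_AFTER = """\
           CPU0       CPU1
  0:         10          0   IO-APIC   2-edge      timer
LOC:       1250       2003   Local timer interrupts
"""

if __name__ == "__main__":
    print(tick_deltas(SAMPLE_BEFORE, SAMPLE_AFTER))  # {0: 250, 1: 3}
```

On a live system the two snapshots would instead be read from
/proc/interrupts a known number of seconds apart. A busy CPU should then
show roughly HZ interrupts per second, and a dyntick-idle CPU far fewer.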


Omit Scheduling-Clock Ticks For CPUs With Only One Runnable Task
================================================================

If a CPU has only one runnable task, there is
a scheduling-clock interrupt because there is
Note that omitting scheduling-clock ticks for
task implies also omitting them for idle CPUs.

The CONFIG_NO_HZ_FULL=y Kconfig option causes
sending scheduling-clock interrupts to CPUs wi
and such CPUs are said to be "adaptive-ticks C
for applications with aggressive real-time res
it allows them to improve their worst-case res
duration of a scheduling-clock interrupt. It
computationally intensive short-iteration work
delayed during a given iteration, all the othe
wait idle while the delayed CPU finishes. Thu
by one less than the number of CPUs. In these
again strong motivation to avoid sending sched

By default, no CPU will be an adaptive-ticks C
boot parameter specifies the adaptive-ticks CP
"nohz_full=1,6-8" says that CPUs 1, 6, 7, and
CPUs. Note that you are prohibited from marki
adaptive-tick CPUs: At least one non-adaptive
online to handle timekeeping tasks in order to
calls like gettimeofday() return accurate val
(This is not an issue for CONFIG_NO_HZ_IDLE=y
user processes to observe slight drifts in clo
means that your system must have at least two
CONFIG_NO_HZ_FULL=y to do anything for you.

Finally, adaptive-ticks CPUs must have their R
This is covered in the "RCU Implications" sect

Normally, a CPU remains in adaptive-ticks mode
In particular, transitioning to kernel mode do
the mode. Instead, the CPU will exit adaptive
for example, if that CPU enqueues an RCU callb

Just as with dyntick-idle mode, the benefits o
not come for free:

    1.  CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ
        adaptive ticks without also running dy
        extends down into the implementation,
        of CONFIG_NO_HZ_IDLE are also incurred

    2.  The user/kernel transitions are slight
        to the need to inform kernel subsystem
        the change in mode.

    3.  POSIX CPU timers prevent CPUs from ent
        Real-time applications needing to take
        consumption need to use other means of

    4.  If there are more perf events pending
        accommodate, they are normally round-r
        all of them over time. Adaptive-tick
        round-robining from happening. This w
        preventing CPUs with large numbers of
        entering adaptive-tick mode.

    5.  Scheduler statistics for adaptive-tick
        slightly differently than those for no
        This might in turn perturb load-balanc

Although improvements are expected over time,
useful for many types of real-time and compute
However, the drawbacks listed above mean that
(yet) be enabled by default.


RCU Implications
================

There are situations in which idle CPUs cannot
enter either dyntick-idle mode or adaptive-tic
common being when that CPU has RCU callbacks p

Avoid this by offloading RCU callback processi
using the CONFIG_RCU_NOCB_CPU=y Kconfig option
offload may be selected using the "rcu_nocbs="
which takes a comma-separated list of CPUs and
"1,3-5" selects CPUs 1, 3, 4, and 5. Note tha
the "nohz_full" kernel boot parameter are also

The offloaded CPUs will never queue RCU callba
never prevents offloaded CPUs from entering ei
or adaptive-tick mode. That said, note that i
pin the "rcuo" kthreads to specific CPUs if de
scheduler will decide where to run them, which
where you want them to run.
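The CPU-list notation accepted by "rcu_nocbs=" and "nohz_full=" can be
illustrated with a short sketch. It handles only the two forms shown in
this document, single CPU numbers and inclusive ranges, so any richer
syntax the kernel may accept is out of scope. The function names are
hypothetical, and housekeeping_cpus() merely mirrors the rule stated
earlier that at least one non-adaptive-tick CPU must remain to handle
timekeeping. It is not the kernel's own check.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "1,6-8" into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty fields such as a trailing comma
        if "-" in part:
            first, last = part.split("-", 1)
            low, high = int(first), int(last)
            if low > high:
                raise ValueError("descending range: " + part)
            cpus.update(range(low, high + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


def housekeeping_cpus(online, adaptive):
    """Return the online CPUs left to handle timekeeping.

    Hypothetical sanity check: raises ValueError if every online CPU
    has been marked adaptive-tick.
    """
    remaining = sorted(set(online) - set(adaptive))
    if not remaining:
        raise ValueError("no non-adaptive-tick CPU left for timekeeping")
    return remaining


if __name__ == "__main__":
    print(parse_cpu_list("1,6-8"))   # [1, 6, 7, 8]
    print(parse_cpu_list("1,3-5"))   # [1, 3, 4, 5]
    print(housekeeping_cpus(range(9), parse_cpu_list("1,6-8")))
    # [0, 2, 3, 4, 5]
```

Checking a proposed "nohz_full=" value this way before rebooting catches
the case in which no CPU would be left to keep time.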


Testing
=======

So you enable all the OS-jitter features descr
but do not see any change in your workload's b
your workload isn't affected that much by OS j
something else is in the way? This section he
by providing a simple OS-jitter test suite, wh
master of the following git archive:

    git://git.kernel.org/pub/scm/linux/kernel/git/

Clone this archive and follow the instructions
This test procedure will produce a trace that
whether or not you have succeeded in removing
If this trace shows that you have removed OS j
possible, then you can conclude that your work
sensitive to OS jitter.

Note: this test requires that your system have
We do not currently have a good way to remove
systems.


Known Issues
============

*   Dyntick-idle slows transitions to and
    In practice, this has not been a probl
    aggressive real-time workloads, which
    dyntick-idle mode, an option that most
    some workloads will no doubt want to u
    eliminate scheduling-clock interrupt l
    options for these workloads:

    a.  Use PMQOS from userspace to in
        latency requirements (preferre

    b.  On x86 systems, use the "idle=

    c.  On x86 systems, use the "intel
        the maximum C-state depth.

    d.  On x86 systems, use the "idle=
        However, please note that use
        your CPU to overheat, which ma
        to degrade your latencies -- a
        be even worse than that of dyn
        this parameter effectively dis
        CPUs, which can significantly

*   Adaptive-ticks slows user/kernel trans
    This is not expected to be a problem f
    workloads, which have few such transit
    will be required to determine whether
    are significantly affected by this eff

*   Adaptive-ticks does not do anything un
    runnable task for a given CPU, even th
    of other situations where the scheduli
    needed. To give but one example, cons
    runnable high-priority SCHED_FIFO task
    of low-priority SCHED_OTHER tasks. In
    required to run the SCHED_FIFO task un
    some other higher-priority task awaken
    this CPU, so there is no point in send
    interrupt to this CPU. However, the c
    nevertheless sends scheduling-clock in
    single runnable SCHED_FIFO task and mu
    tasks, even though these interrupts ar

    And even when there are multiple runna
    there is little point in interrupting
    running task's timeslice expires, whic
    longer than the time of the next sched

    Better handling of these sorts of situ

*   A reboot is required to reconfigure bo
    callback offloading. Runtime reconfig
    if needed, however, due to the complex
    runtime, there would need to be an ear
    Especially given that you have the str
    simply offloading RCU callbacks from a
    where you want them whenever you want

*   Additional configuration is required t
    of OS jitter, including interrupts and
    and processes. This configuration nor
    interrupts and tasks to particular CPU

*   Some sources of OS jitter can currentl
    constraining the workload. For exampl
    OS jitter due to global TLB shootdowns
    operations (such as kernel module unlo
    result in these shootdowns. For anoth
    and TLB misses can be reduced (and in
    using huge pages and by constraining t
    by the application. Pre-faulting the
    helpful, especially when combined with
    system calls.

*   Unless all CPUs are idle, at least one
    scheduling-clock interrupt going in or
    timekeeping.

*   If there might potentially be some ada
    will be at least one CPU keeping the s
    going, even if all CPUs are otherwise

    Better handling of this situation is o

*   Some process-handling operations still
    scheduling-clock tick. These operatio
    load, maintaining sched average, compu
    computing avenrun, and carrying out lo
    currently accommodated by scheduling-c
    or so. Ongoing work will eliminate t
    infrequent scheduling-clock ticks.