Differences between /Documentation/timers/no_hz.rst (Version linux-6.11.5) and /Documentation/timers/no_hz.rst (Version linux-5.2.21)


======================================
NO_HZ: Reducing Scheduling-Clock Ticks
======================================


This document describes Kconfig options and boot parameters that can
reduce the number of scheduling-clock interrupts, thereby improving energy
efficiency and reducing OS jitter.  Reducing OS jitter is important for
some types of computationally intensive high-performance computing (HPC)
applications and for real-time applications.

There are three main ways of managing scheduling-clock interrupts
(also known as "scheduling-clock ticks" or simply "ticks"):

1.      Never omit scheduling-clock ticks (CONFIG_HZ_PERIODIC=y or
        CONFIG_NO_HZ=n for older kernels).  You normally will -not-
        want to choose this option.

2.      Omit scheduling-clock ticks on idle CPUs (CONFIG_NO_HZ_IDLE=y or
        CONFIG_NO_HZ=y for older kernels).  This is the most common
        approach, and should be the default.

3.      Omit scheduling-clock ticks on CPUs that are either idle or that
        have only one runnable task (CONFIG_NO_HZ_FULL=y).  Unless you
        are running real-time applications or certain types of HPC
        workloads, you will normally -not- want this option.
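
As a rough illustration, the three policies above can be modeled as a
single per-CPU decision.  The following Python sketch is hypothetical:
TickMode and tick_may_stop() are invented names, not kernel interfaces.
It assumes the default "nohz=on" and deliberately ignores the other
conditions discussed later in this document (pending RCU callbacks, POSIX
CPU timers, timekeeping duty) that can keep the tick running.

```python
from enum import Enum


class TickMode(Enum):
    """The three Kconfig choices for managing scheduling-clock ticks."""
    PERIODIC = "CONFIG_HZ_PERIODIC=y"
    NO_HZ_IDLE = "CONFIG_NO_HZ_IDLE=y"
    NO_HZ_FULL = "CONFIG_NO_HZ_FULL=y"


def tick_may_stop(mode: TickMode, nr_running: int, adaptive_cpu: bool = False) -> bool:
    """Return True if the configured policy allows this CPU's tick to be omitted.

    nr_running:   runnable tasks on the CPU (0 means the CPU is idle).
    adaptive_cpu: True if the CPU was listed in "nohz_full=" (default: not listed).
    Hypothetical model: ignores RCU callbacks, POSIX CPU timers, timekeeping, etc.
    """
    if mode is TickMode.PERIODIC:
        return False  # never omit ticks
    if nr_running == 0:
        return True  # idle CPUs go dyntick-idle under NO_HZ_IDLE and NO_HZ_FULL
    if mode is TickMode.NO_HZ_FULL and adaptive_cpu:
        return nr_running == 1  # adaptive-ticks CPU with a single runnable task
    return False  # busy CPU with the tick still required
```

For example, tick_may_stop(TickMode.NO_HZ_FULL, 1, adaptive_cpu=True)
returns True, while the same call with TickMode.NO_HZ_IDLE returns False
because that CPU is busy.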

These three cases are described in the following three sections, followed
by a fourth section on RCU-specific considerations, a fifth section
discussing testing, and a sixth and final section listing known issues.


Never Omit Scheduling-Clock Ticks
=================================

Very old versions of Linux from the 1990s and the very early 2000s
are incapable of omitting scheduling-clock ticks.  It turns out that
there are some situations where this old-school approach is still the
right approach, for example, in heavy workloads with lots of tasks
that use short bursts of CPU, where there are very frequent idle
periods, but where these idle periods are also quite short (tens or
hundreds of microseconds).  For these types of workloads, scheduling
clock interrupts will normally be delivered anyway because there
will frequently be multiple runnable tasks per CPU.  In these cases,
attempting to turn off the scheduling clock interrupt will have no effect
other than increasing the overhead of switching to and from idle and
transitioning between user and kernel execution.

This mode of operation can be selected using CONFIG_HZ_PERIODIC=y (or
CONFIG_NO_HZ=n for older kernels).

However, if you are instead running a light workload with long idle
periods, failing to omit scheduling-clock interrupts will result in
excessive power consumption.  This is especially bad on battery-powered
devices, where it results in extremely short battery lifetimes.  If you
are running light workloads, you should therefore read the following
section.

In addition, if you are running either a real-time workload or an HPC
workload with short iterations, the scheduling-clock interrupts can
degrade your application's performance.  If this describes your workload,
you should read the following two sections.


Omit Scheduling-Clock Ticks For Idle CPUs
=========================================

If a CPU is idle, there is little point in sending it a scheduling-clock
interrupt.  After all, the primary purpose of a scheduling-clock interrupt
is to force a busy CPU to shift its attention among multiple duties,
and an idle CPU has no duties to shift its attention among.

An idle CPU that is not receiving scheduling-clock interrupts is said to
be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
tickless".  The remainder of this document will use "dyntick-idle mode".

The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
scheduling-clock interrupts to idle CPUs, which is critically important
both to battery-powered devices and to highly virtualized mainframes.
A battery-powered device running a CONFIG_HZ_PERIODIC=y kernel would
drain its battery very quickly, easily 2-3 times as fast as would the
same device running a CONFIG_NO_HZ_IDLE=y kernel.  A mainframe running
1,500 OS instances might find that half of its CPU time was consumed by
unnecessary scheduling-clock interrupts.  In these situations, there
is strong motivation to avoid sending scheduling-clock interrupts to
idle CPUs.  That said, dyntick-idle mode is not free:

1.      It increases the number of instructions executed on the path
        to and from the idle loop.

2.      On many architectures, dyntick-idle mode also increases the
        number of expensive clock-reprogramming operations.

Therefore, systems with aggressive real-time response constraints often
run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
in order to avoid degrading from-idle transition latencies.
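
To put rough numbers on the trade-off, the sketch below counts the
interrupts an idle CPU receives over one idle stretch.  It assumes
CONFIG_HZ=250 purely for illustration and models dyntick-idle as waking
only for timers that actually expire during the stretch.
idle_interrupts() is a hypothetical helper, and the model does not
capture the extra idle-path instructions and clock-reprogramming costs
listed above.

```python
def idle_interrupts(idle_seconds: float, hz: int, timer_deadlines: list[float],
                    dyntick_idle: bool) -> int:
    """Count interrupts an idle CPU takes during an idle stretch of idle_seconds.

    timer_deadlines: times (seconds from the start of the stretch) at which
    pending timers expire.  Hypothetical model for illustration only.
    """
    if not dyntick_idle:
        # Periodic tick: one interrupt every 1/hz seconds, needed or not.
        return int(idle_seconds * hz)
    # Dyntick-idle: the clock is programmed only for timers that actually
    # expire during the stretch; timers sharing a deadline share one interrupt.
    return len({t for t in timer_deadlines if 0 <= t < idle_seconds})
```

With idle_interrupts(10.0, 250, [1.5, 7.0], dyntick_idle=False), the
periodic kernel takes 2500 ticks over ten idle seconds; passing
dyntick_idle=True reduces that to 2 interrupts.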

There is also a boot parameter "nohz=" that can be used to disable
dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
dyntick-idle mode.


Omit Scheduling-Clock Ticks For CPUs With Only One Runnable Task
================================================================

If a CPU has only one runnable task, there is little point in sending it
a scheduling-clock interrupt because there is no other task to switch to.
Note that omitting scheduling-clock ticks for CPUs with only one runnable
task implies also omitting them for idle CPUs.

The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
sending scheduling-clock interrupts to CPUs with a single runnable task,
and such CPUs are said to be "adaptive-ticks CPUs".  This is important
for applications with aggressive real-time response constraints because
it allows them to improve their worst-case response times by the maximum
duration of a scheduling-clock interrupt.  It is also important for
computationally intensive short-iteration workloads:  If any CPU is
delayed during a given iteration, all the other CPUs will be forced to
wait idle while the delayed CPU finishes.  Thus, the delay is multiplied
by one less than the number of CPUs.  In these situations, there is
again strong motivation to avoid sending scheduling-clock interrupts.
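
The amplification described above is simple arithmetic.  In the
hypothetical helper below, a delay of delay_us microseconds on one CPU
leaves each of the other ncpus - 1 CPUs waiting idle for the same amount
of time.  The 10-microsecond figure in the usage note is an assumed
interrupt duration, not a measured one.

```python
def idle_cpu_time_from_delay(delay_us: float, ncpus: int) -> float:
    """Aggregate idle CPU time caused by delaying one of ncpus lock-stepped CPUs.

    Each of the other ncpus - 1 CPUs waits idle for delay_us, so the delay is
    multiplied by one less than the number of CPUs.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return delay_us * (ncpus - 1)
```

For example, idle_cpu_time_from_delay(10, 64) gives 630 microseconds of
aggregate idle CPU time for a single delayed iteration.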

By default, no CPU will be an adaptive-ticks CPU.  The "nohz_full="
boot parameter specifies the adaptive-ticks CPUs.  For example,
"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks
CPUs.  Note that you are prohibited from marking all of the CPUs as
adaptive-tick CPUs:  At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system
calls like gettimeofday() return accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
user processes to observe slight drifts in clock rate.)  Note that this
means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you.
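
The "nohz_full=" CPU list uses the comma-separated numbers-and-ranges
form shown above.  The sketch below expands such a list and enforces the
rule that at least one CPU must stay out of adaptive-ticks mode.
parse_cpulist() and housekeeping_cpus() are hypothetical helpers that
handle only plain numbers and ranges; the kernel's own parser accepts
additional syntax, and its reaction to an invalid mask may differ from
the ValueError raised here.

```python
def parse_cpulist(spec: str) -> set[int]:
    """Expand a CPU list such as "1,6-8" into {1, 6, 7, 8} (numbers and ranges only)."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid range {part!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def housekeeping_cpus(nohz_full: str, nr_cpus: int) -> set[int]:
    """Return the CPUs left out of adaptive-ticks mode by a "nohz_full=" list.

    Raises ValueError if the list names a nonexistent CPU or leaves no CPU
    to handle timekeeping (this error handling is an assumption of the sketch).
    """
    all_cpus = set(range(nr_cpus))
    adaptive = parse_cpulist(nohz_full)
    if not adaptive <= all_cpus:
        raise ValueError("nohz_full names a CPU that does not exist")
    remaining = all_cpus - adaptive
    if not remaining:
        raise ValueError("at least one non-adaptive-tick CPU must remain")
    return remaining
```

On a nine-CPU system, housekeeping_cpus("1,6-8", 9) leaves CPUs 0, 2, 3,
4, and 5 for timekeeping.  On a single-CPU system any list naming CPU 0
fails the check, which matches the two-CPU minimum stated above.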

Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
This is covered in the "RCU Implications" section below.

Normally, a CPU remains in adaptive-ticks mode as long as possible.
In particular, transitioning to kernel mode does not automatically change
the mode.  Instead, the CPU will exit adaptive-ticks mode only if needed,
for example, if that CPU enqueues an RCU callback.

Just as with dyntick-idle mode, the benefits of adaptive-tick mode do
not come for free:

1.      CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ_COMMON, so you cannot run
        adaptive ticks without also running dyntick idle.  This dependency
        extends down into the implementation, so that all of the costs
        of CONFIG_NO_HZ_IDLE are also incurred by CONFIG_NO_HZ_FULL.

2.      The user/kernel transitions are slightly more expensive due
        to the need to inform kernel subsystems (such as RCU) about
        the change in mode.

3.      POSIX CPU timers prevent CPUs from entering adaptive-tick mode.
        Real-time applications needing to take actions based on CPU time
        consumption need to use other means of doing so.

4.      If there are more perf events pending than the hardware can
        accommodate, they are normally round-robined so as to collect
        all of them over time.  Adaptive-tick mode may prevent this
        round-robining from happening.  This will likely be fixed by
        preventing CPUs with large numbers of perf events pending from
        entering adaptive-tick mode.

5.      Scheduler statistics for adaptive-tick CPUs may be computed
        slightly differently than those for non-adaptive-tick CPUs.
        This might in turn perturb load-balancing of real-time tasks.

Although improvements are expected over time, adaptive ticks is quite
useful for many types of real-time and compute-intensive applications.
However, the drawbacks listed above mean that adaptive ticks should not
(yet) be enabled by default.


RCU Implications
================

There are situations in which idle CPUs cannot be permitted to
enter either dyntick-idle mode or adaptive-tick mode, the most
common being when that CPU has RCU callbacks pending.

Avoid this by offloading RCU callback processing to "rcuo" kthreads
using the CONFIG_RCU_NOCB_CPU=y Kconfig option.  The specific CPUs to
offload may be selected using the "rcu_nocbs=" kernel boot parameter,
which takes a comma-separated list of CPUs and CPU ranges, for example,
"1,3-5" selects CPUs 1, 3, 4, and 5.  Note that CPUs specified by
the "nohz_full" kernel boot parameter are also offloaded.
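
Because CPUs named by "nohz_full=" are offloaded as well, the effective
set of offloaded CPUs is the union of the two boot-parameter lists.  A
minimal sketch follows; offloaded_cpus() and parse_cpulist() are
hypothetical helpers, and the parser accepts only plain numbers and
ranges.

```python
def parse_cpulist(spec: str) -> set[int]:
    """Expand "1,3-5" into {1, 3, 4, 5}; handles plain numbers and ranges only."""
    cpus: set[int] = set()
    for part in filter(None, (p.strip() for p in spec.split(","))):
        first, _, last = part.partition("-")  # "3-5" -> ("3", "-", "5"); "1" -> ("1", "", "")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def offloaded_cpus(rcu_nocbs: str = "", nohz_full: str = "") -> set[int]:
    """CPUs whose RCU callbacks are offloaded to "rcuo" kthreads.

    CPUs listed in "nohz_full=" are offloaded too, so take the union.
    """
    return parse_cpulist(rcu_nocbs) | parse_cpulist(nohz_full)
```

For example, offloaded_cpus("1,3-5", "6-7") yields CPUs 1, 3, 4, 5, 6,
and 7.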

The offloaded CPUs will never queue RCU callbacks, and therefore RCU
never prevents offloaded CPUs from entering either dyntick-idle mode
or adaptive-tick mode.  That said, note that it is up to userspace to
pin the "rcuo" kthreads to specific CPUs if desired.  Otherwise, the
scheduler will decide where to run them, which might or might not be
where you want them to run.
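
One way userspace might pin the "rcuo" kthreads is to find them by name
under /proc and restrict their affinity with sched_setaffinity().  This
Python sketch assumes that the offload kthreads' names (as shown in
/proc/<pid>/comm) begin with "rcuo" and that the caller has the
privileges needed to change another task's affinity.
find_rcuo_pids() and pin_rcuo_kthreads() are hypothetical helpers, not
part of any kernel tooling.

```python
import os


def find_rcuo_pids(proc_root: str = "/proc") -> list[int]:
    """Return the PIDs whose comm name starts with "rcuo" (assumed offload kthreads)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # the task exited while we were scanning
        if name.startswith("rcuo"):
            pids.append(int(entry))
    return sorted(pids)


def pin_rcuo_kthreads(cpus: set[int], proc_root: str = "/proc") -> list[int]:
    """Restrict every matching rcuo kthread to the given CPUs.

    os.sched_setaffinity() raises OSError if the caller lacks permission.
    Returns the PIDs that were pinned.
    """
    pids = find_rcuo_pids(proc_root)
    for pid in pids:
        os.sched_setaffinity(pid, cpus)
    return pids
```

For example, pin_rcuo_kthreads({0}) would confine every matching kthread
to CPU 0, so that callback processing for the offloaded CPUs happens
there.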


Testing
=======

So you enable all the OS-jitter features described in this document,
but do not see any change in your workload's behavior.  Is this because
your workload isn't affected that much by OS jitter, or is it because
something else is in the way?  This section helps answer this question
by providing a simple OS-jitter test suite, which is available on branch
master of the following git archive:

git://git.kernel.org/pub/scm/linux/kernel/git/

Clone this archive and follow the instructions in the README file.
This test procedure will produce a trace that will allow you to evaluate
whether or not you have succeeded in removing OS jitter from your system.
If this trace shows that you have removed OS jitter as much as is
possible, then you can conclude that your workload is not all that
sensitive to OS jitter.

Note: this test requires that your system have at least two CPUs.
We do not currently have a good way to remove OS jitter from single-CPU
systems.


Known Issues
============

*       Dyntick-idle slows transitions to and from idle slightly.
        In practice, this has not been a problem except for the most
        aggressive real-time workloads, which have the option of disabling
        dyntick-idle mode, an option that most of them take.  However,
        some workloads will no doubt want to use adaptive ticks to
        eliminate scheduling-clock interrupt latencies.  Here are some
        options for these workloads:

        a.      Use PMQOS from userspace to inform the kernel of your
                latency requirements (preferred).

        b.      On x86 systems, use the "idle=mwait" boot parameter.

        c.      On x86 systems, use the "intel_idle.max_cstate=" boot
                parameter to limit the maximum C-state depth.

        d.      On x86 systems, use the "idle=poll" boot parameter.
                However, please note that use of this parameter can cause
                your CPU to overheat, which may cause thermal throttling
                to degrade your latencies, and that this degradation can
                be even worse than that of dyntick-idle.  Furthermore,
                this parameter effectively disables Turbo Mode on Intel
                CPUs, which can significantly reduce maximum performance.

*       Adaptive-ticks slows user/kernel transitions slightly.
        This is not expected to be a problem for computationally intensive
        workloads, which have few such transitions.  Careful benchmarking
        will be required to determine whether or not other workloads
        are significantly affected by this effect.

*       Adaptive-ticks does not do anything unless there is only one
        runnable task for a given CPU, even though there are a number
        of other situations where the scheduling-clock tick is not
        needed.  To give but one example, consider a CPU that has one
        runnable high-priority SCHED_FIFO task and an arbitrary number
        of low-priority SCHED_OTHER tasks.  In this case, the CPU is
        required to run the SCHED_FIFO task until it either blocks or
        some other higher-priority task awakens on (or is assigned to)
        this CPU, so there is no point in sending a scheduling-clock
        interrupt to this CPU.  However, the current implementation
        nevertheless sends scheduling-clock interrupts to CPUs having a
        single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER
        tasks, even though these interrupts are unnecessary.

        And even when there are multiple runnable tasks on a given CPU,
        there is little point in interrupting that CPU until the current
        running task's timeslice expires, which is almost always way
        longer than the time of the next scheduling-clock interrupt.

        Better handling of these sorts of situations is future work.

*       A reboot is required to reconfigure both adaptive ticks and RCU
        callback offloading.  Runtime reconfiguration could be provided
        if needed; however, due to the complexity of reconfiguring RCU at
        runtime, there would need to be an earthshakingly good reason,
        especially given that you have the straightforward option of
        simply offloading RCU callbacks from all CPUs and pinning them
        where you want them whenever you want them pinned.

*       Additional configuration is required to deal with other sources
        of OS jitter, including interrupts and system-utility tasks
        and processes.  This configuration normally involves binding
        interrupts and tasks to particular CPUs.

*       Some sources of OS jitter can currently be eliminated only by
        constraining the workload.  For example, the only way to eliminate
        OS jitter due to global TLB shootdowns is to avoid the unmapping
        operations (such as kernel module unload operations) that
        result in these shootdowns.  For another example, page faults
        and TLB misses can be reduced (and in some cases eliminated) by
        using huge pages and by constraining the amount of memory used
        by the application.  Pre-faulting the working set can also be
        helpful, especially when combined with the mlock() and mlockall()
        system calls.

*       Unless all CPUs are idle, at least one CPU must keep the
        scheduling-clock interrupt going in order to support accurate
        timekeeping.

*       If there might potentially be some adaptive-ticks CPUs, there
        will be at least one CPU keeping the scheduling-clock interrupt
        going, even if all CPUs are otherwise idle.

        Better handling of this situation is ongoing work.

*       Some process-handling operations still require the occasional
        scheduling-clock tick.  These operations include calculating CPU
        load, maintaining sched average, computing CFS entity vruntime,
        computing avenrun, and carrying out load balancing.  They are
        currently accommodated by a scheduling-clock tick every second
        or so.  Ongoing work will eliminate the need even for these
        infrequent scheduling-clock ticks.
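
The PMQOS option in the first known issue above is exercised from
userspace through the CPU latency QoS device node, /dev/cpu_dma_latency:
a process writes its latency limit in microseconds as a 32-bit signed
integer, and the request stays in force only while the file remains
open.  The sketch below assumes that interface and sufficient
permissions to open the device.  hold_cpu_latency_request() is a
hypothetical helper name, and the 20-microsecond limit in the usage note
is an arbitrary example.

```python
import os
import struct


def hold_cpu_latency_request(max_latency_us: int,
                             path: str = "/dev/cpu_dma_latency") -> int:
    """Register a CPU latency limit and return the open file descriptor.

    The request lasts until the returned descriptor is closed, so the
    caller must keep it open for as long as the limit should apply.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # Native-endian signed 32-bit value, in microseconds.
        os.write(fd, struct.pack("i", max_latency_us))
    except OSError:
        os.close(fd)
        raise
    return fd
```

A latency-sensitive program might call fd = hold_cpu_latency_request(20)
at startup and keep fd open for its whole lifetime, since closing it
drops the request.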
                                                      
