=========
Schedutil
=========

.. note::

   All this assumes a linear relation between frequency and work capacity;
   we know this is flawed, but it is the best workable approximation.


PELT (Per Entity Load Tracking)
===============================

With PELT we track some metrics across the various scheduler entities, from
individual tasks to task-group slices to CPU runqueues. As the basis for this
we use an Exponentially Weighted Moving Average (EWMA); each period (1024us)
is decayed such that y^32 = 0.5. That is, the most recent 32ms contribute
half, while the rest of history contributes the other half.

Specifically:

  ewma_sum(u) := u_0 + u_1*y + u_2*y^2 + ...

  ewma(u) = ewma_sum(u) / ewma_sum(1)

Since this is essentially a progression of an infinite geometric series, the
results are composable, that is ewma(A) + ewma(B) = ewma(A+B). This property
is key, since it gives the ability to recompose the averages when tasks move
around.
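
As a rough illustration only (the in-kernel implementation in
kernel/sched/pelt.c uses 32-bit fixed-point arithmetic and precomputed decay
tables rather than floating point; the names below are made up for this
sketch), the geometric-series average and its composability can be
demonstrated like this::

  /* Hypothetical floating-point sketch of a PELT-style EWMA. */
  #include <stdio.h>

  /* y such that y^32 == 0.5, i.e. pow(0.5, 1.0/32). */
  static const double y = 0.9785720620877001;

  struct ewma {
          double sum;     /* ewma_sum(u) */
          double weight;  /* ewma_sum(1) */
  };

  /* Fold one 1024us period with utilization u (0.0 .. 1.0) into the average. */
  static void ewma_add_period(struct ewma *e, double u)
  {
          e->sum    = u   + y * e->sum;
          e->weight = 1.0 + y * e->weight;
  }

  static double ewma_value(const struct ewma *e)
  {
          return e->weight > 0.0 ? e->sum / e->weight : 0.0;
  }

  int main(void)
  {
          struct ewma a = {0}, b = {0}, ab = {0};

          /* Two tasks, 30% and 50% busy, tracked over 128 periods. */
          for (int i = 0; i < 128; i++) {
                  ewma_add_period(&a, 0.30);
                  ewma_add_period(&b, 0.50);
                  ewma_add_period(&ab, 0.30 + 0.50);
          }

          /* Composability: ewma(A) + ewma(B) == ewma(A+B). */
          printf("ewma(A)+ewma(B) = %.4f, ewma(A+B) = %.4f\n",
                 ewma_value(&a) + ewma_value(&b), ewma_value(&ab));
          return 0;
  }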

Note that blocked tasks still contribute to the aggregates (task-group slices
and CPU runqueues), which reflects their expected contribution when they
resume running.

Using this we track 2 key metrics: 'running' and 'runnable'. 'Running'
reflects the time an entity spends on the CPU, while 'runnable' reflects the
time an entity spends on the runqueue. When there is only a single task these
two metrics are the same, but once there is contention for the CPU, 'running'
will decrease to reflect the fraction of time each task spends on the CPU,
while 'runnable' will increase to reflect the amount of contention.

For more detail see: kernel/sched/pelt.c


Frequency / CPU Invariance
==========================

Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
a big CPU, we allow architectures to scale the time delta with two ratios, one
Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.

For simple DVFS architectures (where software is in full control) we trivially
compute the ratio as::

            f_cur
  r_dvfs := -----
            f_max

For more dynamic systems where the hardware is in control of DVFS we use
hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this ratio.
For Intel specifically, we use::

           APERF
  f_cur := ----- * P0
           MPERF

             4C-turbo;  if available and turbo enabled
  f_max := { 1C-turbo;  if turbo enabled
             P0;        otherwise

                    f_cur
  r_dvfs := min( 1, ----- )
                    f_max

We pick 4C turbo over 1C turbo to make it slightly more sustainable.

r_cpu is determined as the ratio of the highest performance level of the
current CPU vs the highest performance level of any other CPU in the system.

  r_tot = r_dvfs * r_cpu

The result is that the above 'running' and 'runnable' metrics become invariant
of DVFS and CPU type. IOW, we can transfer and compare them between CPUs.
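
As an illustration only (the kernel applies per-CPU fixed-point scale factors
while accumulating PELT time deltas, see
kernel/sched/pelt.h:update_rq_clock_pelt(); the helper names and the numbers
below are invented for this sketch), the two ratios might be combined like
this::

  /* Hypothetical sketch of frequency/CPU invariance scaling. */
  #include <stdint.h>
  #include <stdio.h>

  /* r_dvfs for a hardware-controlled CPU, from APERF/MPERF deltas. */
  static double r_dvfs(uint64_t aperf_delta, uint64_t mperf_delta,
                       double p0_khz, double f_max_khz)
  {
          double f_cur = (double)aperf_delta / (double)mperf_delta * p0_khz;
          double r = f_cur / f_max_khz;

          return r < 1.0 ? r : 1.0;       /* min(1, f_cur / f_max) */
  }

  int main(void)
  {
          double p0 = 2400000.0;          /* kHz: guaranteed (P0) frequency */
          double f_max = 3000000.0;       /* kHz: 4C-turbo in this example  */
          double rd = r_dvfs(1250000, 1000000, p0, f_max);

          /* This CPU's peak performance vs the biggest CPU in the system. */
          double r_cpu = 0.8;
          double r_tot = rd * r_cpu;

          /* A raw 10000us 'running' delta becomes an invariant contribution. */
          printf("r_dvfs=%.3f r_tot=%.3f scaled delta=%.0f us\n",
                 rd, r_tot, 10000.0 * r_tot);
          return 0;
  }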

For more detail see:

 - kernel/sched/pelt.h:update_rq_clock_pelt()
 - arch/x86/kernel/smpboot.c:"APERF/MPERF frequency ratio computation."
 - Documentation/scheduler/sched-capacity.rst:"1. CPU Capacity + 2. Task utilization"


UTIL_EST
========

Because periodic tasks have their averages decayed while they sleep, even
though their expected utilization will be the same when running, they suffer
a (DVFS) ramp-up after they start running again.

To alleviate this, UTIL_EST (a default-enabled option) drives an Infinite
Impulse Response (IIR) EWMA with the 'running' value on dequeue -- when it is
highest. UTIL_EST filters to instantly increase and only decay on decrease.
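
A minimal sketch of that filter, assuming a 1/4 EWMA weight (the actual weight
and the fixed-point details live in kernel/sched/fair.c; the function name and
units here are made up)::

  /* Hypothetical sketch of the UTIL_EST dequeue-time filter: the estimate
   * follows increases immediately and only decays through the IIR EWMA. */
  static long util_est_filter(long ewma, long running_at_dequeue)
  {
          if (running_at_dequeue >= ewma)
                  return running_at_dequeue;      /* ramp up instantly */

          /* Decay slowly towards the lower 'running' value:
           * ewma += (running - ewma) / 4 */
          return ewma + (running_at_dequeue - ewma) / 4;
  }

For example, a task that previously estimated 400 but dequeues with a
'running' of 200 would only decay to 350, while a task that dequeues busier
than its estimate immediately adopts the higher value.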

A further runqueue wide sum (of runnable tasks) is maintained of:

  util_est := \Sum_t max( t_running, t_util_est )

For more detail see: kernel/sched/fair.c:util_est_dequeue()


UCLAMP
======

It is possible to set effective u_min and u_max clamps on each CFS or RT task;
the runqueue keeps a max aggregate of these clamps for all running tasks.
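
A minimal sketch of that aggregation, assuming clamps expressed in the usual
0..1024 capacity range (the kernel tracks this with per-clamp buckets in
kernel/sched/core.c; the names below are invented)::

  /* Hypothetical sketch: the runqueue-effective clamps are the max over the
   * clamps of all currently enqueued tasks; an empty runqueue would fall
   * back to the default clamps (not shown). */
  struct clamp { unsigned int u_min, u_max; };

  static struct clamp rq_effective_clamps(const struct clamp *tasks, int nr)
  {
          struct clamp rq = { 0, 0 };

          for (int i = 0; i < nr; i++) {
                  if (tasks[i].u_min > rq.u_min)
                          rq.u_min = tasks[i].u_min;
                  if (tasks[i].u_max > rq.u_max)
                          rq.u_max = tasks[i].u_max;
          }
          return rq;
  }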

For more detail see: include/uapi/linux/sched/types.h


Schedutil / DVFS
================

Every time the scheduler load tracking is updated (task wakeup, task
migration, time progression) we call out to schedutil to update the hardware
DVFS state.

The basis is the CPU runqueue's 'running' metric, which per the above is the
frequency invariant utilization estimate of the CPU. From this we compute a
desired frequency like::

             max( running, util_est );  if UTIL_EST
  u_cfs := { running;                   otherwise

               clamp( u_cfs + u_rt , u_min, u_max );    if UCLAMP_TASK
  u_clamp := { u_cfs + u_rt;                            otherwise

  u := u_clamp + u_irq + u_dl;          [approx. see source for more detail]

  f_des := min( f_max, 1.25 u * f_max )
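
A simplified sketch of that computation (omitting the IO-wait boost mentioned
below and the fixed-point details of kernel/sched/cpufreq_schedutil.c; all
names and units here are illustrative, with utilization in the 0..1024
capacity range and frequencies in kHz)::

  /* Hypothetical sketch of the desired-frequency computation. */
  static unsigned long clamp_ul(unsigned long v, unsigned long lo,
                                unsigned long hi)
  {
          return v < lo ? lo : (v > hi ? hi : v);
  }

  static unsigned long schedutil_f_des(unsigned long running,
                                       unsigned long util_est,
                                       unsigned long u_rt,
                                       unsigned long u_irq,
                                       unsigned long u_dl,
                                       unsigned long u_min,
                                       unsigned long u_max,
                                       unsigned long f_max)
  {
          unsigned long u_cfs = running > util_est ? running : util_est;
          unsigned long u_clamp = clamp_ul(u_cfs + u_rt, u_min, u_max);
          unsigned long u = u_clamp + u_irq + u_dl;

          /* f_des = min( f_max, 1.25 u * f_max ), with u normalized to 1024 */
          unsigned long long f_des = 5ULL * u * f_max / 4 / 1024;

          return f_des < f_max ? (unsigned long)f_des : f_max;
  }

For instance, with running = 512 out of 1024, no RT/IRQ/DL pressure, default
clamps and f_max = 3000000 kHz, this selects roughly 1875000 kHz: the 1.25
factor leaves ~25% headroom so the task does not immediately saturate the
newly selected frequency.
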
XXX IO-wait: when the update is due to a task wakeup from IO-completion we
boost 'u' above.

This frequency is then used to select a P-state/OPP or directly munged into a
CPPC style request to the hardware.

XXX: deadline tasks (Sporadic Task Model) allow us to calculate a hard f_min
required to satisfy the workload.

Because these callbacks are directly from the scheduler, the DVFS hardware
interaction should be 'fast' and non-blocking. Schedutil supports
rate-limiting DVFS requests for when hardware interaction is slow and
expensive; this reduces effectiveness.
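
A minimal sketch of such a rate limit, assuming a nanosecond timestamp source
(the actual logic lives in kernel/sched/cpufreq_schedutil.c; this struct and
its field names are invented)::

  /* Hypothetical sketch: drop DVFS requests that arrive within the
   * configured rate limit of the previous hardware update. */
  #include <stdbool.h>
  #include <stdint.h>

  struct dvfs_limiter {
          uint64_t last_update_ns;
          uint64_t rate_limit_ns;
  };

  static bool dvfs_should_update(struct dvfs_limiter *l, uint64_t now_ns)
  {
          if (now_ns - l->last_update_ns < l->rate_limit_ns)
                  return false;           /* too soon: skip this request */

          l->last_update_ns = now_ns;
          return true;
  }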

For more information see: kernel/sched/cpufreq_schedutil.c


NOTES
=====

 - In low-load scenarios, where DVFS is most relevant, the 'running' numbers
   will closely reflect utilization.

 - In saturated scenarios task movement will cause some transient dips;
   suppose we have a CPU saturated with 4 tasks, then when we migrate a task
   to an idle CPU, the old CPU will have a 'running' value of 0.75 while the
   new CPU will gain 0.25. This is inevitable and time progression will
   correct this. XXX do we still guarantee f_max due to no idle-time?

 - Much of the above is about avoiding DVFS dips, and independent DVFS domains
   having to re-learn / ramp-up when load shifts.