~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/Documentation/scheduler/sched-capacity.rst

Version: ~ [ linux-6.12-rc7 ] ~ [ linux-6.11.7 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.60 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.116 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.171 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.229 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.285 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.323 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.12 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

Diff markup

Differences between /Documentation/scheduler/sched-capacity.rst (Version linux-6.12-rc7) and /Documentation/scheduler/sched-capacity.rst (Version linux-4.10.17)


  1 =========================                         
  2 Capacity Aware Scheduling                         
  3 =========================                         
  4                                                   
  5 1. CPU Capacity                                   
  6 ===============                                   
  7                                                   
  8 1.1 Introduction                                  
  9 ----------------                                  
 10                                                   
 11 Conventional, homogeneous SMP platforms are co    
 12 CPUs. Heterogeneous platforms on the other han    
 13 different performance characteristics - on suc    
 14 considered equal.                                 
 15                                                   
 16 CPU capacity is a measure of the performance a    
 17 the most performant CPU in the system. Heterog    
 18 asymmetric CPU capacity systems, as they conta    
 19                                                   
 20 Disparity in maximum attainable performance (I    
 21 from two factors:                                 
 22                                                   
 23 - not all CPUs may have the same microarchitec    
 24 - with Dynamic Voltage and Frequency Scaling (    
 25   physically able to attain the higher Operati    
 26                                                   
 27 Arm big.LITTLE systems are an example of both.    
 28 performance-oriented than the LITTLE ones (mor    
 29 smarter predictors, etc), and can usually reac    
 30 can.                                              
 31                                                   
 32 CPU performance is usually expressed in Millio    
 33 (MIPS), which can also be expressed as a given    
 34 per Hz, leading to::                              
 35                                                   
 36   capacity(cpu) = work_per_hz(cpu) * max_freq(    
 37                                                   
 38 1.2 Scheduler terms                               
 39 -------------------                               
 40                                                   
 41 Two different capacity values are used within     
 42 ``original capacity`` is its maximum attainabl    
 43 attainable performance level. This original ca    
 44 the function arch_scale_cpu_capacity(). A CPU'    
 45 capacity`` to which some loss of available per    
 46 handling IRQs) is subtracted.                     
 47                                                   
 48 Note that a CPU's ``capacity`` is solely inten    
 49 while ``original capacity`` is class-agnostic.    
 50 the term ``capacity`` interchangeably with ``o    
 51 brevity.                                          
 52                                                   
 53 1.3 Platform examples                             
 54 ---------------------                             
 55                                                   
 56 1.3.1 Identical OPPs                              
 57 ~~~~~~~~~~~~~~~~~~~~                              
 58                                                   
 59 Consider an hypothetical dual-core asymmetric     
 60                                                   
 61 - work_per_hz(CPU0) = W                           
 62 - work_per_hz(CPU1) = W/2                         
 63 - all CPUs are running at the same fixed frequ    
 64                                                   
 65 By the above definition of capacity:              
 66                                                   
 67 - capacity(CPU0) = C                              
 68 - capacity(CPU1) = C/2                            
 69                                                   
 70 To draw the parallel with Arm big.LITTLE, CPU0    
 71 be a LITTLE.                                      
 72                                                   
 73 With a workload that periodically does a fixed    
 74 execution trace like so::                         
 75                                                   
 76  CPU0 work ^                                      
 77            |     ____                ____         
 78            |    |    |              |    |        
 79            +----+----+----+----+----+----+----    
 80                                                   
 81  CPU1 work ^                                      
 82            |     _________           _________    
 83            |    |         |         |             
 84            +----+----+----+----+----+----+----    
 85                                                   
 86 CPU0 has the highest capacity in the system (C    
 87 work W in T units of time. On the other hand,     
 88 CPU0, and thus only completes W/2 in T.           
 89                                                   
 90 1.3.2 Different max OPPs                          
 91 ~~~~~~~~~~~~~~~~~~~~~~~~                          
 92                                                   
 93 Usually, CPUs of different capacity values als    
 94 OPPs. Consider the same CPUs as above (i.e. sa    
 95                                                   
 96 - max_freq(CPU0) = F                              
 97 - max_freq(CPU1) = 2/3 * F                        
 98                                                   
 99 This yields:                                      
100                                                   
101 - capacity(CPU0) = C                              
102 - capacity(CPU1) = C/3                            
103                                                   
104 Executing the same workload as described in 1.    
105 maximum frequency results in::                    
106                                                   
107  CPU0 work ^                                      
108            |     ____                ____         
109            |    |    |              |    |        
110            +----+----+----+----+----+----+----    
111                                                   
112                             workload on CPU1      
113  CPU1 work ^                                      
114            |     ______________      _________    
115            |    |              |    |             
116            +----+----+----+----+----+----+----    
117                                                   
118 1.4 Representation caveat                         
119 -------------------------                         
120                                                   
121 It should be noted that having a *single* valu    
122 performance is somewhat of a contentious point    
123 difference between two different µarchs could    
124 floating point operations, Z% on branches, and    
125 simple approach have been satisfactory for now    
126                                                   
127 2. Task utilization                               
128 ===================                               
129                                                   
130 2.1 Introduction                                  
131 ----------------                                  
132                                                   
133 Capacity aware scheduling requires an expressi    
134 regards to CPU capacity. Each scheduler class     
135 while task utilization is specific to CFS, it     
136 in order to introduce more generic concepts.      
137                                                   
138 Task utilization is a percentage meant to repr    
139 of a task. A simple approximation of it is the    
140                                                   
141   task_util(p) = duty_cycle(p)                    
142                                                   
143 On an SMP system with fixed frequencies, 100%     
144 busy loop. Conversely, 10% utilization hints i    
145 spends more time sleeping than executing. Vari    
146 asymmetric CPU capacities complexify this some    
147 expand on these.                                  
148                                                   
149 2.2 Frequency invariance                          
150 ------------------------                          
151                                                   
152 One issue that needs to be taken into account     
153 directly impacted by the current OPP the CPU i    
154 periodic workload at a given frequency F::        
155                                                   
156   CPU work ^                                      
157            |     ____                ____         
158            |    |    |              |    |        
159            +----+----+----+----+----+----+----    
160                                                   
161 This yields duty_cycle(p) == 25%.                 
162                                                   
163 Now, consider running the *same* workload at f    
164                                                   
165   CPU work ^                                      
166            |     _________           _________    
167            |    |         |         |             
168            +----+----+----+----+----+----+----    
169                                                   
170 This yields duty_cycle(p) == 50%, despite the     
171 behaviour (i.e. executing the same amount of w    
172                                                   
173 The task utilization signal can be made freque    
174 formula::                                         
175                                                   
176   task_util_freq_inv(p) = duty_cycle(p) * (cur    
177                                                   
178 Applying this formula to the two examples abov    
179 task utilization of 25%.                          
180                                                   
181 2.3 CPU invariance                                
182 ------------------                                
183                                                   
184 CPU capacity has a similar effect on task util    
185 identical workload on CPUs of different capaci    
186 duty cycles.                                      
187                                                   
188 Consider the system described in 1.3.2., i.e.:    
189                                                   
190 - capacity(CPU0) = C                              
191 - capacity(CPU1) = C/3                            
192                                                   
193 Executing a given periodic workload on each CP    
194 result in::                                       
195                                                   
196  CPU0 work ^                                      
197            |     ____                ____         
198            |    |    |              |    |        
199            +----+----+----+----+----+----+----    
200                                                   
201  CPU1 work ^                                      
202            |     ______________      _________    
203            |    |              |    |             
204            +----+----+----+----+----+----+----    
205                                                   
206 IOW,                                              
207                                                   
208 - duty_cycle(p) == 25% if p runs on CPU0 at it    
209 - duty_cycle(p) == 75% if p runs on CPU1 at it    
210                                                   
211 The task utilization signal can be made CPU in    
212 formula::                                         
213                                                   
214   task_util_cpu_inv(p) = duty_cycle(p) * (capa    
215                                                   
216 with ``max_capacity`` being the highest CPU ca    
217 system. Applying this formula to the above exa    
218 invariant task utilization of 25%.                
219                                                   
220 2.4 Invariant task utilization                    
221 ------------------------------                    
222                                                   
223 Both frequency and CPU invariance need to be a    
224 order to obtain a truly invariant signal. The     
225 utilization that is both CPU and frequency inv    
226 task p::                                          
227                                                   
228                                      curr_freq    
229   task_util_inv(p) = duty_cycle(p) * ---------    
230                                      max_frequ    
231                                                   
232 In other words, invariant task utilization des    
233 if it were running on the highest-capacity CPU    
234 maximum frequency.                                
235                                                   
236 Any mention of task utilization in the followi    
237 invariant form.                                   
238                                                   
239 2.5 Utilization estimation                        
240 --------------------------                        
241                                                   
242 Without a crystal ball, task behaviour (and th    
243 accurately be predicted the moment a task firs    
244 maintains a handful of CPU and task signals ba    
245 Tracking (PELT) mechanism, one of those yieldi    
246 opposed to instantaneous).                        
247                                                   
248 This means that while the capacity aware sched    
249 considering a "true" task utilization (using a    
250 will only ever be able to use an estimator the    
251                                                   
252 3. Capacity aware scheduling requirements         
253 =========================================         
254                                                   
255 3.1 CPU capacity                                  
256 ----------------                                  
257                                                   
258 Linux cannot currently figure out CPU capacity    
259 needs to be handed to it. Architectures must d    
260 for that purpose.                                 
261                                                   
262 The arm, arm64, and RISC-V architectures direc    
263 CPU scaling data, which is derived from the ca    
264 Documentation/devicetree/bindings/cpu/cpu-capa    
265                                                   
266 3.2 Frequency invariance                          
267 ------------------------                          
268                                                   
269 As stated in 2.2, capacity-aware scheduling re    
270 utilization. Architectures must define arch_sc    
271 purpose.                                          
272                                                   
273 Implementing this function requires figuring o    
274 have been running at. One way to implement thi    
275 whose increment rate scale with a CPU's curren    
276 AMU on arm64). Another is to directly hook int    
277 when the kernel is aware of the switched-to fr    
278 arm/arm64).                                       
279                                                   
280 4. Scheduler topology                             
281 =====================                             
282                                                   
283 During the construction of the sched domains,     
284 whether the system exhibits asymmetric CPU cap    
285 case:                                             
286                                                   
287 - The sched_asym_cpucapacity static key will b    
288 - The SD_ASYM_CPUCAPACITY_FULL flag will be se    
289   level that spans all unique CPU capacity val    
290 - The SD_ASYM_CPUCAPACITY flag will be set for    
291   CPUs with any range of asymmetry.               
292                                                   
293 The sched_asym_cpucapacity static key is inten    
294 cater to asymmetric CPU capacity systems. Do n    
295 *system-wide*. Imagine the following setup usi    
296                                                   
297   capacity    C/2          C                      
298             ________    ________                  
299            /        \  /        \                 
300   CPUs     0  1  2  3  4  5  6  7                 
301            \__/  \______________/                 
302   cpusets   cs0         cs1                       
303                                                   
304 Which could be created via:                       
305                                                   
306 .. code-block:: sh                                
307                                                   
308   mkdir /sys/fs/cgroup/cpuset/cs0                 
309   echo 0-1 > /sys/fs/cgroup/cpuset/cs0/cpuset.    
310   echo 0 > /sys/fs/cgroup/cpuset/cs0/cpuset.me    
311                                                   
312   mkdir /sys/fs/cgroup/cpuset/cs1                 
313   echo 2-7 > /sys/fs/cgroup/cpuset/cs1/cpuset.    
314   echo 0 > /sys/fs/cgroup/cpuset/cs1/cpuset.me    
315                                                   
316   echo 0 > /sys/fs/cgroup/cpuset/cpuset.sched_    
317                                                   
318 Since there *is* CPU capacity asymmetry in the    
319 sched_asym_cpucapacity static key will be enab    
320 hierarchy of CPUs 0-1 spans a single capacity     
321 set in that hierarchy, it describes an SMP isl    
322                                                   
323 Therefore, the 'canonical' pattern for protect    
324 asymmetric CPU capacities is to:                  
325                                                   
326 - Check the sched_asym_cpucapacity static key     
327 - If it is enabled, then also check for the pr    
328   the sched_domain hierarchy (if relevant, i.e    
329   CPU or group thereof)                           
330                                                   
331 5. Capacity aware scheduling implementation       
332 ===========================================       
333                                                   
334 5.1 CFS                                           
335 -------                                           
336                                                   
337 5.1.1 Capacity fitness                            
338 ~~~~~~~~~~~~~~~~~~~~~~                            
339                                                   
340 The main capacity scheduling criterion of CFS     
341                                                   
342   task_util(p) < capacity(task_cpu(p))            
343                                                   
344 This is commonly called the capacity fitness c    
345 task "fits" on its CPU. If it is violated, the    
346 work than what its CPU can provide: it will be    
347                                                   
348 Furthermore, uclamp lets userspace specify a m    
349 value for a task, either via sched_setattr() o    
350 Documentation/admin-guide/cgroup-v2.rst). As i    
351 clamp task_util() in the previous criterion.      
352                                                   
353 5.1.2 Wakeup CPU selection                        
354 ~~~~~~~~~~~~~~~~~~~~~~~~~~                        
355                                                   
356 CFS task wakeup CPU selection follows the capa    
357 above. On top of that, uclamp is used to clamp    
358 which lets userspace have more leverage over t    
359 tasks. IOW, CFS wakeup CPU selection searches     
360                                                   
361   clamp(task_util(p), task_uclamp_min(p), task    
362                                                   
363 By using uclamp, userspace can e.g. allow a bu    
364 on any CPU by giving it a low uclamp.max value    
365 periodic task (e.g. 10% utilization) to run on    
366 giving it a high uclamp.min value.                
367                                                   
368 .. note::                                         
369                                                   
370   Wakeup CPU selection in CFS can be eclipsed     
371   (EAS), which is described in Documentation/s    
372                                                   
373 5.1.3 Load balancing                              
374 ~~~~~~~~~~~~~~~~~~~~                              
375                                                   
376 A pathological case in the wakeup CPU selectio    
377 sleeps, if at all - it thus rarely wakes up, i    
378                                                   
379   w == wakeup event                               
380                                                   
381   capacity(CPU0) = C                              
382   capacity(CPU1) = C / 3                          
383                                                   
384                            workload on CPU0       
385   CPU work ^                                      
386            |     _________           _________    
387            |    |         |         |             
388            +----+----+----+----+----+----+----    
389                 w                   w             
390                                                   
391                            workload on CPU1       
392   CPU work ^                                      
393            |     _____________________________    
394            |    |                                 
395            +----+----+----+----+----+----+----    
396                 w                                 
397                                                   
398 This workload should run on CPU0, but if the t    
399                                                   
400 - was improperly scheduled from the start (ina    
401   utilization estimation)                         
402 - was properly scheduled from the start, but s    
403   processing power                                
404                                                   
405 then it might become CPU-bound, IOW ``task_uti    
406 the CPU capacity scheduling criterion is viola    
407 wakeup event to fix this up via wakeup CPU sel    
408                                                   
409 Tasks that are in this situation are dubbed "m    
410 put in place to handle this shares the same na    
411 leverages the CFS load balancer, more specific    
412 (which caters to migrating currently running t    
413 a misfit active load balance will be triggered    
414 to a CPU with more capacity than its current o    
415                                                   
416 5.2 RT                                            
417 ------                                            
418                                                   
419 5.2.1 Wakeup CPU selection                        
420 ~~~~~~~~~~~~~~~~~~~~~~~~~~                        
421                                                   
422 RT task wakeup CPU selection searches for a CP    
423                                                   
424   task_uclamp_min(p) <= capacity(task_cpu(cpu)    
425                                                   
426 while still following the usual priority const    
427 CPUs can satisfy this capacity criterion, then    
428 is followed and CPU capacities are ignored.       
429                                                   
430 5.3 DL                                            
431 ------                                            
432                                                   
433 5.3.1 Wakeup CPU selection                        
434 ~~~~~~~~~~~~~~~~~~~~~~~~~~                        
435                                                   
436 DL task wakeup CPU selection searches for a CP    
437                                                   
438   task_bandwidth(p) < capacity(task_cpu(p))       
439                                                   
440 while still respecting the usual bandwidth and    
441 none of the candidate CPUs can satisfy this ca    
442 task will remain on its current CPU.              
                                                      

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php