TOMOYO Linux Cross Reference
Linux/Documentation/scheduler/sched-energy.rst


Differences between /Documentation/scheduler/sched-energy.rst (Version linux-6.12-rc7) and /Documentation/scheduler/sched-energy.rst (Version policy-sample)


  1 =======================                           
  2 Energy Aware Scheduling                           
  3 =======================                           
  4                                                   
  5 1. Introduction                                   
  6 ---------------                                   
  7                                                   
  8 Energy Aware Scheduling (or EAS) gives the sch    
  9 the impact of its decisions on the energy cons    
 10 Energy Model (EM) of the CPUs to select an ene    
 11 with a minimal impact on throughput. This docu    
 12 introduction on how EAS works, what are the ma    
 13 details what is needed to get it to run.          
 14                                                   
 15 Before going any further, please note that at     
 16                                                   
 17    /!\ EAS does not support platforms with sym    
 18                                                   
 19 EAS operates only on heterogeneous CPU topolog    
 20 because this is where the potential for saving    
 21 the highest.                                      
 22                                                   
 23 The actual EM used by EAS is _not_ maintained     
 24 dedicated framework. For details about this fr    
 25 please refer to its documentation (see Documen    
 26                                                   
 27                                                   
 28 2. Background and Terminology                     
 29 -----------------------------                     
 30                                                   
 31 To make it clear from the start:                  
 32  - energy = [joule] (resource like a battery o    
 33  - power = energy/time = [joule/second] = [wat    
 34                                                   
 35 The goal of EAS is to minimize energy, while s    
 36 is, we want to maximize::                         
 37                                                   
 38         performance [inst/s]                      
 39         --------------------                      
 40             power [W]                             
 41                                                   
 42 which is equivalent to minimizing::               
 43                                                   
 44         energy [J]                                
 45         -----------                               
 46         instruction                               
 47                                                   
 48 while still getting 'good' performance. It is     
 49 optimization objective to the current performa    
 50 scheduler. This alternative considers two obje    
 51 performance.                                      
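The equivalence between the two objectives above follows from a unit analysis, using the definition power = energy/time given earlier. A short sketch of that step:

```latex
\[
\frac{\text{performance}}{\text{power}}
  = \frac{[\text{inst}/\text{s}]}{[\text{J}/\text{s}]}
  = \frac{[\text{inst}]}{[\text{J}]}
\qquad\Longrightarrow\qquad
\max \frac{\text{performance}}{\text{power}}
  \;\Longleftrightarrow\;
\min \frac{\text{energy}}{\text{instruction}}
\]
```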
 52                                                   
 53 The idea behind introducing an EM is to allow     
 54 implications of its decisions rather than blin    
 55 techniques that may have positive effects only    
 56 time, the EM must be as simple as possible to     
 57 impact.                                           
 58                                                   
 59 In short, EAS changes the way CFS tasks are as    
 60 for the scheduler to decide where a task shoul    
 61 is used to break the tie between several good     
 62 that is predicted to yield the best energy con    
 63 system's throughput. The predictions made by E    
 64 knowledge about the platform's topology, which    
 65 and their respective energy costs.                
 66                                                   
 67                                                   
 68 3. Topology information                           
 69 -----------------------                           
 70                                                   
 71 EAS (as well as the rest of the scheduler) use    
 72 differentiate CPUs with different computing th    
 73 represents the amount of work it can absorb wh    
 74 frequency compared to the most capable CPU of     
 75 normalized in a 1024 range, and are comparable    
 76 tasks and CPUs computed by the Per-Entity Load    
 77 to capacity and utilization values, EAS is abl    
 78 task/CPU is, and to take this into considerati    
 79 energy trade-offs. The capacity of CPUs is pro    
 80 through the arch_scale_cpu_capacity() callback    
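As an illustration of the normalization described above, the following Python sketch scales per-CPU performance figures into the 1024 range so that they become comparable with utilization values. The raw scores and the helper names are assumptions made for this example and do not correspond to kernel code; only the 1024 scale comes from the text.

```python
SCHED_CAPACITY_SCALE = 1024  # top of the capacity range

def normalize_capacities(raw_perf):
    """Scale raw per-CPU scores so the most capable CPU gets 1024."""
    top = max(raw_perf.values())
    return {cpu: raw * SCHED_CAPACITY_SCALE // top
            for cpu, raw in raw_perf.items()}

# Hypothetical throughput scores at maximum frequency (arbitrary units).
raw_perf = {0: 250, 1: 250, 2: 500, 3: 500}
capacity = normalize_capacities(raw_perf)

def fits(util, cpu):
    """Utilization and capacity share one scale, so they compare directly."""
    return util <= capacity[cpu]
```

With these made-up scores, the two slower CPUs end up with half the capacity of the faster ones, and a utilization of 600 fits only on the latter.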
 81                                                   
 82 The rest of platform knowledge used by EAS is     
 83 Model (EM) framework. The EM of a platform is     
 84 per 'performance domain' in the system (see Do    
 85 for further details about performance domains)    
 86                                                   
 87 The scheduler manages references to the EM obj    
 88 scheduling domains are built, or re-built. For    
 89 scheduler maintains a singly linked list of al    
 90 the current rd->span. Each node in the list co    
 91 em_perf_domain as provided by the EM framework    
 92                                                   
 93 The lists are attached to the root domains in     
 94 cpuset configurations. Since the boundaries of    
 95 necessarily match those of performance domains    
 96 domains can contain duplicate elements.           
 97                                                   
 98 Example 1.                                        
 99     Let us consider a platform with 12 CPUs, s    
100     (pd0, pd4 and pd8), organized as follows::    
101                                                   
102                   CPUs:   0 1 2 3 4 5 6 7 8 9     
103                   PDs:   |--pd0--|--pd4--|---p    
104                   RDs:   |----rd1----|-----rd2    
105                                                   
106     Now, consider that userspace decided to sp    
107     exclusive cpusets, hence creating two inde    
108     containing 6 CPUs. The two root domains ar    
109     above figure. Since pd4 intersects with bo    
110     present in the linked list '->pd' attached    
111                                                   
112        * rd1->pd: pd0 -> pd4                      
113        * rd2->pd: pd4 -> pd8                      
114                                                   
115     Please note that the scheduler will create    
116     pd4 (one for each list). However, both jus    
117     shared data structure of the EM framework.    
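The construction of the per-root-domain lists in Example 1 can be sketched in Python as follows. This is only a model: the CPU spans (three performance domains of 4 CPUs each, two root domains of 6 CPUs each) are assumed from the figure, and build_pd_lists() is a hypothetical helper, not a kernel function.

```python
def build_pd_lists(perf_domains, root_domains):
    """Per root domain, list the PDs whose CPUs intersect its span."""
    lists = {}
    for rd_name, rd_span in root_domains.items():
        lists[rd_name] = [pd_name
                          for pd_name, pd_span in perf_domains.items()
                          if pd_span & rd_span]
    return lists

# Assumed layout matching the figure of Example 1.
perf_domains = {
    "pd0": set(range(0, 4)),
    "pd4": set(range(4, 8)),
    "pd8": set(range(8, 12)),
}
root_domains = {
    "rd1": set(range(0, 6)),
    "rd2": set(range(6, 12)),
}

pd_lists = build_pd_lists(perf_domains, root_domains)
```

Because pd4 intersects both spans, it appears in both resulting lists, which is the duplication the example describes.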
118                                                   
119 Since the access to these lists can happen con    
120 things, they are protected by RCU, like the re    
121 manipulated by the scheduler.                     
122                                                   
123 EAS also maintains a static key (sched_energy_    
124 least one root domain meets all conditions for    
125 are summarized in Section 6.                      
126                                                   
127                                                   
128 4. Energy-Aware task placement                    
129 ------------------------------                    
130                                                   
131 EAS overrides the CFS task wake-up balancing c    
132 platform and the PELT signals to choose an ene    
133 wake-up balance. When EAS is enabled, select_t    
134 find_energy_efficient_cpu() to do the placemen    
135 for the CPU with the highest spare capacity (C    
136 each performance domain since it is the one wh    
137 frequency the lowest. Then, the function check    
138 save energy compared to leaving it on prev_cpu    
139 in its previous activation.                       
140                                                   
141 find_energy_efficient_cpu() uses compute_energ    
142 energy consumed by the system if the waking ta    
143 looks at the current utilization landscape of     
144 'simulate' the task migration. The EM framewor    
145 which computes the expected energy consumption    
146 the given utilization landscape.                  
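The kind of estimation described above can be modelled with a few lines of Python. This is a simplified sketch and not the EM framework implementation: it assumes that all CPUs of a performance domain run at the lowest OPP that covers the busiest CPU, and that each CPU consumes the power of that OPP for the fraction of time it is busy. The OPP table is hypothetical.

```python
def estimate_pd_energy(opps, cpu_utils):
    """Estimate the energy of one performance domain.

    opps: (capacity, power) pairs sorted by increasing capacity.
    cpu_utils: utilization of each CPU of the domain.
    """
    max_util = max(cpu_utils)
    # One OPP per domain: the lowest one that covers the busiest CPU,
    # or the highest OPP if none does.
    cap, power = next(((c, p) for c, p in opps if c >= max_util), opps[-1])
    # Each CPU is busy for util/cap of the time at that OPP's power.
    return sum(util / cap * power for util in cpu_utils)

# Hypothetical (capacity, power) table for a domain with three OPPs.
little_opps = [(170, 50), (341, 150), (512, 300)]
energy = estimate_pd_energy(little_opps, [200, 300])
```

Calling the function once per candidate CPU, with the utilization of the waking task added to that CPU, gives the 'simulated' migration cost that is then compared across candidates.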
147                                                   
148 An example of energy-optimized task placement     
149                                                   
150 Example 2.                                        
151     Let us consider a (fake) platform with 2 i    
152     composed of two CPUs each. CPU0 and CPU1 a    
153     are big.                                      
154                                                   
155     The scheduler must decide where to place a    
156     and prev_cpu = 0.                             
157                                                   
158     The current utilization landscape of the C    
159     below. CPUs 0-3 have a util_avg of 400, 10    
160     Each performance domain has three Operatin    
161     The CPU capacity and power cost associated    
162     the Energy Model table. The util_avg of P     
163     below as 'PP'::                               
164                                                   
165      CPU util.                                    
166       1024                 - - - - - - -          
167                                                   
168                                                   
169        768                 =============          
170                                                   
171                                                   
172        512  ===========    - ##- - - - -          
173                              ##     ##            
174        341  -PP - - - -      ##     ##            
175              PP              ##     ##            
176        170  -## - - - -      ##     ##            
177              ##     ##       ##     ##            
178            ------------    -------------          
179             CPU0   CPU1     CPU2   CPU3           
180                                                   
181       Current OPP: =====       Other OPP: - -     
182                                                   
183                                                   
184     find_energy_efficient_cpu() will first loo    
185     maximum spare capacity in the two performa    
186     CPU1 and CPU3. Then it will estimate the e    
187     placed on either of them, and check if tha    
188     compared to leaving P on CPU0. EAS assumes    
189     (which is coherent with the behaviour of t    
190     governor, see Section 6 for more details
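The first step, picking one candidate per performance domain, can be sketched as below. The capacities are read from the axis of the figure; the utilization values (taken without the contribution of P) are assumptions chosen for illustration, and max_spare_capacity_cpus() is a hypothetical helper.

```python
def max_spare_capacity_cpus(perf_domains, capacity, util):
    """For each performance domain, pick the CPU with the most spare capacity."""
    candidates = {}
    for pd_name, cpus in perf_domains.items():
        candidates[pd_name] = max(
            cpus, key=lambda cpu: capacity[cpu] - util[cpu])
    return candidates

perf_domains = {"little": [0, 1], "big": [2, 3]}
capacity = {0: 512, 1: 512, 2: 1024, 3: 1024}
# Assumed utilization of each CPU, not counting the waking task P.
util = {0: 200, 1: 100, 2: 600, 3: 500}

candidates = max_spare_capacity_cpus(perf_domains, capacity, util)
```

With these values the candidates are CPU1 for the little domain and CPU3 for the big one.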
191                                                   
192     **Case 1. P is migrated to CPU1**::           
193                                                   
194       1024                 - - - - - - -          
195                                                   
196                                             En    
197        768                 =============     *    
198                                              *    
199                                              *    
200        512  - - - - - -    - ##- - - - -     *    
201                              ##     ##            
202        341  ===========      ##     ##            
203                     PP       ##     ##            
204        170  -## - - PP-      ##     ##            
205              ##     ##       ##     ##            
206            ------------    -------------          
207             CPU0   CPU1     CPU2   CPU3           
208                                                   
209                                                   
210     **Case 2. P is migrated to CPU3**::           
211                                                   
212       1024                 - - - - - - -          
213                                                   
214                                             En    
215        768                 =============     *    
216                                              *    
217                                     PP       *    
218        512  - - - - - -    - ##- - -PP -     *    
219                              ##     ##            
220        341  ===========      ##     ##            
221                              ##     ##            
222        170  -## - - - -      ##     ##            
223              ##     ##       ##     ##            
224            ------------    -------------          
225             CPU0   CPU1     CPU2   CPU3           
226                                                   
227                                                   
228     **Case 3. P stays on prev_cpu / CPU 0**::     
229                                                   
230       1024                 - - - - - - -          
231                                                   
232                                             En    
233        768                 =============     *    
234                                              *    
235                                              *    
236        512  ===========    - ##- - - - -     *    
237                              ##     ##            
238        341  -PP - - - -      ##     ##            
239              PP              ##     ##            
240        170  -## - - - -      ##     ##            
241              ##     ##       ##     ##            
242            ------------    -------------          
243             CPU0   CPU1     CPU2   CPU3           
244                                                   
245                                                   
246     From these calculations, Case 1 has th
247     is the best candidate from an energy-ef
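The comparison of the three cases can be reproduced with a small Python model. The OPP capacities follow the axis of the figures; the power costs, the utilization of each CPU and the utilization of P are assumptions made for this sketch, and the energy formula is a simplification rather than the EM framework code.

```python
# (capacity, power) per OPP; the power values are illustrative assumptions.
OPPS = {
    "little": [(170, 50), (341, 150), (512, 300)],
    "big": [(512, 400), (768, 800), (1024, 1700)],
}
PD_OF = {0: "little", 1: "little", 2: "big", 3: "big"}
BASE_UTIL = {0: 200, 1: 100, 2: 600, 3: 500}  # assumed, without task P
P_UTIL = 200                                  # assumed util_avg of P

def pd_energy(opps, utils):
    """Energy of one domain at the lowest OPP covering its busiest CPU."""
    cap, power = next(((c, p) for c, p in opps if c >= max(utils)), opps[-1])
    return sum(u / cap * power for u in utils)

def total_energy(dst_cpu):
    """System energy estimate if P runs on dst_cpu."""
    util = dict(BASE_UTIL)
    util[dst_cpu] += P_UTIL
    return sum(
        pd_energy(opps, [u for cpu, u in util.items() if PD_OF[cpu] == pd])
        for pd, opps in OPPS.items())

# Case 1: CPU1, Case 2: CPU3, Case 3: prev_cpu (CPU0).
estimates = {cpu: total_energy(cpu) for cpu in (1, 3, 0)}
best_cpu = min(estimates, key=estimates.get)
```

Under these assumptions, placing P on CPU1 keeps the little domain at its middle OPP and yields the lowest estimate, while leaving P on CPU0 forces the little domain to its highest OPP.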
248                                                   
249 Big CPUs are generally more power hungry than     
250 mainly when a task doesn't fit the littles. Ho    
251 necessarily more energy-efficient than big CPU    
252 of the little CPUs can be less energy-efficien    
253 bigs, for example. So, if the little CPUs happ    
254 a specific point in time, a small task waking     
255 of executing on the big side in order to save     
256 on the little side.                               
257                                                   
258 And even in the case where all OPPs of the big    
259 than those of the little, using the big CPUs f    
260 specific conditions, save energy. Indeed, plac    
261 result in raising the OPP of the entire perfor    
262 increase the cost of the tasks already running    
263 placed on a big CPU, its own execution cost mi    
264 running on a little, but it won't impact the o    
265 which will keep running at a lower OPP. So, wh    
266 consumed by CPUs, the extra cost of running th    
267 smaller than the cost of raising the OPP on th    
268 tasks.                                            
269                                                   
270 The examples above would be nearly impossible     
271 for all platforms, without knowing the cost of    
272 CPUs of the system. Thanks to its EM-based des    
273 correctly without too much trouble. However,
274 impact on throughput for high-utilization scen    
275 mechanism called 'over-utilization'.              
276                                                   
277                                                   
278 5. Over-utilization                               
279 -------------------                               
280                                                   
281 From a general standpoint, the use-cases where    
282 involving a light/medium CPU utilization. When    
283 being run, they will require all of the availa    
284 much that can be done by the scheduler to save    
285 throughput. In order to avoid hurting performa    
286 'over-utilized' as soon as they are used at mo    
287 capacity. As long as no CPUs are over-utilized    
288 is disabled and EAS overrides the wake-up bal
289 the most energy efficient CPUs of the system m    
290 done without harming throughput. So, the load-    
291 it from breaking the energy-efficient task pla    
292 do so when the system isn't overutilized since    
293 implies that:                                     
294                                                   
295     a. there is some idle time on all CPUs, so    
296        EAS are likely to accurately represent     
297        in the system;                             
298     b. all tasks should already be provided wi    
299        regardless of their nice values;           
300     c. since there is spare capacity all tasks    
301        regularly and balancing at wake-up is s    
302                                                   
303 As soon as one CPU goes above the 80% tipping     
304 assumptions above becomes incorrect. In this s    
305 is raised for the entire root domain, EAS is d    
306 re-enabled. By doing so, the scheduler falls b    
307 wake-up and load balance under CPU-bound condi    
308 respect of the nice values of tasks.              
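The tipping point logic can be summarized with the following Python sketch. It only models the behaviour described in this section: the 80% threshold comes from the text, while the function names and the example values are hypothetical.

```python
def cpu_overutilized(util, capacity, tipping_point=0.80):
    """A CPU is over-utilized above 80% of its compute capacity."""
    return util > tipping_point * capacity

def root_domain_overutilized(utils, capacities):
    """The flag covers the whole root domain as soon as one CPU trips."""
    return any(cpu_overutilized(utils[cpu], capacities[cpu])
               for cpu in utils)

def placement_mode(utils, capacities):
    """Select which mechanism places tasks in this model."""
    if root_domain_overutilized(utils, capacities):
        return "load-balance"
    return "eas"

capacities = {0: 512, 1: 512, 2: 1024, 3: 1024}
light = {0: 200, 1: 100, 2: 600, 3: 500}
heavy = {0: 450, 1: 100, 2: 600, 3: 500}  # CPU0 above 80% of 512
```

In this model a single busy little CPU is enough to switch the entire root domain back to load-balancing, even though the big CPUs still have spare capacity.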
309                                                   
310 Since the notion of overutilization largely re    
311 there is some idle time in the system, the CPU    
312 (than CFS) scheduling classes (as well as IRQ)    
313 such, the detection of overutilization account    
314 by CFS tasks, but also by the other scheduling    
315                                                   
316                                                   
317 6. Dependencies and requirements for EAS          
318 ----------------------------------------          
319                                                   
320 Energy Aware Scheduling depends on the CPUs of    
321 hardware properties and on other features of t    
322 section lists these dependencies and provides     
323                                                   
324                                                   
325 6.1 - Asymmetric CPU topology                     
326 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                     
327                                                   
328                                                   
329 As mentioned in the introduction, EAS is only     
330 asymmetric CPU topologies for now. This requir    
331 looking for the presence of the SD_ASYM_CPUCAP    
332 domains are built.                                
333                                                   
334 See Documentation/scheduler/sched-capacity.rst    
335 flag to be set in the sched_domain hierarchy.     
336                                                   
337 Please note that EAS is not fundamentally inco    
338 significant savings on SMP platforms have been    
339 could be amended in the future if proven other    
340                                                   
341                                                   
342 6.2 - Energy Model presence                       
343 ^^^^^^^^^^^^^^^^^^^^^^^^^^^                       
344                                                   
345 EAS uses the EM of a platform to estimate the     
346 energy. So, your platform must provide power c    
347 order to make EAS start. To do so, please refe    
348 independent EM framework in Documentation/powe    
349                                                   
350 Please also note that the scheduling domains n    
351 EM has been registered in order to start EAS.     
352                                                   
353 EAS uses the EM to make a forecasting decision    
354 more focused on the difference when checking p    
355 placement. For EAS it doesn't matter whether t    
356 in milli-Watts or in an 'abstract scale'.         
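The reason the unit does not matter can be checked with a small Python experiment: multiplying every power value by the same constant changes the absolute estimates but not which placement is cheapest. The tables, utilization values and helper names below are made up for the illustration, and the sketch assumes that all performance domains share the same scale.

```python
def pd_energy(opps, utils):
    """Energy of one domain at the lowest OPP covering its busiest CPU."""
    cap, power = next(((c, p) for c, p in opps if c >= max(utils)), opps[-1])
    return sum(u / cap * power for u in utils)

def best_placement(scale):
    """Cheapest destination when all power values are multiplied by scale."""
    little = [(256, 40 * scale), (512, 120 * scale)]
    big = [(512, 200 * scale), (1024, 900 * scale)]
    task = 100  # hypothetical utilization of the waking task
    options = {
        "little": pd_energy(little, [150 + task]) + pd_energy(big, [300]),
        "big": pd_energy(little, [150]) + pd_energy(big, [300 + task]),
    }
    return min(options, key=options.get)
```

For instance, best_placement(1) and best_placement(0.37) select the same destination, since only the difference between the candidate placements matters.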
357                                                   
358                                                   
359 6.3 - Energy Model complexity                     
360 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                     
361                                                   
362 EAS does not impose any complexity limit on th    
363 restricts the number of CPUs to EM_MAX_NUM_CPU    
364 the energy estimation.                            
365                                                   
366                                                   
367 6.4 - Schedutil governor                          
368 ^^^^^^^^^^^^^^^^^^^^^^^^                          
369                                                   
370 EAS tries to predict at which OPP will the CPU    
371 in order to estimate their energy consumption.    
372 of CPUs follow their utilization.                 
373                                                   
374 Although it is very difficult to provide hard     
375 of this assumption in practice (because the ha    
376 told to do, for example), schedutil as opposed    
377 least _requests_ frequencies calculated using     
378 Consequently, the only sane governor to use to    
379 because it is the only one providing some degr    
380 frequency requests and energy predictions.        
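To illustrate what 'frequencies calculated using the utilization signals' means, here is a rough Python model of a utilization-driven frequency request. The formula next_freq = C * max_freq * util / max with C = 1.25 is an assumption of this sketch, as are the frequency table and the function names; the actual governor has additional inputs and rate limits.

```python
def requested_freq(util, max_cap, max_freq, headroom=1.25):
    """Frequency proportional to utilization, with some headroom (assumed)."""
    return headroom * max_freq * util / max_cap

def pick_opp_freq(request, available_freqs):
    """Lowest available frequency satisfying the request, else the highest."""
    for freq in sorted(available_freqs):
        if freq >= request:
            return freq
    return max(available_freqs)

# Hypothetical frequency table in kHz.
freqs = [500_000, 1_000_000, 1_500_000, 2_000_000]
request = requested_freq(util=300, max_cap=1024, max_freq=2_000_000)
chosen = pick_opp_freq(request, freqs)
```

Because the request is a function of the same utilization signals that EAS uses, the OPP predicted by EAS and the OPP requested by the governor tend to agree, which is the consistency referred to above.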
381                                                   
382 Using EAS with any other governor than schedut    
383                                                   
384                                                   
385 6.5 Scale-invariant utilization signals           
386 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^           
387                                                   
388 In order to make accurate predictions across CP
389 states, EAS needs frequency-invariant and CPU-    
390 be obtained using the architecture-defined arc    
391 callbacks.                                        
392                                                   
393 Using EAS on a platform that doesn't implement    
394 supported.                                        
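The effect of the two kinds of invariance can be sketched in Python as follows. This is a conceptual model only: the busy ratios, frequencies and capacities are hypothetical, and invariant_util() is not a kernel function.

```python
SCHED_CAPACITY_SCALE = 1024

def invariant_util(busy_ratio, cur_freq, max_freq, cpu_capacity):
    """Scale a raw busy ratio by current frequency and by CPU capacity."""
    freq_scale = cur_freq / max_freq
    cpu_scale = cpu_capacity / SCHED_CAPACITY_SCALE
    return busy_ratio * freq_scale * cpu_scale * SCHED_CAPACITY_SCALE

# The same amount of work observed in two different places:
# 50% busy on a little CPU (capacity 512) running at half its max frequency,
on_little = invariant_util(0.5, 1_000_000, 2_000_000, 512)
# 12.5% busy on a big CPU (capacity 1024) running at max frequency.
on_big = invariant_util(0.125, 2_000_000, 2_000_000, 1024)
```

Both observations yield the same utilization value, which is what allows EAS to predict how a task would load a CPU of a different type or at a different OPP.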
395                                                   
396                                                   
397 6.6 Multithreading (SMT)                          
398 ^^^^^^^^^^^^^^^^^^^^^^^^                          
399                                                   
400 EAS in its current form is SMT unaware and is     
401 multithreaded hardware to save energy. EAS con    
402 CPUs, which can actually be counter-productive    
403                                                   
404 EAS on SMT is not supported.                      
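The requirements of this section can be summarized as a checklist. The Python sketch below is only a reading aid: the Platform fields and eas_blockers() are hypothetical names, the check on the number of CPUs from Section 6.3 is left out, and the kernel performs the real checks when the scheduling domains are built.

```python
from dataclasses import dataclass

@dataclass
class Platform:
    cpu_capacities: list      # one capacity value per CPU
    has_energy_model: bool    # an EM has been registered
    governor: str             # active CPUFreq governor
    scale_invariant: bool     # frequency- and CPU-invariant signals
    smt: bool                 # multithreaded hardware

def eas_blockers(p):
    """List the requirements of Section 6 that the platform does not meet."""
    reasons = []
    if len(set(p.cpu_capacities)) < 2:
        reasons.append("symmetric CPU topology")
    if not p.has_energy_model:
        reasons.append("no Energy Model")
    if p.governor != "schedutil":
        reasons.append("governor is not schedutil")
    if not p.scale_invariant:
        reasons.append("no scale-invariant utilization signals")
    if p.smt:
        reasons.append("SMT hardware")
    return reasons

big_little = Platform([512, 512, 1024, 1024], True, "schedutil", True, False)
smp_box = Platform([1024] * 4, True, "performance", True, True)
```

An empty list means that, in this model, nothing prevents EAS from starting.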
                                                      
