=======================
Energy Aware Scheduling
=======================

1. Introduction
---------------

Energy Aware Scheduling (or EAS) gives the sch
the impact of its decisions on the energy cons
Energy Model (EM) of the CPUs to select an ene
with a minimal impact on throughput. This docu
introduction on how EAS works, what are the ma
details what is needed to get it to run.

Before going any further, please note that at

  /!\ EAS does not support platforms with sym

EAS operates only on heterogeneous CPU topolog
because this is where the potential for saving
the highest.

The actual EM used by EAS is _not_ maintained
dedicated framework. For details about this fr
please refer to its documentation (see Documen


2. Background and Terminology
-----------------------------

To make it clear from the start:
- energy = [joule] (resource like a battery o
- power = energy/time = [joule/second] = [wat

The goal of EAS is to minimize energy, while s
is, we want to maximize::

        performance [inst/s]
        --------------------
            power [W]

which is equivalent to minimizing::

        energy [J]
        -----------
        instruction

while still getting 'good' performance. It is
optimization objective to the current performa
scheduler. This alternative considers two obje
performance.

The idea behind introducing an EM is to allow
implications of its decisions rather than blin
techniques that may have positive effects only
time, the EM must be as simple as possible to
impact.

In short, EAS changes the way CFS tasks are as
for the scheduler to decide where a task shoul
is used to break the tie between several good
that is predicted to yield the best energy con
system's throughput. The predictions made by E
knowledge about the platform's topology, which
and their respective energy costs.


3. Topology information
-----------------------

EAS (as well as the rest of the scheduler) use
differentiate CPUs with different computing th
represents the amount of work it can absorb wh
frequency compared to the most capable CPU of
normalized in a 1024 range, and are comparable
tasks and CPUs computed by the Per-Entity Load
to capacity and utilization values, EAS is abl
task/CPU is, and to take this into considerati
energy trade-offs. The capacity of CPUs is pro
through the arch_scale_cpu_capacity() callback

The rest of platform knowledge used by EAS is
Model (EM) framework. The EM of a platform is
per 'performance domain' in the system (see Do
for further details about performance domains)

The scheduler manages references to the EM obj
scheduling domains are built, or re-built. For
scheduler maintains a singly linked list of al
the current rd->span. Each node in the list co
em_perf_domain as provided by the EM framework

The lists are attached to the root domains in
cpuset configurations. Since the boundaries of
necessarily match those of performance domains
domains can contain duplicate elements.

Example 1.
    Let us consider a platform with 12 CPUs, s
    (pd0, pd4 and pd8), organized as follows::

          CPUs:   0 1 2 3 4 5 6 7 8 9
          PDs:   |--pd0--|--pd4--|---p
          RDs:   |----rd1----|-----rd2

    Now, consider that userspace decided to sp
    exclusive cpusets, hence creating two inde
    containing 6 CPUs. The two root domains ar
    above figure. Since pd4 intersects with bo
    present in the linked list '->pd' attached

       * rd1->pd: pd0 -> pd4
       * rd2->pd: pd4 -> pd8

    Please note that the scheduler will create
    pd4 (one for each list). However, both jus
    shared data structure of the EM framework.

Since the access to these lists can happen con
things, they are protected by RCU, like the re
manipulated by the scheduler.
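
The list construction described in Example 1 can be sketched as follows. This
is only an illustration in Python, not the scheduler's implementation: the
helper name build_pd_lists() is hypothetical, and the CPU spans (three
performance domains of 4 CPUs each, two root domains of 6 CPUs each) are
assumptions inferred from the example's description.

```python
def build_pd_lists(root_domains, perf_domains):
    """For each root domain, list the performance domains whose CPU span
    intersects the root domain's span (hypothetical helper)."""
    return {
        rd_name: [
            pd_name
            for pd_name, pd_cpus in perf_domains.items()
            if rd_cpus & pd_cpus  # non-empty intersection
        ]
        for rd_name, rd_cpus in root_domains.items()
    }


# Assumed layout: 12 CPUs, pd0 = CPUs 0-3, pd4 = CPUs 4-7, pd8 = CPUs 8-11.
perf_domains = {
    "pd0": set(range(0, 4)),
    "pd4": set(range(4, 8)),
    "pd8": set(range(8, 12)),
}
# Assumed exclusive cpusets: rd1 = CPUs 0-5, rd2 = CPUs 6-11.
root_domains = {
    "rd1": set(range(0, 6)),
    "rd2": set(range(6, 12)),
}

pd_lists = build_pd_lists(root_domains, perf_domains)
print(pd_lists)
```

Under these assumptions pd4 shows up in both lists, which matches the
rd1->pd and rd2->pd lists given in Example 1.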

EAS also maintains a static key (sched_energy_
least one root domain meets all conditions for
are summarized in Section 6.


4. Energy-Aware task placement
------------------------------

EAS overrides the CFS task wake-up balancing c
platform and the PELT signals to choose an ene
wake-up balance. When EAS is enabled, select_t
find_energy_efficient_cpu() to do the placemen
for the CPU with the highest spare capacity (C
each performance domain since it is the one wh
frequency the lowest. Then, the function check
save energy compared to leaving it on prev_cpu
in its previous activation.

find_energy_efficient_cpu() uses compute_energ
energy consumed by the system if the waking ta
looks at the current utilization landscape of
'simulate' the task migration. The EM framewor
which computes the expected energy consumption
the given utilization landscape.

An example of energy-optimized task placement

Example 2.
    Let us consider a (fake) platform with 2 i
    composed of two CPUs each. CPU0 and CPU1 a
    are big.

    The scheduler must decide where to place a
    and prev_cpu = 0.

    The current utilization landscape of the C
    below. CPUs 0-3 have a util_avg of 400, 10
    Each performance domain has three Operatin
    The CPU capacity and power cost associated
    the Energy Model table. The util_avg of P
    below as 'PP'::

     CPU util.
      1024                 - - - - - - -


       768                 =============


       512  ===========    - ##- - - - -
                             ##     ##
       341  -PP - - - -      ##     ##
             PP              ##     ##
       170  -## - - - -      ##     ##
             ##     ##       ##     ##
           ------------    -------------
            CPU0   CPU1     CPU2   CPU3

      Current OPP: =====       Other OPP: - -


    find_energy_efficient_cpu() will first loo
    maximum spare capacity in the two performa
    CPU1 and CPU3. Then it will estimate the e
    placed on either of them, and check if tha
    compared to leaving P on CPU0.
    EAS assumes
    (which is coherent with the behaviour of t
    governor, see Section 6. for more details

    **Case 1. P is migrated to CPU1**::

      1024                 - - - - - - -

                                            En
       768                 =============     *
                                             *
                                             *
       512  - - - - - - -    ##- - - - -     *
                             ##     ##
       341  ===========      ##     ##
                    PP       ##     ##
       170  -## - - PP-      ##     ##
             ##     ##       ##     ##
           ------------    -------------
            CPU0   CPU1     CPU2   CPU3


    **Case 2. P is migrated to CPU3**::

      1024                 - - - - - - -

                                            En
       768                 =============     *
                                             *
                                    PP       *
       512  - - - - - - -    ##- - -PP -     *
                             ##     ##
       341  ===========      ##     ##
                             ##     ##
       170  -## - - - -      ##     ##
             ##     ##       ##     ##
           ------------    -------------
            CPU0   CPU1     CPU2   CPU3


    **Case 3. P stays on prev_cpu / CPU 0**::

      1024                 - - - - - - -

                                            En
       768                 =============     *
                                             *
                                             *
       512  ===========    - ##- - - - -     *
                             ##     ##
       341  -PP - - - -      ##     ##
             PP              ##     ##
       170  -## - - - -      ##     ##
             ##     ##       ##     ##
           ------------    -------------
            CPU0   CPU1     CPU2   CPU3


    From these calculations, the Case 1 has th
    is the best candidate from an energy-ef

Big CPUs are generally more power hungry than
mainly when a task doesn't fit the littles. Ho
necessarily more energy-efficient than big CPU
of the little CPUs can be less energy-efficien
bigs, for example. So, if the little CPUs happ
a specific point in time, a small task waking
of executing on the big side in order to save
on the little side.

And even in the case where all OPPs of the big
than those of the little, using the big CPUs f
specific conditions, save energy. Indeed, plac
result in raising the OPP of the entire perfor
increase the cost of the tasks already running
placed on a big CPU, its own execution cost mi
running on a little, but it won't impact the o
which will keep running at a lower OPP.
So, wh
consumed by CPUs, the extra cost of running th
smaller than the cost of raising the OPP on th
tasks.

The examples above would be nearly impossible
for all platforms, without knowing the cost of
CPUs of the system. Thanks to its EM-based des
correctly without too many troubles. However,
impact on throughput for high-utilization scen
mechanism called 'over-utilization'.


5. Over-utilization
-------------------

From a general standpoint, the use-cases where
involving a light/medium CPU utilization. When
being run, they will require all of the availa
much that can be done by the scheduler to save
throughput. In order to avoid hurting performa
'over-utilized' as soon as they are used at mo
capacity. As long as no CPUs are over-utilized
is disabled and EAS overrides the wake-up bal
the most energy efficient CPUs of the system m
done without harming throughput. So, the load-
it from breaking the energy-efficient task pla
do so when the system isn't overutilized since
implies that:

    a. there is some idle time on all CPUs, so
       EAS are likely to accurately represent
       in the system;
    b. all tasks should already be provided wi
       regardless of their nice values;
    c. since there is spare capacity all tasks
       regularly and balancing at wake-up is s

As soon as one CPU goes above the 80% tipping
assumptions above becomes incorrect. In this s
is raised for the entire root domain, EAS is d
re-enabled. By doing so, the scheduler falls b
wake-up and load balance under CPU-bound condi
respect of the nice values of tasks.

Since the notion of overutilization largely re
there is some idle time in the system, the CPU
(than CFS) scheduling classes (as well as IRQ)
such, the detection of overutilization account
by CFS tasks, but also by the other scheduling


6. Dependencies and requirements for EAS
----------------------------------------

Energy Aware Scheduling depends on the CPUs of
hardware properties and on other features of t
section lists these dependencies and provides


6.1 - Asymmetric CPU topology
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As mentioned in the introduction, EAS is only
asymmetric CPU topologies for now. This requir
looking for the presence of the SD_ASYM_CPUCAP
domains are built.

See Documentation/scheduler/sched-capacity.rst
flag to be set in the sched_domain hierarchy.

Please note that EAS is not fundamentally inco
significant savings on SMP platforms have been
could be amended in the future if proven other


6.2 - Energy Model presence
^^^^^^^^^^^^^^^^^^^^^^^^^^^

EAS uses the EM of a platform to estimate the
energy. So, your platform must provide power c
order to make EAS start. To do so, please refe
independent EM framework in Documentation/powe

Please also note that the scheduling domains n
EM has been registered in order to start EAS.

EAS uses the EM to make a forecasting decision
more focused on the difference when checking p
placement. For EAS it doesn't matter whether t
in milli-Watts or in an 'abstract scale'.


6.3 - Energy Model complexity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

EAS does not impose any complexity limit on th
restricts the number of CPUs to EM_MAX_NUM_CPU
the energy estimation.


6.4 - Schedutil governor
^^^^^^^^^^^^^^^^^^^^^^^^

EAS tries to predict at which OPP will the CPU
in order to estimate their energy consumption.
of CPUs follow their utilization.
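
The "OPPs follow utilization" assumption can be illustrated with a minimal
Python sketch. It is not kernel code: predict_opp() is a hypothetical helper,
the little-domain OPP capacities (170, 341, 512) are read off the axis of the
Example 2 diagrams, and the utilization values are assumptions based on that
example. Any extra headroom a real governor may add on top of utilization is
deliberately ignored here.

```python
def predict_opp(opp_capacities, cpu_utils):
    """Return the lowest OPP capacity that covers the busiest CPU of a
    performance domain, clamped to the highest OPP (hypothetical helper)."""
    max_util = max(cpu_utils)
    for capacity in sorted(opp_capacities):
        if capacity >= max_util:
            return capacity
    return max(opp_capacities)


# Assumed OPP capacities of the little performance domain in Example 2.
little_opps = [170, 341, 512]

# Assumed utilizations when P stays on CPU0 (CPU0 = 400, CPU1 = 100).
opp_stay = predict_opp(little_opps, [400, 100])

# Assumed utilizations when P (util 200) moves to CPU1 (CPU0 = 200, CPU1 = 300).
opp_move = predict_opp(little_opps, [200, 300])

print(opp_stay, opp_move)
```

With these assumed numbers, the predicted little-domain OPP is 512 when P
stays on CPU0 and 341 when it moves to CPU1, which is consistent with the
'Current OPP' markers drawn in Case 3 and Case 1 of Example 2.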

Although it is very difficult to provide hard
of this assumption in practice (because the ha
told to do, for example), schedutil as opposed
least _requests_ frequencies calculated using
Consequently, the only sane governor to use to
because it is the only one providing some degr
frequency requests and energy predictions.

Using EAS with any other governor than schedut


6.5 - Scale-invariant utilization signals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to make accurate prediction across CP
states, EAS needs frequency-invariant and CPU-
be obtained using the architecture-defined arc
callbacks.

Using EAS on a platform that doesn't implement
supported.


6.6 - Multithreading (SMT)
^^^^^^^^^^^^^^^^^^^^^^^^^^

EAS in its current form is SMT unaware and is
multithreaded hardware to save energy. EAS con
CPUs, which can actually be counter-productive

EAS on SMT is not supported.