=======================
Energy Aware Scheduling
=======================

1. Introduction
---------------

Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
the impact of its decisions on the energy consumed by CPUs. EAS relies on an
Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
with a minimal impact on throughput. This document provides an introduction to
how EAS works, describes the main design decisions behind it, and details what
is needed to get it to run.

Before going any further, please note that at the time of writing::

   /!\ EAS does not support platforms with symmetric CPU topologies /!\

EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
because this is where the potential for saving energy through scheduling is
the highest.

The actual EM used by EAS is _not_ maintained by the scheduler, but by a
dedicated framework. For details about this framework and what it provides,
please refer to its documentation (see Documentation/power/energy-model.rst).


2. Background and Terminology
-----------------------------

To make it clear from the start:
  - energy = [joule] (resource like a battery on powered devices)
  - power = energy/time = [joule/second] = [watt]

The goal of EAS is to minimize energy, while still getting the job done. That
is, we want to maximize::

    performance [inst/s]
    --------------------
        power [W]

which is equivalent to minimizing::

    energy [J]
    -----------
    instruction

while still getting 'good' performance. It is essentially an alternative
optimization objective to the current performance-only objective for the
scheduler. This alternative considers two objectives: energy-efficiency and
performance.
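To see why the two formulations are equivalent, recall the definitions above:
energy is power integrated over time, and the instruction count is performance
integrated over time. For a fixed piece of work executed at steady power and
performance::

    energy     power * time          power               1
    ------- = ------------------- = ------------- = -------------------
     inst     performance * time    performance     performance / power

so maximizing performance per watt and minimizing energy per instruction are
the same objective.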
The idea behind introducing an EM is to allow the scheduler to evaluate the
implications of its decisions rather than blindly applying energy-saving
techniques that may have positive effects only on some platforms. At the same
time, the EM must be as simple as possible to minimize the scheduler latency
impact.

In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
for the scheduler to decide where a task should run (during wake-up), the EM
is used to break the tie between several good CPU candidates and pick the one
that is predicted to yield the best energy consumption without harming the
system's throughput. The predictions made by EAS rely on specific elements of
knowledge about the platform's topology, which include the 'capacity' of CPUs,
and their respective energy costs.


3. Topology information
-----------------------

EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
differentiate CPUs with different computing throughput. The 'capacity' of a CPU
represents the amount of work it can absorb when running at its highest
frequency compared to the most capable CPU of the system. Capacity values are
normalized in a 1024 range, and are comparable with the utilization signals of
tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
to capacity and utilization values, EAS is able to estimate how big/busy a
task/CPU is, and to take this into consideration when evaluating performance vs
energy trade-offs. The capacity of CPUs is provided via arch-specific code
through the arch_scale_cpu_capacity() callback.

The rest of platform knowledge used by EAS is directly read from the Energy
Model (EM) framework. The EM of a platform is composed of a power cost table
per 'performance domain' in the system (see Documentation/power/energy-model.rst
for further details about performance domains).

The scheduler manages references to the EM objects in the topology code when the
scheduling domains are built, or re-built. For each root domain (rd), the
scheduler maintains a singly linked list of all performance domains intersecting
the current rd->span. Each node in the list contains a pointer to a struct
em_perf_domain as provided by the EM framework.
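Concretely, each node of that list is little more than the pointer plus the
link itself. A lightly commented sketch of the node (struct perf_domain, as
defined in kernel/sched/sched.h at the time of writing)::

    struct perf_domain {
            struct em_perf_domain *em_pd;   /* EM data, shared with the EM framework */
            struct perf_domain *next;       /* next performance domain in rd->pd     */
            struct rcu_head rcu;            /* allows RCU-safe reclaim of list nodes */
    };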
The lists are attached to the root domains in order to cope with exclusive
cpuset configurations. Since the boundaries of exclusive cpusets do not
necessarily match those of performance domains, the lists of different root
domains can contain duplicate elements.

Example 1.
    Let us consider a platform with 12 CPUs, split in 3 performance domains
    (pd0, pd4 and pd8), organized as follows::

        CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
        PDs:   |--pd0--|--pd4--|---pd8---|
        RDs:   |----rd1----|-----rd2-----|

    Now, consider that userspace decided to split the system with two
    exclusive cpusets, hence creating two independent root domains, each
    containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
    above figure. Since pd4 intersects with both rd1 and rd2, it will be
    present in the linked list '->pd' attached to each of them:

       * rd1->pd: pd0 -> pd4
       * rd2->pd: pd4 -> pd8

    Please note that the scheduler will create two duplicate list nodes for
    pd4 (one for each list). However, both just hold a pointer to the same
    shared data structure of the EM framework.

Since the access to these lists can happen concurrently with hotplug and other
things, they are protected by RCU, like the rest of topology structures
manipulated by the scheduler.
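In practice, a reader such as the wake-up path walks the list of a root domain
under the RCU read lock. A minimal sketch of such a traversal, assuming the
structure described above (the real loop in kernel/sched/fair.c does much more
work per node)::

    struct perf_domain *pd;

    rcu_read_lock();
    pd = rcu_dereference(rd->pd);
    for (; pd; pd = pd->next) {
            /* examine pd->em_pd, the EM data of this performance domain */
    }
    rcu_read_unlock();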
EAS also maintains a static key (sched_energy_present) which is enabled when at
least one root domain meets all conditions for EAS to start. Those conditions
are summarized in Section 6.


4. Energy-Aware task placement
------------------------------

EAS overrides the CFS task wake-up balancing code. It uses the EM of the
platform and the PELT signals to choose an energy-efficient target CPU during
wake-up balance. When EAS is enabled, select_task_rq_fair() calls
find_energy_efficient_cpu() to make the placement decision. This function looks
for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
each performance domain since it is the one which will allow us to keep the
frequency the lowest. Then, the function checks if placing the task there could
save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
in its previous activation.

find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
energy consumed by the system if the waking task was migrated. compute_energy()
looks at the current utilization landscape of the CPUs and adjusts it to
'simulate' the task migration. The EM framework provides the em_pd_energy() API
which computes the expected energy consumption of each performance domain for
the given utilization landscape.
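Glossing over the exact scale factors used by the implementation, the
per-domain estimate amounts to::

                  sum of the utilization of the CPUs in the domain
    energy(pd) =  -------------------------------------------------  *  power of the OPP
                            capacity at the selected OPP

where the selected OPP is the lowest one whose capacity can accommodate the
busiest CPU of the domain, and the power is that OPP's cost in the EM table.
This is the arithmetic carried out by hand in Example 2 below.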
An example of energy-optimized task placement decision is detailed below.

Example 2.
    Let us consider a (fake) platform with 2 independent performance domains
    composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
    are big.

    The scheduler must decide where to place a task P whose util_avg = 200
    and prev_cpu = 0.

    The current utilization landscape of the CPUs is depicted on the graph
    below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively.
    Each performance domain has three Operating Performance Points (OPPs).
    The CPU capacity and power cost associated with each OPP is listed in
    the Energy Model table. The util_avg of P is shown on the figures
    below as 'PP'::

     CPU util.
     1024                 - - - - - - -              Energy Model
                                                 +-----------+-------------+
                                                 |  Little   |     Big     |
      768                 =============          +-----+-----+------+------+
                                                 | Cap | Pwr |  Cap | Pwr  |
                                                 +-----+-----+------+------+
      512  ===========    - ##- - - - -          | 170 |  50 | 512  | 400  |
                             ##     ##           | 341 | 150 | 768  | 800  |
      341  -PP - - - -       ##     ##           | 512 | 300 | 1024 | 1700 |
            PP               ##     ##           +-----+-----+------+------+
      170  -## - - - -       ##     ##
            ##     ##        ##     ##
          ------------     -------------
           CPU0   CPU1      CPU2   CPU3

     Current OPP: =====      Other OPP: - - -    util_avg (100 each): ##


    find_energy_efficient_cpu() will first look for the CPUs with the
    maximum spare capacity in the two performance domains. In this example,
    CPU1 and CPU3. Then it will estimate the energy of the system if P was
    placed on either of them, and check if that would save some energy
    compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
    (which is coherent with the behaviour of the schedutil CPUFreq
    governor, see Section 6 for more details on this topic).

    **Case 1. P is migrated to CPU1**::

      1024                 - - - - - - -

                                                 Energy calculation:
       768                 =============          * CPU0: 200 / 341 * 150 = 88
                                                   * CPU1: 300 / 341 * 150 = 131
                                                   * CPU2: 600 / 768 * 800 = 625
       512  - - - - - - -  - ##- - - - -           * CPU3: 500 / 768 * 800 = 520
                              ##     ##               => total_energy = 1364
       341  ===========       ##     ##
                  PP          ##     ##
       170  -## - - PP-       ##     ##
             ##     ##        ##     ##
           ------------     -------------
            CPU0   CPU1      CPU2   CPU3


    **Case 2. P is migrated to CPU3**::

      1024                 - - - - - - -

                                                 Energy calculation:
       768                 =============          * CPU0: 200 / 341 * 150 = 88
                                                   * CPU1: 100 / 341 * 150 = 43
                                    PP             * CPU2: 600 / 768 * 800 = 625
       512  - - - - - - -  - ##- - -PP -           * CPU3: 700 / 768 * 800 = 729
                              ##     ##               => total_energy = 1485
       341  ===========       ##     ##
                              ##     ##
       170  -## - - - -       ##     ##
             ##     ##        ##     ##
           ------------     -------------
            CPU0   CPU1      CPU2   CPU3
    **Case 3. P stays on prev_cpu / CPU 0**::

      1024                 - - - - - - -

                                                 Energy calculation:
       768                 =============          * CPU0: 400 / 512 * 300 = 234
                                                   * CPU1: 100 / 512 * 300 = 58
                                                   * CPU2: 600 / 768 * 800 = 625
       512  ===========    - ##- - - - -           * CPU3: 500 / 768 * 800 = 520
                              ##     ##               => total_energy = 1437
       341  -PP - - - -       ##     ##
             PP               ##     ##
       170  -## - - - -       ##     ##
             ##     ##        ##     ##
           ------------     -------------
            CPU0   CPU1      CPU2   CPU3


    From these calculations, Case 1 has the lowest total energy, so CPU1 is
    the best candidate from an energy-efficiency standpoint.
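The arithmetic done by hand in the three cases above is simple enough to
reproduce outside the kernel. The following standalone C program is a rough
sketch of the per-domain estimate described in Section 4 (it is not kernel
code, and the small differences from the hand-rounded figures above come from
rounding)::

    #include <stdio.h>

    struct opp { double cap, power; };      /* one row of the EM table */

    /*
     * Estimated energy rate of one performance domain: each CPU contributes
     * util / capacity-at-OPP * power-at-OPP, with the OPP chosen as the
     * lowest one able to serve the busiest CPU of the domain.
     */
    static double pd_energy(const struct opp *opps, int nr_opps,
                            const double *util, int nr_cpus)
    {
            double max_util = 0.0, sum_util = 0.0;
            int i, sel = nr_opps - 1;

            for (i = 0; i < nr_cpus; i++) {
                    if (util[i] > max_util)
                            max_util = util[i];
                    sum_util += util[i];
            }

            for (i = 0; i < nr_opps; i++) {
                    if (opps[i].cap >= max_util) {
                            sel = i;
                            break;
                    }
            }

            return sum_util / opps[sel].cap * opps[sel].power;
    }

    int main(void)
    {
            const struct opp little[] = { {170, 50}, {341, 150}, {512, 300} };
            const struct opp big[] = { {512, 400}, {768, 800}, {1024, 1700} };

            /* Case 1 of Example 2: P (util_avg = 200) is migrated to CPU1 */
            const double little_util[] = { 200, 300 };      /* CPU0, CPU1 */
            const double big_util[] = { 600, 500 };         /* CPU2, CPU3 */

            printf("total_energy = %.0f\n",
                   pd_energy(little, 3, little_util, 2) +
                   pd_energy(big, 3, big_util, 2));
            return 0;
    }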
Big CPUs are generally more power hungry than the little ones and are thus used
mainly when a task doesn't fit the littles. However, little CPUs aren't always
necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
of the little CPUs can be less energy-efficient than the lowest OPPs of the
bigs, for example. So, if the little CPUs happen to have enough utilization at
a specific point in time, a small task waking up at that moment could be better
off executing on the big side in order to save energy, even though it would fit
on the little side.

And even in the case where all OPPs of the big CPUs are less energy-efficient
than those of the little, using the big CPUs for a small task might still, under
specific conditions, save energy. Indeed, placing a task on a little CPU can
result in raising the OPP of the entire performance domain, and that will
increase the cost of the tasks already running there. If the waking task is
placed on a big CPU, its own execution cost might be higher than if it was
running on a little, but it won't impact the other tasks of the little CPUs
which will keep running at a lower OPP. So, when considering the total energy
consumed by CPUs, the extra cost of running that one task on a big core can be
smaller than the cost of raising the OPP on the little CPUs for all the other
tasks.

The examples above would be nearly impossible to get right in a generic way, and
for all platforms, without knowing the cost of running at different OPPs on all
CPUs of the system. Thanks to its EM-based design, EAS should cope with them
correctly without too many troubles. However, in order to ensure a minimal
impact on throughput for high-utilization scenarios, EAS also implements another
mechanism called 'over-utilization'.


5. Over-utilization
-------------------

From a general standpoint, the use-cases where EAS can help the most are those
involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
being run, they will require all of the available CPU capacity, and there isn't
much that can be done by the scheduler to save energy without severely harming
throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
'over-utilized' as soon as they are used at more than 80% of their compute
capacity. As long as no CPUs are over-utilized in a root domain, load balancing
is disabled and EAS overrides the wake-up balancing code. EAS is likely to load
the most energy-efficient CPUs of the system more than the others if that can be
done without harming throughput. So, the load-balancer is disabled to prevent
it from breaking the energy-efficient task placement found by EAS. It is safe to
do so when the system isn't overutilized since being below the 80% tipping point
implies that:

  a. there is some idle time on all CPUs, so the utilization signals used by
     EAS are likely to accurately represent the 'size' of the various tasks
     in the system;
  b. all tasks should already be provided with enough CPU capacity,
     regardless of their nice values;
  c. since there is spare capacity all tasks must be blocking/sleeping
     regularly and balancing at wake-up is sufficient.

As soon as one CPU goes above the 80% tipping point, at least one of the three
assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
is raised for the entire root domain, EAS is disabled, and the load-balancer is
re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
wake-up and load balance under CPU-bound conditions. This better respects the
nice values of tasks.
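The 80% tipping point itself boils down to a simple comparison between a CPU's
utilization and its capacity, both expressed in the 0..1024 scale discussed in
Section 3. A minimal sketch of the check (the helper name below is made up;
the real check lives in kernel/sched/fair.c)::

    static inline bool cpu_is_overutilized(unsigned long util,
                                           unsigned long capacity)
    {
            /* true when util exceeds ~80% of capacity (1024/1280 = 0.8) */
            return util * 1280 >= capacity * 1024;
    }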
Since the notion of overutilization largely relies on detecting whether or not
there is some idle time in the system, the CPU capacity 'stolen' by higher
(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
such, the detection of overutilization accounts for the capacity used not only
by CFS tasks, but also by the other scheduling classes and IRQ.


6. Dependencies and requirements for EAS
----------------------------------------

Energy Aware Scheduling depends on the CPUs of the system having specific
hardware properties and on other features of the kernel being enabled. This
section lists these dependencies and provides hints as to how they can be met.


6.1 - Asymmetric CPU topology
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As mentioned in the introduction, EAS is only supported on platforms with
asymmetric CPU topologies for now. This requirement is checked at run-time by
looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
domains are built.

The flag is set/cleared automatically by the scheduler topology code whenever
there are CPUs with different capacities in a root domain. The capacities of
CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
callback. As an example, arm and arm64 share an implementation of this callback
which uses a combination of CPUFreq data and device-tree bindings to compute the
capacity of CPUs (see drivers/base/arch_topology.c for more details).

So, in order to use EAS on your platform, your architecture must implement the
arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
capacity than others.

Please note that EAS is not fundamentally incompatible with SMP, but no
significant savings on SMP platforms have been observed yet. This restriction
could be amended in the future if proven otherwise.


6.2 - Energy Model presence
^^^^^^^^^^^^^^^^^^^^^^^^^^^

EAS uses the EM of a platform to estimate the impact of scheduling decisions on
energy. So, your platform must provide power cost tables to the EM framework in
order to make EAS start. To do so, please refer to documentation of the
independent EM framework in Documentation/power/energy-model.rst.

Please also note that the scheduling domains need to be re-built after the
EM has been registered in order to start EAS.


6.3 - Energy Model complexity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The task wake-up path is very latency-sensitive. When the EM of a platform is
too complex (too many CPUs, too many performance domains, too many performance
states, ...), the cost of using it in the wake-up path can become prohibitive.
The energy-aware wake-up algorithm has a complexity of::

    C = Nd * (Nc + Ns)

with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
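As an illustration (the numbers are made up for this document), the platform of
Example 1, with 3 performance domains and 12 CPUs, assuming 4 OPPs per domain,
gives::

    C = 3 * (12 + 12) = 72

while a hypothetical 4-cluster, 64-CPU server with 10 OPPs per cluster would
give::

    C = 4 * (64 + 40) = 416

Both are well below the threshold described below.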
A complexity check is performed at the root domain level, when scheduling
domains are built. EAS will not start on a root domain if its C happens to be
higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
time of writing).

If you really want to use EAS but the complexity of your platform's Energy
Model is too high to be used with a single root domain, you're left with only
two possible options:

  1. split your system into separate, smaller, root domains using exclusive
     cpusets and enable EAS locally on each of them. This option has the
     benefit of working out of the box but the drawback of preventing load
     balance between root domains, which can result in an unbalanced system
     overall;
  2. submit patches to reduce the complexity of the EAS wake-up algorithm,
     hence enabling it to cope with larger EMs in reasonable time.


6.4 - Schedutil governor
^^^^^^^^^^^^^^^^^^^^^^^^

EAS tries to predict which OPP the CPUs will be running at in the near future
in order to estimate their energy consumption. To do so, it is assumed that
OPPs of CPUs follow their utilization.
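That is what schedutil does: the frequency it requests for a CPU is derived
from that CPU's utilization, plus some headroom. A rough sketch of the mapping,
close to the map_util_freq() helper used by schedutil but ignoring clamping and
rate limiting (the function name below is made up)::

    static unsigned long requested_freq(unsigned long util,
                                        unsigned long max_freq,
                                        unsigned long max_capacity)
    {
            /* follow utilization, with ~25% headroom */
            return (max_freq + (max_freq >> 2)) * util / max_capacity;
    }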
Although it is very difficult to provide hard guarantees regarding the accuracy
of this assumption in practice (because the hardware might not do what it is
told to do, for example), schedutil as opposed to other CPUFreq governors at
least _requests_ frequencies calculated using the utilization signals.
Consequently, the only sane governor to use together with EAS is schedutil,
because it is the only one providing some degree of consistency between
frequency requests and energy predictions.

Using EAS with any other governor than schedutil is not supported.


6.5 Scale-invariant utilization signals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to make accurate predictions across CPUs and for all performance
states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
callbacks.

Using EAS on a platform that doesn't implement these two callbacks is not
supported.


6.6 Multithreading (SMT)
^^^^^^^^^^^^^^^^^^^^^^^^

EAS in its current form is SMT unaware and is not able to leverage
multithreaded hardware to save energy. EAS considers threads as independent
CPUs, which can actually be counter-productive for both performance and energy.

EAS on SMT is not supported.