TOMOYO Linux Cross Reference
Linux/Documentation/scheduler/sched-energy.rst

Differences between /Documentation/scheduler/sched-energy.rst (Version linux-6.12-rc7) and /Documentation/scheduler/sched-energy.rst (Version linux-6.1.116)


  1 =======================                             1 =======================
  2 Energy Aware Scheduling                             2 Energy Aware Scheduling
  3 =======================                             3 =======================
  4                                                     4 
  5 1. Introduction                                     5 1. Introduction
  6 ---------------                                     6 ---------------
  7                                                     7 
  8 Energy Aware Scheduling (or EAS) gives the sch      8 Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
  9 the impact of its decisions on the energy cons      9 the impact of its decisions on the energy consumed by CPUs. EAS relies on an
 10 Energy Model (EM) of the CPUs to select an ene     10 Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
 11 with a minimal impact on throughput. This docu     11 with a minimal impact on throughput. This document aims at providing an
 12 introduction on how EAS works, what are the ma     12 introduction on how EAS works, what are the main design decisions behind it, and
 13 details what is needed to get it to run.           13 details what is needed to get it to run.
 14                                                    14 
 15 Before going any further, please note that at      15 Before going any further, please note that at the time of writing::
 16                                                    16 
 17    /!\ EAS does not support platforms with sym     17    /!\ EAS does not support platforms with symmetric CPU topologies /!\
 18                                                    18 
 19 EAS operates only on heterogeneous CPU topolog     19 EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
 20 because this is where the potential for saving     20 because this is where the potential for saving energy through scheduling is
 21 the highest.                                       21 the highest.
 22                                                    22 
 23 The actual EM used by EAS is _not_ maintained      23 The actual EM used by EAS is _not_ maintained by the scheduler, but by a
 24 dedicated framework. For details about this fr     24 dedicated framework. For details about this framework and what it provides,
 25 please refer to its documentation (see Documen     25 please refer to its documentation (see Documentation/power/energy-model.rst).
 26                                                    26 
 27                                                    27 
 28 2. Background and Terminology                      28 2. Background and Terminology
 29 -----------------------------                      29 -----------------------------
 30                                                    30 
 31 To make it clear from the start:                   31 To make it clear from the start:
 32  - energy = [joule] (resource like a battery o     32  - energy = [joule] (resource like a battery on powered devices)
 33  - power = energy/time = [joule/second] = [wat     33  - power = energy/time = [joule/second] = [watt]
 34                                                    34 
 35 The goal of EAS is to minimize energy, while s     35 The goal of EAS is to minimize energy, while still getting the job done. That
 36 is, we want to maximize::                          36 is, we want to maximize::
 37                                                    37 
 38         performance [inst/s]                       38         performance [inst/s]
 39         --------------------                       39         --------------------
 40             power [W]                              40             power [W]
 41                                                    41 
 42 which is equivalent to minimizing::                42 which is equivalent to minimizing::
 43                                                    43 
 44         energy [J]                                 44         energy [J]
 45         -----------                                45         -----------
 46         instruction                                46         instruction
 47                                                    47 
 48 while still getting 'good' performance. It is      48 while still getting 'good' performance. It is essentially an alternative
 49 optimization objective to the current performa     49 optimization objective to the current performance-only objective for the
 50 scheduler. This alternative considers two obje     50 scheduler. This alternative considers two objectives: energy-efficiency and
 51 performance.                                       51 performance.
 52                                                    52 
 53 The idea behind introducing an EM is to allow      53 The idea behind introducing an EM is to allow the scheduler to evaluate the
 54 implications of its decisions rather than blin     54 implications of its decisions rather than blindly applying energy-saving
 55 techniques that may have positive effects only     55 techniques that may have positive effects only on some platforms. At the same
 56 time, the EM must be as simple as possible to      56 time, the EM must be as simple as possible to minimize the scheduler latency
 57 impact.                                            57 impact.
 58                                                    58 
 59 In short, EAS changes the way CFS tasks are as     59 In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
 60 for the scheduler to decide where a task shoul     60 for the scheduler to decide where a task should run (during wake-up), the EM
 61 is used to break the tie between several good      61 is used to break the tie between several good CPU candidates and pick the one
 62 that is predicted to yield the best energy con     62 that is predicted to yield the best energy consumption without harming the
 63 system's throughput. The predictions made by E     63 system's throughput. The predictions made by EAS rely on specific elements of
 64 knowledge about the platform's topology, which     64 knowledge about the platform's topology, which include the 'capacity' of CPUs,
 65 and their respective energy costs.                 65 and their respective energy costs.
 66                                                    66 
 67                                                    67 
 68 3. Topology information                            68 3. Topology information
 69 -----------------------                            69 -----------------------
 70                                                    70 
 71 EAS (as well as the rest of the scheduler) use     71 EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
 72 differentiate CPUs with different computing th     72 differentiate CPUs with different computing throughput. The 'capacity' of a CPU
 73 represents the amount of work it can absorb wh     73 represents the amount of work it can absorb when running at its highest
 74 frequency compared to the most capable CPU of      74 frequency compared to the most capable CPU of the system. Capacity values are
 75 normalized in a 1024 range, and are comparable     75 normalized in a 1024 range, and are comparable with the utilization signals of
 76 tasks and CPUs computed by the Per-Entity Load     76 tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
 77 to capacity and utilization values, EAS is abl     77 to capacity and utilization values, EAS is able to estimate how big/busy a
 78 task/CPU is, and to take this into considerati     78 task/CPU is, and to take this into consideration when evaluating performance vs
 79 energy trade-offs. The capacity of CPUs is pro     79 energy trade-offs. The capacity of CPUs is provided via arch-specific code
 80 through the arch_scale_cpu_capacity() callback     80 through the arch_scale_cpu_capacity() callback.
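
The normalization described above can be illustrated with a small Python sketch. It is only a model of the idea, not the kernel's arch code: the raw throughput figures and the helper name normalize_capacities() are hypothetical, chosen to show how the most capable CPU ends up at 1024 and the others scale relative to it.

```python
SCHED_CAPACITY_SCALE = 1024  # the "1024 range" mentioned in the text


def normalize_capacities(raw_throughput):
    """Scale per-CPU throughput at highest frequency so the best CPU gets 1024.

    raw_throughput: dict mapping CPU id -> hypothetical throughput figure.
    """
    best = max(raw_throughput.values())
    return {
        cpu: value * SCHED_CAPACITY_SCALE // best
        for cpu, value in raw_throughput.items()
    }


# Hypothetical platform: two little CPUs, two big CPUs.
capacities = normalize_capacities({0: 500, 1: 500, 2: 1000, 3: 1000})
```

Because task and CPU utilization are expressed on the same scale, a task's utilization can be compared directly against these capacity values.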
 81                                                    81 
 82 The rest of platform knowledge used by EAS is      82 The rest of platform knowledge used by EAS is directly read from the Energy
 83 Model (EM) framework. The EM of a platform is      83 Model (EM) framework. The EM of a platform is composed of a power cost table
 84 per 'performance domain' in the system (see Do     84 per 'performance domain' in the system (see Documentation/power/energy-model.rst
 85 for further details about performance domains) !!  85 for futher details about performance domains).
 86                                                    86 
 87 The scheduler manages references to the EM obj     87 The scheduler manages references to the EM objects in the topology code when the
 88 scheduling domains are built, or re-built. For     88 scheduling domains are built, or re-built. For each root domain (rd), the
 89 scheduler maintains a singly linked list of al     89 scheduler maintains a singly linked list of all performance domains intersecting
 90 the current rd->span. Each node in the list co     90 the current rd->span. Each node in the list contains a pointer to a struct
 91 em_perf_domain as provided by the EM framework     91 em_perf_domain as provided by the EM framework.
 92                                                    92 
 93 The lists are attached to the root domains in      93 The lists are attached to the root domains in order to cope with exclusive
 94 cpuset configurations. Since the boundaries of     94 cpuset configurations. Since the boundaries of exclusive cpusets do not
 95 necessarily match those of performance domains     95 necessarily match those of performance domains, the lists of different root
 96 domains can contain duplicate elements.            96 domains can contain duplicate elements.
 97                                                    97 
 98 Example 1.                                         98 Example 1.
 99     Let us consider a platform with 12 CPUs, s     99     Let us consider a platform with 12 CPUs, split in 3 performance domains
100     (pd0, pd4 and pd8), organized as follows::    100     (pd0, pd4 and pd8), organized as follows::
101                                                   101 
102                   CPUs:   0 1 2 3 4 5 6 7 8 9     102                   CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
103                   PDs:   |--pd0--|--pd4--|---p    103                   PDs:   |--pd0--|--pd4--|---pd8---|
104                   RDs:   |----rd1----|-----rd2    104                   RDs:   |----rd1----|-----rd2-----|
105                                                   105 
106     Now, consider that userspace decided to sp    106     Now, consider that userspace decided to split the system with two
107     exclusive cpusets, hence creating two inde    107     exclusive cpusets, hence creating two independent root domains, each
108     containing 6 CPUs. The two root domains ar    108     containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
109     above figure. Since pd4 intersects with bo    109     above figure. Since pd4 intersects with both rd1 and rd2, it will be
110     present in the linked list '->pd' attached    110     present in the linked list '->pd' attached to each of them:
111                                                   111 
112        * rd1->pd: pd0 -> pd4                      112        * rd1->pd: pd0 -> pd4
113        * rd2->pd: pd4 -> pd8                      113        * rd2->pd: pd4 -> pd8
114                                                   114 
115     Please note that the scheduler will create    115     Please note that the scheduler will create two duplicate list nodes for
116     pd4 (one for each list). However, both jus    116     pd4 (one for each list). However, both just hold a pointer to the same
117     shared data structure of the EM framework.    117     shared data structure of the EM framework.
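
Example 1 can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the kernel uses singly linked lists protected by RCU, whereas plain Python lists stand in for them here, and the names PerfDomain and build_pd_lists() are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfDomain:
    """Stand-in for the shared EM object of one performance domain."""
    name: str
    cpus: frozenset


def build_pd_lists(perf_domains, root_domains):
    """For each root domain, list the performance domains intersecting its span."""
    return {
        rd_name: [pd for pd in perf_domains if pd.cpus & span]
        for rd_name, span in root_domains.items()
    }


# Platform of Example 1: 12 CPUs, 3 performance domains, 2 root domains.
perf_domains = [
    PerfDomain("pd0", frozenset(range(0, 4))),
    PerfDomain("pd4", frozenset(range(4, 8))),
    PerfDomain("pd8", frozenset(range(8, 12))),
]
root_domains = {
    "rd1": frozenset(range(0, 6)),
    "rd2": frozenset(range(6, 12)),
}
pd_lists = build_pd_lists(perf_domains, root_domains)
```

Both lists contain an entry for pd4, and both entries refer to the same PerfDomain object, mirroring the duplicate list nodes that point to one shared EM structure.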
118                                                   118 
119 Since the access to these lists can happen con    119 Since the access to these lists can happen concurrently with hotplug and other
120 things, they are protected by RCU, like the re    120 things, they are protected by RCU, like the rest of topology structures
121 manipulated by the scheduler.                     121 manipulated by the scheduler.
122                                                   122 
123 EAS also maintains a static key (sched_energy_    123 EAS also maintains a static key (sched_energy_present) which is enabled when at
124 least one root domain meets all conditions for    124 least one root domain meets all conditions for EAS to start. Those conditions
125 are summarized in Section 6.                      125 are summarized in Section 6.
126                                                   126 
127                                                   127 
128 4. Energy-Aware task placement                    128 4. Energy-Aware task placement
129 ------------------------------                    129 ------------------------------
130                                                   130 
131 EAS overrides the CFS task wake-up balancing c    131 EAS overrides the CFS task wake-up balancing code. It uses the EM of the
132 platform and the PELT signals to choose an ene    132 platform and the PELT signals to choose an energy-efficient target CPU during
133 wake-up balance. When EAS is enabled, select_t    133 wake-up balance. When EAS is enabled, select_task_rq_fair() calls
134 find_energy_efficient_cpu() to do the placemen    134 find_energy_efficient_cpu() to do the placement decision. This function looks
135 for the CPU with the highest spare capacity (C    135 for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
136 each performance domain since it is the one wh    136 each performance domain since it is the one which will allow us to keep the
137 frequency the lowest. Then, the function check    137 frequency the lowest. Then, the function checks if placing the task there could
138 save energy compared to leaving it on prev_cpu    138 save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
139 in its previous activation.                       139 in its previous activation.
140                                                   140 
141 find_energy_efficient_cpu() uses compute_energ    141 find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
142 energy consumed by the system if the waking ta    142 energy consumed by the system if the waking task was migrated. compute_energy()
143 looks at the current utilization landscape of     143 looks at the current utilization landscape of the CPUs and adjusts it to
144 'simulate' the task migration. The EM framewor    144 'simulate' the task migration. The EM framework provides the em_pd_energy() API
145 which computes the expected energy consumption    145 which computes the expected energy consumption of each performance domain for
146 the given utilization landscape.                  146 the given utilization landscape.
147                                                   147 
148 An example of energy-optimized task placement     148 An example of energy-optimized task placement decision is detailed below.
149                                                   149 
150 Example 2.                                        150 Example 2.
151     Let us consider a (fake) platform with 2 i    151     Let us consider a (fake) platform with 2 independent performance domains
152     composed of two CPUs each. CPU0 and CPU1 a    152     composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
153     are big.                                      153     are big.
154                                                   154 
155     The scheduler must decide where to place a    155     The scheduler must decide where to place a task P whose util_avg = 200
156     and prev_cpu = 0.                             156     and prev_cpu = 0.
157                                                   157 
158     The current utilization landscape of the C    158     The current utilization landscape of the CPUs is depicted on the graph
159     below. CPUs 0-3 have a util_avg of 400, 10    159     below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively.
160     Each performance domain has three Operatin    160     Each performance domain has three Operating Performance Points (OPPs).
161     The CPU capacity and power cost associated    161     The CPU capacity and power cost associated with each OPP is listed in
162     the Energy Model table. The util_avg of P     162     the Energy Model table. The util_avg of P is shown on the figures
163     below as 'PP'::                               163     below as 'PP'::
164                                                   164 
165      CPU util.                                    165      CPU util.
166       1024                 - - - - - - -          166       1024                 - - - - - - -              Energy Model
167                                                   167                                                +-----------+-------------+
168                                                   168                                                |  Little   |     Big     |
169        768                 =============          169        768                 =============       +-----+-----+------+------+
170                                                   170                                                | Cap | Pwr | Cap  | Pwr  |
171                                                   171                                                +-----+-----+------+------+
172        512  ===========    - ##- - - - -          172        512  ===========    - ##- - - - -       | 170 | 50  | 512  | 400  |
173                              ##     ##            173                              ##     ##         | 341 | 150 | 768  | 800  |
174        341  -PP - - - -      ##     ##            174        341  -PP - - - -      ##     ##         | 512 | 300 | 1024 | 1700 |
175              PP              ##     ##            175              PP              ##     ##         +-----+-----+------+------+
176        170  -## - - - -      ##     ##            176        170  -## - - - -      ##     ##
177              ##     ##       ##     ##            177              ##     ##       ##     ##
178            ------------    -------------          178            ------------    -------------
179             CPU0   CPU1     CPU2   CPU3           179             CPU0   CPU1     CPU2   CPU3
180                                                   180 
181       Current OPP: =====       Other OPP: - -     181       Current OPP: =====       Other OPP: - - -     util_avg (100 each): ##
182                                                   182 
183                                                   183 
184     find_energy_efficient_cpu() will first loo    184     find_energy_efficient_cpu() will first look for the CPUs with the
185     maximum spare capacity in the two performa    185     maximum spare capacity in the two performance domains. In this example,
186     CPU1 and CPU3. Then it will estimate the e    186     CPU1 and CPU3. Then it will estimate the energy of the system if P was
187     placed on either of them, and check if tha    187     placed on either of them, and check if that would save some energy
188     compared to leaving P on CPU0. EAS assumes    188     compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
189     (which is coherent with the behaviour of t    189     (which is coherent with the behaviour of the schedutil CPUFreq
190     governor, see Section 6. for more details     190     governor, see Section 6. for more details on this topic).
191                                                   191 
192     **Case 1. P is migrated to CPU1**::           192     **Case 1. P is migrated to CPU1**::
193                                                   193 
194       1024                 - - - - - - -          194       1024                 - - - - - - -
195                                                   195 
196                                             En    196                                             Energy calculation:
197        768                 =============     *    197        768                 =============     * CPU0: 200 / 341 * 150 = 88
198                                              *    198                                              * CPU1: 300 / 341 * 150 = 131
199                                              *    199                                              * CPU2: 600 / 768 * 800 = 625
200        512  - - - - - -    - ##- - - - -     *    200        512  - - - - - -    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
201                              ##     ##            201                              ##     ##          => total_energy = 1364
202        341  ===========      ##     ##            202        341  ===========      ##     ##
203                     PP       ##     ##            203                     PP       ##     ##
204        170  -## - - PP-      ##     ##            204        170  -## - - PP-      ##     ##
205              ##     ##       ##     ##            205              ##     ##       ##     ##
206            ------------    -------------          206            ------------    -------------
207             CPU0   CPU1     CPU2   CPU3           207             CPU0   CPU1     CPU2   CPU3
208                                                   208 
209                                                   209 
210     **Case 2. P is migrated to CPU3**::           210     **Case 2. P is migrated to CPU3**::
211                                                   211 
212       1024                 - - - - - - -          212       1024                 - - - - - - -
213                                                   213 
214                                             En    214                                             Energy calculation:
215        768                 =============     *    215        768                 =============     * CPU0: 200 / 341 * 150 = 88
216                                              *    216                                              * CPU1: 100 / 341 * 150 = 43
217                                     PP       *    217                                     PP       * CPU2: 600 / 768 * 800 = 625
218        512  - - - - - -    - ##- - -PP -     *    218        512  - - - - - -    - ##- - -PP -     * CPU3: 700 / 768 * 800 = 729
219                              ##     ##            219                              ##     ##          => total_energy = 1485
220        341  ===========      ##     ##            220        341  ===========      ##     ##
221                              ##     ##            221                              ##     ##
222        170  -## - - - -      ##     ##            222        170  -## - - - -      ##     ##
223              ##     ##       ##     ##            223              ##     ##       ##     ##
224            ------------    -------------          224            ------------    -------------
225             CPU0   CPU1     CPU2   CPU3           225             CPU0   CPU1     CPU2   CPU3
226                                                   226 
227                                                   227 
228     **Case 3. P stays on prev_cpu / CPU 0**::     228     **Case 3. P stays on prev_cpu / CPU 0**::
229                                                   229 
230       1024                 - - - - - - -          230       1024                 - - - - - - -
231                                                   231 
232                                             En    232                                             Energy calculation:
233        768                 =============     *    233        768                 =============     * CPU0: 400 / 512 * 300 = 234
234                                              *    234                                              * CPU1: 100 / 512 * 300 = 58
235                                              *    235                                              * CPU2: 600 / 768 * 800 = 625
236        512  ===========    - ##- - - - -     *    236        512  ===========    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
237                              ##     ##            237                              ##     ##          => total_energy = 1437
238        341  -PP - - - -      ##     ##            238        341  -PP - - - -      ##     ##
239              PP              ##     ##            239              PP              ##     ##
240        170  -## - - - -      ##     ##            240        170  -## - - - -      ##     ##
241              ##     ##       ##     ##            241              ##     ##       ##     ##
242            ------------    -------------          242            ------------    -------------
243             CPU0   CPU1     CPU2   CPU3           243             CPU0   CPU1     CPU2   CPU3
244                                                   244 
245                                                   245 
246     From these calculations, the Case 1 has th    246     From these calculations, the Case 1 has the lowest total energy. So CPU 1
247     is the best candidate from an energy-effic    247     is the best candidate from an energy-efficiency standpoint.
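
The arithmetic of Example 2 can be replayed with the Python sketch below. It is a simplified model, not the kernel's find_energy_efficient_cpu() or compute_energy(): the helper names are illustrative, each performance domain is assumed to run at the lowest OPP whose capacity covers its busiest CPU (as the figures suggest), and floating-point division is used, so the totals differ by a unit or two from the integer figures listed in the three cases.

```python
# Energy Model table of Example 2: (capacity, power) per OPP, lowest first.
EM = {
    "little": [(170, 50), (341, 150), (512, 300)],
    "big": [(512, 400), (768, 800), (1024, 1700)],
}
DOMAINS = {"little": [0, 1], "big": [2, 3]}
UTIL = {0: 400, 1: 100, 2: 600, 3: 500}  # CPU0's 400 includes task P
TASK_UTIL, PREV_CPU = 200, 0


def util_with_task_on(dst_cpu):
    """Utilization landscape if P ran on dst_cpu instead of prev_cpu."""
    util = dict(UTIL)
    util[PREV_CPU] -= TASK_UTIL
    util[dst_cpu] += TASK_UTIL
    return util


def pick_opp(opps, max_util):
    """Lowest OPP whose capacity covers the busiest CPU of the domain."""
    for cap, power in opps:
        if cap >= max_util:
            return cap, power
    return opps[-1]


def total_energy(dst_cpu):
    """Sum of util / capacity * power over all CPUs, as in the three cases."""
    util = util_with_task_on(dst_cpu)
    total = 0.0
    for name, cpus in DOMAINS.items():
        cap, power = pick_opp(EM[name], max(util[c] for c in cpus))
        total += sum(util[c] / cap * power for c in cpus)
    return total


def candidates():
    """CPU with the most spare capacity in each domain, P's util excluded."""
    base = dict(UTIL)
    base[PREV_CPU] -= TASK_UTIL
    # CPUs of one domain share the same capacity, so lowest util wins.
    return [min(cpus, key=lambda c: base[c]) for cpus in DOMAINS.values()]


energies = {cpu: total_energy(cpu) for cpu in candidates() + [PREV_CPU]}
best_cpu = min(energies, key=energies.get)
```

With these inputs the candidates are CPU1 and CPU3, and best_cpu is 1, which matches the conclusion drawn from the three cases.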
248                                                   248 
249 Big CPUs are generally more power hungry than     249 Big CPUs are generally more power hungry than the little ones and are thus used
250 mainly when a task doesn't fit the littles. Ho    250 mainly when a task doesn't fit the littles. However, little CPUs aren't always
251 necessarily more energy-efficient than big CPU    251 necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
252 of the little CPUs can be less energy-efficien    252 of the little CPUs can be less energy-efficient than the lowest OPPs of the
253 bigs, for example. So, if the little CPUs happ    253 bigs, for example. So, if the little CPUs happen to have enough utilization at
254 a specific point in time, a small task waking     254 a specific point in time, a small task waking up at that moment could be better
255 off executing on the big side in order to save    255 off executing on the big side in order to save energy, even though it would fit
256 on the little side.                               256 on the little side.
257                                                   257 
258 And even in the case where all OPPs of the big    258 And even in the case where all OPPs of the big CPUs are less energy-efficient
259 than those of the little, using the big CPUs f    259 than those of the little, using the big CPUs for a small task might still, under
260 specific conditions, save energy. Indeed, plac    260 specific conditions, save energy. Indeed, placing a task on a little CPU can
261 result in raising the OPP of the entire perfor    261 result in raising the OPP of the entire performance domain, and that will
262 increase the cost of the tasks already running    262 increase the cost of the tasks already running there. If the waking task is
263 placed on a big CPU, its own execution cost mi    263 placed on a big CPU, its own execution cost might be higher than if it was
264 running on a little, but it won't impact the o    264 running on a little, but it won't impact the other tasks of the little CPUs
265 which will keep running at a lower OPP. So, wh    265 which will keep running at a lower OPP. So, when considering the total energy
266 consumed by CPUs, the extra cost of running th    266 consumed by CPUs, the extra cost of running that one task on a big core can be
267 smaller than the cost of raising the OPP on th    267 smaller than the cost of raising the OPP on the little CPUs for all the other
268 tasks.                                            268 tasks.
269                                                   269 
270 The examples above would be nearly impossible     270 The examples above would be nearly impossible to get right in a generic way, and
271 for all platforms, without knowing the cost of    271 for all platforms, without knowing the cost of running at different OPPs on all
272 CPUs of the system. Thanks to its EM-based des    272 CPUs of the system. Thanks to its EM-based design, EAS should cope with them
273 correctly without too many troubles. However,     273 correctly without too many troubles. However, in order to ensure a minimal
274 impact on throughput for high-utilization scen    274 impact on throughput for high-utilization scenarios, EAS also implements another
275 mechanism called 'over-utilization'.              275 mechanism called 'over-utilization'.
276                                                   276 
277                                                   277 
278 5. Over-utilization                               278 5. Over-utilization
279 -------------------                               279 -------------------
280                                                   280 
281 From a general standpoint, the use-cases where    281 From a general standpoint, the use-cases where EAS can help the most are those
282 involving a light/medium CPU utilization. When    282 involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
283 being run, they will require all of the availa    283 being run, they will require all of the available CPU capacity, and there isn't
284 much that can be done by the scheduler to save !! 284 much that can be done by the scheduler to save energy without severly harming
285 throughput. In order to avoid hurting performa    285 throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
286 'over-utilized' as soon as they are used at mo    286 'over-utilized' as soon as they are used at more than 80% of their compute
287 capacity. As long as no CPUs are over-utilized    287 capacity. As long as no CPUs are over-utilized in a root domain, load balancing
288 is disabled and EAS overrides the wake-up bala    288 is disabled and EAS overrides the wake-up balancing code. EAS is likely to load
289 the most energy efficient CPUs of the system m    289 the most energy efficient CPUs of the system more than the others if that can be
290 done without harming throughput. So, the load-    290 done without harming throughput. So, the load-balancer is disabled to prevent
291 it from breaking the energy-efficient task pla    291 it from breaking the energy-efficient task placement found by EAS. It is safe to
292 do so when the system isn't overutilized since    292 do so when the system isn't overutilized since being below the 80% tipping point
293 implies that:                                     293 implies that:
294                                                   294 
295     a. there is some idle time on all CPUs, so    295     a. there is some idle time on all CPUs, so the utilization signals used by
296        EAS are likely to accurately represent     296        EAS are likely to accurately represent the 'size' of the various tasks
297        in the system;                             297        in the system;
298     b. all tasks should already be provided wi    298     b. all tasks should already be provided with enough CPU capacity,
299        regardless of their nice values;           299        regardless of their nice values;
300     c. since there is spare capacity all tasks    300     c. since there is spare capacity all tasks must be blocking/sleeping
301        regularly and balancing at wake-up is s    301        regularly and balancing at wake-up is sufficient.
302                                                   302 
303 As soon as one CPU goes above the 80% tipping     303 As soon as one CPU goes above the 80% tipping point, at least one of the three
304 assumptions above becomes incorrect. In this s    304 assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
305 is raised for the entire root domain, EAS is d    305 is raised for the entire root domain, EAS is disabled, and the load-balancer is
306 re-enabled. By doing so, the scheduler falls b    306 re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
307 wake-up and load balance under CPU-bound condi    307 wake-up and load balance under CPU-bound conditions. This better respects the
308 nice values of tasks.                             308 nice values of tasks.
309                                                   309 
310 Since the notion of overutilization largely re    310 Since the notion of overutilization largely relies on detecting whether or not
311 there is some idle time in the system, the CPU    311 there is some idle time in the system, the CPU capacity 'stolen' by higher
312 (than CFS) scheduling classes (as well as IRQ)    312 (than CFS) scheduling classes (as well as IRQ) must be taken into account. As
313 such, the detection of overutilization account    313 such, the detection of overutilization accounts for the capacity used not only
314 by CFS tasks, but also by the other scheduling    314 by CFS tasks, but also by the other scheduling classes and IRQ.
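
The tipping-point rule described in this section can be sketched in a few lines
of C. This is a minimal userspace illustration, not the kernel implementation:
the structure and function names are hypothetical, and the utilization value is
assumed to already include the capacity consumed by the other scheduling
classes and IRQ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; the field names are illustrative only. */
struct cpu_sample {
	unsigned long util;     /* utilization of all classes plus IRQ */
	unsigned long capacity; /* compute capacity of this CPU */
};

/* True when the CPU is used at more than 80% of its capacity (4/5 = 0.8). */
static bool cpu_over_80_percent(unsigned long util, unsigned long capacity)
{
	return util * 5 > capacity * 4;
}

/* One CPU above the tipping point is enough to flag the whole root domain. */
static bool root_domain_overutilized(const struct cpu_sample *cpus, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (cpu_over_80_percent(cpus[i].util, cpus[i].capacity))
			return true;
	}
	return false;
}
```

With a big CPU of capacity 1024 and a little CPU of capacity 446 (made-up
numbers), a utilization of 400 on the little CPU alone is enough to flag the
root domain, even if the big CPU is mostly idle.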
315                                                   315 
316                                                   316 
317 6. Dependencies and requirements for EAS          317 6. Dependencies and requirements for EAS
318 ----------------------------------------          318 ----------------------------------------
319                                                   319 
320 Energy Aware Scheduling depends on the CPUs of    320 Energy Aware Scheduling depends on the CPUs of the system having specific
321 hardware properties and on other features of t    321 hardware properties and on other features of the kernel being enabled. This
322 section lists these dependencies and provides     322 section lists these dependencies and provides hints as to how they can be met.
323                                                   323 
324                                                   324 
325 6.1 - Asymmetric CPU topology                     325 6.1 - Asymmetric CPU topology
326 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                     326 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
327                                                   327 
328                                                   328 
329 As mentioned in the introduction, EAS is only     329 As mentioned in the introduction, EAS is only supported on platforms with
330 asymmetric CPU topologies for now. This requir    330 asymmetric CPU topologies for now. This requirement is checked at run-time by
331 looking for the presence of the SD_ASYM_CPUCAP    331 looking for the presence of the SD_ASYM_CPUCAPACITY_FULL flag when the scheduling
332 domains are built.                                332 domains are built.
333                                                   333 
334 See Documentation/scheduler/sched-capacity.rst    334 See Documentation/scheduler/sched-capacity.rst for requirements to be met for this
335 flag to be set in the sched_domain hierarchy.     335 flag to be set in the sched_domain hierarchy.
336                                                   336 
337 Please note that EAS is not fundamentally inco    337 Please note that EAS is not fundamentally incompatible with SMP, but no
338 significant savings on SMP platforms have been    338 significant savings on SMP platforms have been observed yet. This restriction
339 could be amended in the future if proven other    339 could be amended in the future if proven otherwise.
340                                                   340 
341                                                   341 
342 6.2 - Energy Model presence                       342 6.2 - Energy Model presence
343 ^^^^^^^^^^^^^^^^^^^^^^^^^^^                       343 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
344                                                   344 
345 EAS uses the EM of a platform to estimate the     345 EAS uses the EM of a platform to estimate the impact of scheduling decisions on
346 energy. So, your platform must provide power c    346 energy. So, your platform must provide power cost tables to the EM framework in
347 order to make EAS start. To do so, please refe    347 order to make EAS start. To do so, please refer to documentation of the
348 independent EM framework in Documentation/powe    348 independent EM framework in Documentation/power/energy-model.rst.
349                                                   349 
350 Please also note that the scheduling domains n    350 Please also note that the scheduling domains need to be re-built after the
351 EM has been registered in order to start EAS.     351 EM has been registered in order to start EAS.
352                                                   352 
353 EAS uses the EM to make a forecasting decision    353 EAS uses the EM to make a forecasting decision on energy usage and thus it is
354 more focused on the difference when checking p    354 more focused on the difference when checking possible options for task
355 placement. For EAS it doesn't matter whether t    355 placement. For EAS it doesn't matter whether the EM power values are expressed
356 in milli-Watts or in an 'abstract scale'.         356 in milli-Watts or in an 'abstract scale'.
357                                                   357 
358                                                   358 
359 6.3 - Energy Model complexity                     359 6.3 - Energy Model complexity
360 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                     360 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
361                                                   361 
362 EAS does not impose any complexity limit on th !! 362 The task wake-up path is very latency-sensitive. When the EM of a platform is
363 restricts the number of CPUs to EM_MAX_NUM_CPU !! 363 too complex (too many CPUs, too many performance domains, too many performance
364 the energy estimation.                         !! 364 states, ...), the cost of using it in the wake-up path can become prohibitive.
                                                   >> 365 The energy-aware wake-up algorithm has a complexity of:
                                                   >> 366 
                                                   >> 367         C = Nd * (Nc + Ns)
                                                   >> 368 
                                                   >> 369 with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
                                                   >> 370 total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
                                                   >> 371 
                                                   >> 372 A complexity check is performed at the root domain level, when scheduling
                                                   >> 373 domains are built. EAS will not start on a root domain if its C happens to be
                                                   >> 374 higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
                                                   >> 375 time of writing).
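
The complexity check described above can be sketched as follows. This is a
minimal illustration that only assumes the formula and the 2048 threshold
quoted in the text; the function names are hypothetical and do not correspond
to kernel symbols.

```c
#include <stdbool.h>

/* Threshold quoted in the text above ("2048 at the time of writing"). */
#define EM_MAX_COMPLEXITY 2048UL

/*
 * C = Nd * (Nc + Ns), with:
 *   nd: number of performance domains
 *   nc: number of CPUs
 *   ns: total number of OPPs across all performance domains
 */
static unsigned long em_complexity(unsigned long nd, unsigned long nc,
				   unsigned long ns)
{
	return nd * (nc + ns);
}

/* EAS does not start when C is strictly higher than the threshold. */
static bool em_too_complex(unsigned long nd, unsigned long nc,
			   unsigned long ns)
{
	return em_complexity(nd, nc, ns) > EM_MAX_COMPLEXITY;
}
```

For example, a system with two performance domains of 4 OPPs each (Ns = 8) and
8 CPUs gives C = 2 * (8 + 8) = 32, well below the threshold.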
                                                   >> 376 
                                                   >> 377 If you really want to use EAS but the complexity of your platform's Energy
                                                   >> 378 Model is too high to be used with a single root domain, you're left with only
                                                   >> 379 two possible options:
                                                   >> 380 
                                                   >> 381     1. split your system into separate, smaller, root domains using exclusive
                                                   >> 382        cpusets and enable EAS locally on each of them. This option has the
                                                   >> 383        benefit of working out of the box but the drawback of preventing load
                                                   >> 384        balance between root domains, which can result in an unbalanced system
                                                   >> 385        overall;
                                                   >> 386     2. submit patches to reduce the complexity of the EAS wake-up algorithm,
                                                   >> 387        hence enabling it to cope with larger EMs in reasonable time.
365                                                   388 
366                                                   389 
367 6.4 - Schedutil governor                          390 6.4 - Schedutil governor
368 ^^^^^^^^^^^^^^^^^^^^^^^^                          391 ^^^^^^^^^^^^^^^^^^^^^^^^
369                                                   392 
370 EAS tries to predict at which OPP the CPUs wil    393 EAS tries to predict at which OPP the CPUs will be running in the near future
371 in order to estimate their energy consumption.    394 in order to estimate their energy consumption. To do so, it is assumed that OPPs
372 of CPUs follow their utilization.                 395 of CPUs follow their utilization.
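
The assumption that OPPs follow utilization can be illustrated with a small
sketch. It assumes a governor that requests a frequency proportional to
utilization with 25% headroom, and a CPU that then runs at the lowest OPP able
to serve that request. The function names, the headroom value and the OPP
table are assumptions made for illustration only.

```c
#include <stddef.h>

/*
 * Frequency requested for a given utilization, assuming a request
 * proportional to util / max_cap with 25% headroom on top of max_freq.
 */
static unsigned long requested_freq(unsigned long util, unsigned long max_cap,
				    unsigned long max_freq)
{
	return (max_freq + (max_freq >> 2)) * util / max_cap;
}

/*
 * Predicted OPP: the lowest frequency in an ascending table that is at
 * least the request, or the highest OPP when the request exceeds them all.
 */
static unsigned long predict_opp(const unsigned long *opp_freqs, size_t nr,
				 unsigned long req)
{
	for (size_t i = 0; i < nr; i++) {
		if (opp_freqs[i] >= req)
			return opp_freqs[i];
	}
	return opp_freqs[nr - 1];
}
```

With a hypothetical table of 500000, 1000000 and 2000000 kHz and a maximum
capacity of 1024, a utilization of 256 yields a request of 625000 kHz, so the
predicted OPP is 1000000 kHz.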
373                                                   396 
374 Although it is very difficult to provide hard     397 Although it is very difficult to provide hard guarantees regarding the accuracy
375 of this assumption in practice (because the ha    398 of this assumption in practice (because the hardware might not do what it is
376 told to do, for example), schedutil as opposed    399 told to do, for example), schedutil as opposed to other CPUFreq governors at
377 least _requests_ frequencies calculated using     400 least _requests_ frequencies calculated using the utilization signals.
378 Consequently, the only sane governor to use to    401 Consequently, the only sane governor to use together with EAS is schedutil,
379 because it is the only one providing some degr    402 because it is the only one providing some degree of consistency between
380 frequency requests and energy predictions.        403 frequency requests and energy predictions.
381                                                   404 
382 Using EAS with any other governor than schedut    405 Using EAS with any other governor than schedutil is not supported.
383                                                   406 
384                                                   407 
385 6.5 Scale-invariant utilization signals           408 6.5 Scale-invariant utilization signals
386 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^           409 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
387                                                   410 
388 In order to make accurate predictions across C    411 In order to make accurate predictions across CPUs and for all performance
389 states, EAS needs frequency-invariant and CPU-    412 states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
390 be obtained using the architecture-defined arc    413 be obtained using the architecture-defined arch_scale_{cpu,freq}_capacity()
391 callbacks.                                        414 callbacks.
392                                                   415 
393 Using EAS on a platform that doesn't implement    416 Using EAS on a platform that doesn't implement these two callbacks is not
394 supported.                                        417 supported.
395                                                   418 
396                                                   419 
397 6.6 Multithreading (SMT)                          420 6.6 Multithreading (SMT)
398 ^^^^^^^^^^^^^^^^^^^^^^^^                          421 ^^^^^^^^^^^^^^^^^^^^^^^^
399                                                   422 
400 EAS in its current form is SMT unaware and is     423 EAS in its current form is SMT unaware and is not able to leverage
401 multithreaded hardware to save energy. EAS con    424 multithreaded hardware to save energy. EAS considers threads as independent
402 CPUs, which can actually be counter-productive    425 CPUs, which can actually be counter-productive for both performance and energy.
403                                                   426 
404 EAS on SMT is not supported.                      427 EAS on SMT is not supported.
                                                      
