.. SPDX-License-Identifier: GPL-2.0

====================
Utilization Clamping
====================

1. Introduction
===============

Utilization clamping, also known as util clamp or uclamp, is a scheduler
feature that allows user space to help manage the performance requirements
of tasks. It was introduced in the v5.3 release; cgroup support was merged in
v5.4.

Uclamp is a hinting mechanism that allows the scheduler to understand the
performance requirements and restrictions of tasks, which helps the scheduler
make better decisions. When the schedutil cpufreq governor is used, util clamp
influences the CPU frequency selection as well.

Since the scheduler and schedutil are both driven by PELT (util_avg) signals,
util clamp acts on those signals to achieve its goal by clamping them to a
certain point; hence the name. That is, by clamping utilization we make the
system run at a certain performance point.

The right way to view util clamp is as a mechanism to make requests or hints
about performance constraints. It consists of two tunables:

        * UCLAMP_MIN, which sets the lower bound.
        * UCLAMP_MAX, which sets the upper bound.

These two bounds ensure that a task operates within this performance range
of the system. UCLAMP_MIN implies boosting a task, while UCLAMP_MAX implies
capping a task.

One can tell the system (scheduler) that some tasks require a minimum
performance point to operate at to deliver the desired user experience. Or one
can tell the system that some tasks should be restricted from consuming too
many resources and should not go above a specific performance point. Viewing
the uclamp values as performance points rather than utilization is a better
abstraction from the user space point of view.

As an example, a game can use util clamp to form a feedback loop with its
perceived Frames Per Second (FPS). It can dynamically increase the minimum
performance point required by its display pipeline to ensure no frame is
dropped. It can also dynamically 'prime' these tasks if it knows that a
computationally intensive scene is about to happen in the next few hundred
milliseconds.

On mobile hardware, where the capability of devices varies a lot, this
dynamic feedback loop offers great flexibility to ensure the best user
experience given the capabilities of any system.

Of course, a static configuration is possible too. The exact usage will depend
on the system, the application and the desired outcome.

Another example is Android, where tasks are classified as background,
foreground, top-app, etc. Util clamp can be used to constrain how many
resources background tasks consume by capping the performance point they
can run at. This constraint helps reserve resources for important tasks, like
the ones belonging to the currently active app (top-app group). Besides, it
helps limit how much power they consume. The effect is more obvious on
heterogeneous systems (e.g. Arm big.LITTLE), where the constraint helps bias
the background tasks to stay on the little cores, which ensures that:

        1. The big cores are free to run top-app tasks immediately. Top-app
           tasks are the tasks the user is currently interacting with, hence
           the most important tasks in the system.
        2. Background tasks don't run on a power-hungry core and drain the
           battery, even if they are CPU-intensive.

.. note::
  **little cores**:
    CPUs with capacity < 1024

  **big cores**:
    CPUs with capacity = 1024

By making these uclamp performance requests, or rather hints, user space can
ensure system resources are used optimally to deliver the best possible user
experience.

Another use case is to help with **overcoming the ramp-up latency inherent in
how the scheduler utilization signal is calculated**.

For instance, a busy task that needs to run at the maximum performance point
will suffer a delay of ~200ms (PELT HALFLIFE = 32ms) before the scheduler
realizes that. This is known to affect workloads like gaming on mobile
devices, where frames will drop due to the slow response in selecting the
higher frequency required for the tasks to finish their work in time. Setting
UCLAMP_MIN=1024 ensures such tasks always see the highest performance level
when they start running.

The overall visible effect goes beyond better perceived user
experience/performance and extends to a better overall performance/watt if
used effectively.

User space can also form a feedback loop with the thermal subsystem to ensure
the device doesn't heat up to the point where it will throttle.

Both SCHED_NORMAL/OTHER and SCHED_FIFO/RR honour uclamp requests/hints.

In the SCHED_FIFO/RR case, uclamp gives the option to run RT tasks at any
performance point rather than being tied to MAX frequency all the time, which
can be useful on general purpose systems running on battery-powered devices.

Note that by design RT tasks don't have a per-task PELT signal and must always
run at a constant frequency to combat non-deterministic DVFS ramp-up delays.

Note that using schedutil always implies a single delay to modify the frequency
when an RT task wakes up. This cost is unchanged by using uclamp. Uclamp only
helps pick which frequency to request, instead of schedutil always requesting
MAX for all RT tasks.

See :ref:`section 3.4 <uclamp-default-values>` for default values and
:ref:`3.4.1 <sched-util-clamp-min-rt-default>` on how to change the RT tasks'
default value.

2. Design
=========

Util clamp is a property of every task in the system. It sets the boundaries of
its utilization signal, acting as a bias mechanism that influences certain
decisions within the scheduler.

The actual utilization signal of a task is never clamped in reality. If you
inspect PELT signals at any point in time, you should continue to see them
intact. Clamping happens only when needed, e.g. when a task wakes up and the
scheduler needs to select a suitable CPU for it to run on.

Since the goal of util clamp is to allow requesting a minimum and maximum
performance point for a task to run on, it must be able to influence the
frequency selection as well as task placement to be most effective. Both have
implications for the utilization value at the CPU runqueue (rq for short)
level, which brings us to the main design challenge.

When a task wakes up on an rq, the utilization signal of the rq will be
affected by the uclamp settings of all the tasks enqueued on it. For example, if
a task requests to run at UCLAMP_MIN = 512, then the util signal of the rq needs
to respect this request as well as all other requests from all of the
enqueued tasks.

To be able to aggregate the util clamp values of all the tasks attached to the
rq, uclamp must do some housekeeping at every enqueue/dequeue, which is the
scheduler hot path. Hence care must be taken, since any slowdown will have a
significant impact on a lot of use cases and could hinder its usability in
practice.

This is handled by dividing the utilization range into buckets
(struct uclamp_bucket), which reduces the search space from every task on the
rq to only the subset of tasks in the top-most bucket.

When a task is enqueued, the counter in the matching bucket is incremented,
and on dequeue it is decremented. This makes keeping track of the effective
uclamp value at the rq level a lot easier.

As tasks are enqueued and dequeued, we keep track of the current effective
uclamp value of the rq. See :ref:`section 2.1 <uclamp-buckets>` for details on
how this works.

Later, any path that wants to identify the effective uclamp value of the rq
simply needs to read it at the exact moment it needs to take a decision.

For the task placement case, only Energy Aware and Capacity Aware Scheduling
(EAS/CAS) make use of uclamp for now, which implies that it is applied on
heterogeneous systems only.
When a task wakes up, the scheduler will look at the current effective uclamp
value of every rq and compare it with the potential new value if the task were
to be enqueued there, favoring the rq that will end up with the most
energy-efficient combination.

Similarly, when schedutil needs to make a frequency update, it looks at the
current effective uclamp value of the rq, which is influenced by the set of
tasks currently enqueued there, and selects the appropriate frequency that
satisfies the constraints from those requests.

Other paths, like setting the overutilization state (which effectively disables
EAS), make use of uclamp as well. Such cases are considered necessary
housekeeping to allow the two main use cases above and will not be covered in
detail here as they could change with implementation details.

.. _uclamp-buckets:

2.1. Buckets
------------

::

                           [struct rq]

  (bottom)                                                    (top)

    0                                                          1024
    |                                                           |
    +-----------+-----------+-----------+----   ----+-----------+
    |  Bucket 0 |  Bucket 1 |  Bucket 2 |    ...    |  Bucket N |
    +-----------+-----------+-----------+----   ----+-----------+
       :           :                                   :
       +- p0       +- p3                               +- p4
       :                                               :
       +- p1                                           +- p5
       :
       +- p2


.. note::
  The diagram above is an illustration rather than a true depiction of the
  internal data structure.

To reduce the search space when trying to decide the effective uclamp value of
an rq as tasks are enqueued/dequeued, the whole utilization range is divided
into N buckets, where N is configured at compile time by setting
CONFIG_UCLAMP_BUCKETS_COUNT. By default it is set to 5.

The rq has a set of buckets for each of the uclamp_id tunables: [UCLAMP_MIN,
UCLAMP_MAX].

The range of each bucket is 1024/N. For example, with the default value of
5 there will be 5 buckets, each covering the following range:

::

        DELTA = round_closest(1024/5) = round_closest(204.8) = 205

        Bucket 0: [0:204]
        Bucket 1: [205:409]
        Bucket 2: [410:614]
        Bucket 3: [615:819]
        Bucket 4: [820:1024]

When a task p with the following tunable parameters

::

        p->uclamp[UCLAMP_MIN] = 300
        p->uclamp[UCLAMP_MAX] = 1024

is enqueued into the rq, bucket 1 will be incremented for UCLAMP_MIN and bucket
4 will be incremented for UCLAMP_MAX to reflect the fact that the rq has a task
in this range.

The rq then keeps track of its current effective uclamp value for each
uclamp_id.

When a task p is enqueued, the rq value changes to:

::

        // update bucket logic goes here
        rq->uclamp[UCLAMP_MIN] = max(rq->uclamp[UCLAMP_MIN], p->uclamp[UCLAMP_MIN]);
        // repeat for UCLAMP_MAX

Similarly, when p is dequeued, the rq value changes to:

::

        // update bucket logic goes here
        rq->uclamp[UCLAMP_MIN] = search_top_bucket_for_highest_value();
        // repeat for UCLAMP_MAX

When all buckets are empty, the rq uclamp values are reset to system defaults.
See :ref:`section 3.4 <uclamp-default-values>` for details on default values.


2.2. Max aggregation
--------------------

Util clamp is tuned to honour the request for the task that requires the
highest performance point.

When multiple tasks are attached to the same rq, util clamp must make sure
the task that needs the highest performance point gets it, even if there's
another task that doesn't need it or is disallowed from reaching this point.

For example, if multiple tasks are attached to an rq with the following
values:

::

        p0->uclamp[UCLAMP_MIN] = 300
        p0->uclamp[UCLAMP_MAX] = 900

        p1->uclamp[UCLAMP_MIN] = 500
        p1->uclamp[UCLAMP_MAX] = 500

then, assuming both p0 and p1 are enqueued to the same rq, UCLAMP_MIN and
UCLAMP_MAX become:

::

        rq->uclamp[UCLAMP_MIN] = max(300, 500) = 500
        rq->uclamp[UCLAMP_MAX] = max(900, 500) = 900

As we shall see in :ref:`section 5.1 <uclamp-capping-fail>`, this max
aggregation is the cause of one of the limitations of util clamp, in
particular for the UCLAMP_MAX hint when user space would like to save power.

2.3. Hierarchical aggregation
-----------------------------

As stated earlier, util clamp is a property of every task in the system. But
the actual applied (effective) value can be influenced by more than just the
request made by the task or by another actor on its behalf (e.g. a middleware
library).

The effective util clamp value of any task is restricted as follows:

  1. By the uclamp settings defined by the cgroup CPU controller it is attached
     to, if any.
  2. The restricted value in (1) is then further restricted by the system-wide
     uclamp settings.

:ref:`Section 3 <uclamp-interfaces>` discusses the interfaces and will expand
further on that.

For now, suffice it to say that if a task makes a request, its actual effective
value will have to adhere to some restrictions imposed by cgroup and system-wide
settings.

The system will still accept the request even if it is effectively beyond the
constraints, but as soon as the task moves to a different cgroup or a sysadmin
modifies the system settings, the request will be satisfied only if it is
within the new constraints.

In other words, this aggregation will not cause an error when a task changes
its uclamp values; rather, the system may not be able to satisfy requests
based on those factors.
326                                                   326 
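The two-step restriction can be sketched as below. This is a simplified model written from the rules stated in this document (the cgroup rules of section 3.2 and the system wide knobs of section 3.3); ``struct clamp_pair`` and ``effective_clamps()`` are hypothetical names, and the kernel's implementation may differ in details.

.. code-block:: c

    /* Hypothetical container for a (UCLAMP_MIN, UCLAMP_MAX) pair, 0..1024. */
    struct clamp_pair {
        unsigned int min;
        unsigned int max;
    };

    static unsigned int clamp_umin(unsigned int a, unsigned int b)
    {
        return a < b ? a : b;
    }

    static unsigned int clamp_umax(unsigned int a, unsigned int b)
    {
        return a > b ? a : b;
    }

    /*
     * Effective clamps of a task:
     *  1. cgroup: cpu.uclamp.min is a protection, so a lower task min is
     *     raised to it; cpu.uclamp.max is a limit, so a higher task max is
     *     lowered to it.
     *  2. system: the result is then restricted to at most
     *     sched_util_clamp_min / sched_util_clamp_max.
     * The request itself is kept unchanged; only the effective value moves.
     */
    static struct clamp_pair effective_clamps(struct clamp_pair req,
                                              struct clamp_pair cgroup,
                                              struct clamp_pair sys)
    {
        struct clamp_pair eff;

        eff.min = clamp_umax(req.min, cgroup.min);
        eff.max = clamp_umin(req.max, cgroup.max);

        eff.min = clamp_umin(eff.min, sys.min);
        eff.max = clamp_umin(eff.max, sys.max);
        return eff;
    }

With the numbers of the cgroup example in section 3.2 (taking 20%, 40%, 50% and 60% of 1024 as 204, 409, 512 and 614) and system knobs left at 1024, a task requesting {409, 512} in a cgroup with {614, 1024} ends up with {614, 512}: its UCLAMP_MIN is raised and its UCLAMP_MAX is left intact. Because the request is kept, recomputing after a cgroup move or a knob change yields the new effective value.
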
2.4. Range
----------

Uclamp performance requests have a range of 0 to 1024 inclusive.

For the cgroup interface, a percentage is used (that is, 0 to 100 inclusive).
Just like other cgroup interfaces, you can use 'max' instead of 100.

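A user space tool that reads or writes cpu.uclamp.min or cpu.uclamp.max may need to convert between the two ranges. The sketch below uses a hypothetical helper, ``cgroup_pct_to_util()``, that maps 'max' or a percentage to the 0..1024 scale, rounding to the nearest integer. It assumes the input has already been stripped of whitespace; the kernel's own parser may accept, parse and round values differently.

.. code-block:: c

    #include <errno.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * Convert a cgroup uclamp value ("max" or a percentage in [0, 100]) to
     * the 0..1024 range, rounding to nearest. Returns 0 or -EINVAL.
     */
    static int cgroup_pct_to_util(const char *buf, unsigned int *util)
    {
        char *end;
        double pct;

        if (strcmp(buf, "max") == 0) {
            *util = 1024;
            return 0;
        }

        errno = 0;
        pct = strtod(buf, &end);
        /* Reject empty input, trailing garbage, overflow, NaN and range. */
        if (errno != 0 || end == buf || *end != '\0' ||
            !(pct >= 0.0 && pct <= 100.0))
            return -EINVAL;

        *util = (unsigned int)(pct * 1024.0 / 100.0 + 0.5);
        return 0;
    }

For example, "40" maps to 410, while both "100" and "max" map to 1024.
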
.. _uclamp-interfaces:

3. Interfaces
=============

3.1. Per task interface
-----------------------

The sched_setattr() syscall was extended to accept two new fields:

* sched_util_min: requests the minimum performance point the system should run
  at when this task is running, i.e. the lower performance bound.
* sched_util_max: requests the maximum performance point the system should run
  at when this task is running, i.e. the upper performance bound.

For example, the following scenario has 40% to 80% utilization constraints:

::

        attr->sched_util_min = 40% * 1024;
        attr->sched_util_max = 80% * 1024;

When task @p is running, **the scheduler should try its best to ensure it
starts at 40% performance level**. If the task runs for a long enough time so
that its actual utilization goes above 80%, the utilization, or performance
level, will be capped.

The special value -1 is used to reset the uclamp settings to the system
default.

Note that resetting the uclamp value to the system default using -1 is not the
same as manually setting the uclamp value to the system default. This
distinction is important because, as we shall see in the system interfaces, the
default value for RT could be changed. SCHED_NORMAL/OTHER might gain similar
knobs too in the future.

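A minimal sketch of issuing such a request from user space, assuming a kernel built with uclamp task support (CONFIG_UCLAMP_TASK). It calls the raw syscall, since the C library may not provide a wrapper. ``struct uclamp_sched_attr`` is a local mirror of the kernel's ``struct sched_attr`` (the 56-byte layout that includes the two clamp fields), and the flag values are copied from the kernel UAPI header ``<linux/sched.h>``; the helpers ``uclamp_from_pct()``, ``uclamp_fill()`` and ``uclamp_set()`` are hypothetical. SCHED_FLAG_KEEP_POLICY and SCHED_FLAG_KEEP_PARAMS ask the kernel to update only the clamps, leaving the task's policy and priority untouched.

.. code-block:: c

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Flag values as defined in the kernel UAPI header <linux/sched.h>. */
    #ifndef SCHED_FLAG_KEEP_POLICY
    #define SCHED_FLAG_KEEP_POLICY    0x08
    #endif
    #ifndef SCHED_FLAG_KEEP_PARAMS
    #define SCHED_FLAG_KEEP_PARAMS    0x10
    #endif
    #ifndef SCHED_FLAG_UTIL_CLAMP_MIN
    #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
    #endif
    #ifndef SCHED_FLAG_UTIL_CLAMP_MAX
    #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
    #endif

    #define UCLAMP_RESET ((uint32_t)-1) /* -1: back to the system default */

    /* Local mirror of struct sched_attr, 56 bytes with the clamp fields. */
    struct uclamp_sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
        uint32_t sched_util_min;
        uint32_t sched_util_max;
    };

    /* Convert a percentage to the 0..1024 uclamp range (truncating). */
    static uint32_t uclamp_from_pct(unsigned int pct)
    {
        return pct * 1024u / 100u;
    }

    /* Fill @attr so that only the two clamps are changed. */
    static void uclamp_fill(struct uclamp_sched_attr *attr,
                            uint32_t util_min, uint32_t util_max)
    {
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->sched_flags = SCHED_FLAG_KEEP_POLICY | SCHED_FLAG_KEEP_PARAMS |
                            SCHED_FLAG_UTIL_CLAMP_MIN |
                            SCHED_FLAG_UTIL_CLAMP_MAX;
        attr->sched_util_min = util_min;
        attr->sched_util_max = util_max;
    }

    /* pid 0 means the calling thread. Returns 0, or -1 with errno set. */
    static int uclamp_set(pid_t pid, uint32_t util_min, uint32_t util_max)
    {
        struct uclamp_sched_attr attr;

        uclamp_fill(&attr, util_min, util_max);
        return (int)syscall(SYS_sched_setattr, pid, &attr, 0u);
    }

``uclamp_set(0, uclamp_from_pct(40), uclamp_from_pct(80))`` requests the 40% to 80% constraints above (409 and 819) for the calling thread, and ``uclamp_set(0, UCLAMP_RESET, UCLAMP_RESET)`` resets both clamps to the system default. Check the return value: the call fails with -1 and sets errno, for example on a kernel without uclamp support.
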
3.2. cgroup interface
---------------------

There are two uclamp related values in the CPU cgroup controller:

* cpu.uclamp.min
* cpu.uclamp.max

When a task is attached to a CPU controller, its uclamp values will be impacted
as follows:

* cpu.uclamp.min is a protection as described in :ref:`section 3-3 of cgroup
  v2 documentation <cgroupv2-protections-distributor>`.

  If a task uclamp_min value is lower than cpu.uclamp.min, then the task will
  inherit the cgroup cpu.uclamp.min value.

  In a cgroup hierarchy, effective cpu.uclamp.min is the max of (child,
  parent).

* cpu.uclamp.max is a limit as described in :ref:`section 3-2 of cgroup v2
  documentation <cgroupv2-limits-distributor>`.

  If a task uclamp_max value is higher than cpu.uclamp.max, then the task will
  inherit the cgroup cpu.uclamp.max value.

  In a cgroup hierarchy, effective cpu.uclamp.max is the min of (child,
  parent).

For example, given the following parameters:

::

        p0->uclamp[UCLAMP_MIN] = 0;    // system default
        p0->uclamp[UCLAMP_MAX] = 1024; // system default

        p1->uclamp[UCLAMP_MIN] = 40% * 1024;
        p1->uclamp[UCLAMP_MAX] = 50% * 1024;

        cgroup0->cpu.uclamp.min = 20% * 1024;
        cgroup0->cpu.uclamp.max = 60% * 1024;

        cgroup1->cpu.uclamp.min = 60% * 1024;
        cgroup1->cpu.uclamp.max = 100% * 1024;

When p0 and p1 are attached to cgroup0, the values become:

::

        p0->uclamp[UCLAMP_MIN] = cgroup0->cpu.uclamp.min = 20% * 1024;
        p0->uclamp[UCLAMP_MAX] = cgroup0->cpu.uclamp.max = 60% * 1024;

        p1->uclamp[UCLAMP_MIN] = 40% * 1024; // intact
        p1->uclamp[UCLAMP_MAX] = 50% * 1024; // intact

When p0 and p1 are attached to cgroup1, these instead become:

::

        p0->uclamp[UCLAMP_MIN] = cgroup1->cpu.uclamp.min = 60% * 1024;
        p0->uclamp[UCLAMP_MAX] = cgroup1->cpu.uclamp.max = 100% * 1024;

        p1->uclamp[UCLAMP_MIN] = cgroup1->cpu.uclamp.min = 60% * 1024;
        p1->uclamp[UCLAMP_MAX] = 50% * 1024; // intact

Note that the cgroup interface allows the cpu.uclamp.max value to be lower than
cpu.uclamp.min. Other interfaces don't allow that.

3.3. System interface
---------------------

3.3.1 sched_util_clamp_min
--------------------------

System wide limit of the allowed UCLAMP_MIN range. By default it is set to
1024, which means that the permitted effective UCLAMP_MIN range for tasks is
[0:1024]. By changing it to 512, for example, the range reduces to [0:512].
This is useful to restrict how much boosting tasks are allowed to acquire.

Requests from tasks to go above this knob value will still succeed, but they
won't be satisfied until the knob value is at least p->uclamp[UCLAMP_MIN].

The value must be smaller than or equal to sched_util_clamp_max.

3.3.2 sched_util_clamp_max
--------------------------

System wide limit of the allowed UCLAMP_MAX range. By default it is set to
1024, which means that the permitted effective UCLAMP_MAX range for tasks is
[0:1024].

By changing it to 512, for example, the effective allowed range reduces to
[0:512]. This means that no task can run above 512, which implies that all
rqs are restricted too. In other words, the whole system is capped to half its
performance capacity.

This is useful to restrict the overall maximum performance point of the system.
For example, it can be handy to limit performance when running low on battery,
or when the system wants to limit access to more energy hungry performance
levels when it's idle or the screen is off.

Requests from tasks to go above this knob value will still succeed, but they
won't be satisfied until the knob value is at least p->uclamp[UCLAMP_MAX].

The value must be greater than or equal to sched_util_clamp_min.

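The constraints on the two knobs, and the way a request above a knob is accepted but only honored up to the knob, can be summarized as follows. This is a sketch of the rules stated above, not the kernel's sysctl handler, and the function names are hypothetical. At runtime the knobs are exposed as ``/proc/sys/kernel/sched_util_clamp_min`` and ``/proc/sys/kernel/sched_util_clamp_max``.

.. code-block:: c

    #include <stdbool.h>

    #define UCLAMP_SCALE 1024u /* top of the uclamp range */

    /* Both knobs must lie in [0, 1024] and min must not exceed max. */
    static bool sysctl_uclamp_valid(unsigned int clamp_min,
                                    unsigned int clamp_max)
    {
        return clamp_max <= UCLAMP_SCALE && clamp_min <= clamp_max;
    }

    /*
     * A task request above the knob is accepted, but only honored up to the
     * knob value; once the knob reaches the request, it is fully satisfied.
     */
    static unsigned int honored_request(unsigned int request,
                                        unsigned int knob)
    {
        return request < knob ? request : knob;
    }

With sched_util_clamp_max set to 512, a task asking for a UCLAMP_MAX of 800 is held at 512; restoring the knob to 1024 lets the full request of 800 through without the task having to ask again.
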
.. _uclamp-default-values:

3.4. Default values
-------------------

By default all SCHED_NORMAL/SCHED_OTHER tasks are initialized to:

::

        p_fair->uclamp[UCLAMP_MIN] = 0
        p_fair->uclamp[UCLAMP_MAX] = 1024

That is, by default they're neither boosted nor capped: they are free to run at
any performance point of the system. These defaults can't be changed at boot or
runtime. No argument was made yet as to why we should provide this, but it can
be added in the future.

For SCHED_FIFO/SCHED_RR tasks:

::

        p_rt->uclamp[UCLAMP_MIN] = 1024
        p_rt->uclamp[UCLAMP_MAX] = 1024

That is, by default they're boosted to run at the maximum performance point of
the system, which retains the historical behavior of the RT tasks.

The default uclamp_min value of RT tasks can be modified at boot or runtime via
sysctl. See the section below.

.. _sched-util-clamp-min-rt-default:

3.4.1 sched_util_clamp_min_rt_default
-------------------------------------

Running RT tasks at the maximum performance point is expensive on battery
powered devices and not necessary. To allow system developers to offer good
performance guarantees for these tasks without pushing them all the way to the
maximum performance point, this sysctl knob allows tuning the best boost value
to address the system requirements without burning power running at the
maximum performance point all the time.

Application developers are encouraged to use the per task util clamp interface
to ensure they are performance and power aware. Ideally this knob should be set
to 0 by system designers, leaving the task of managing performance
requirements to the apps.

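The defaults of this section can be summarized in a short sketch. ``default_clamps()`` is a hypothetical helper, and ``rt_default`` stands for the current value of the sched_util_clamp_min_rt_default knob (1024 unless changed, which keeps the historical RT behavior), exposed at runtime as ``/proc/sys/kernel/sched_util_clamp_min_rt_default``.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical (UCLAMP_MIN, UCLAMP_MAX) pair in the 0..1024 range. */
    struct clamp_pair {
        unsigned int min;
        unsigned int max;
    };

    /*
     * Default clamps of a task that has not requested its own values.
     * @is_rt: true for SCHED_FIFO/SCHED_RR, false for SCHED_NORMAL/OTHER.
     * @rt_default: current value of sched_util_clamp_min_rt_default.
     */
    static struct clamp_pair default_clamps(bool is_rt, unsigned int rt_default)
    {
        struct clamp_pair c = { .min = 0, .max = 1024 };

        if (is_rt)
            c.min = rt_default;
        return c;
    }

Setting the knob to 0, as suggested above, makes RT tasks start from the same {0, 1024} defaults as SCHED_NORMAL tasks, leaving any boost to the application's own requests.
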
4. How to use util clamp
========================

Util clamp promotes the concept of user space assisted power and performance
management. At the scheduler level, the information required to make the best
decision is not available. However, with util clamp, user space can hint to the
scheduler to make better decisions about task placement and frequency
selection.

Best results are achieved by not making any assumptions about the system the
application is running on and by using util clamp in conjunction with a
feedback loop to dynamically monitor and adjust. Ultimately this will allow for
a better user experience at a better perf/watt.

For some systems and use cases, a static setup will help to achieve good
results. Portability will be a problem in this case: how much work one can do
at 100, 200 or 1024 is different for each system. Unless there's a specific
target system, a static setup should be avoided.

There are enough possibilities to create a whole framework based on util clamp
or a self contained app that makes use of it directly.

4.1. Boost important and DVFS-latency-sensitive tasks
-----------------------------------------------------

A GUI task might not be busy enough to warrant driving the frequency high when
it wakes up. However, it needs to finish its work within a specific time window
to deliver the desired user experience. The frequency it requires at wakeup
will be system dependent. On some underpowered systems it will be high, on
other overpowered ones it will be low or 0.

This task can increase its UCLAMP_MIN value every time it misses the deadline
to ensure that on the next wakeup it runs at a higher performance point. It
should try to approach the lowest UCLAMP_MIN value that allows it to meet its
deadline on any particular system, to achieve the best possible perf/watt for
that system.

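One possible shape of such a feedback loop is sketched below. The controller, its step sizes and the helper ``next_uclamp_min()`` are hypothetical choices, not something this document prescribes: the boost is raised quickly after a missed deadline and decayed slowly while deadlines are met, so it settles near the lowest value that works on the current system. The resulting value would then be applied through the per task interface of section 3.1.

.. code-block:: c

    #include <stdbool.h>

    #define UCLAMP_SCALE    1024u /* top of the uclamp range */
    #define BOOST_STEP_UP     64u /* illustrative, tune per system */
    #define BOOST_STEP_DOWN    8u /* illustrative, tune per system */

    /* Compute the UCLAMP_MIN to request for the next activation. */
    static unsigned int next_uclamp_min(unsigned int cur, bool missed_deadline)
    {
        if (missed_deadline) {
            /* Boost faster than we decay, but never above the scale. */
            if (cur >= UCLAMP_SCALE - BOOST_STEP_UP)
                return UCLAMP_SCALE;
            return cur + BOOST_STEP_UP;
        }

        /* Deadline met: decay slowly towards 0 to save power. */
        return cur > BOOST_STEP_DOWN ? cur - BOOST_STEP_DOWN : 0;
    }

Starting from 0, three consecutive misses raise the request to 192; each subsequent met deadline then lowers it by 8. The asymmetric steps are a design choice that favors meeting deadlines over saving power; other policies are equally possible.
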
On heterogeneous systems, it might be important for this task to run on
a faster CPU.

**Generally it is advised to perceive the input as a performance level or point,
which implies both task placement and frequency selection**.

4.2. Cap background tasks
-------------------------

As explained for the Android case in the introduction, any app can lower
UCLAMP_MAX for some background tasks that don't care about performance but
could end up being busy and consuming unnecessary resources on the system.

4.3. Powersave mode
-------------------

The sched_util_clamp_max system wide interface can be used to prevent all tasks
from operating at the higher performance points, which are usually energy
inefficient.

This is not unique to uclamp, as one can achieve the same by reducing the max
frequency of the cpufreq governor. It can be considered a more convenient
alternative interface.

4.4. Per-app performance restriction
------------------------------------

Middleware/Utility can provide the user an option to set UCLAMP_MIN/MAX for an
app every time it is executed to guarantee a minimum performance point and/or
prevent it from draining system power at the cost of reduced performance for
these apps.

If you want to prevent your laptop from heating up while compiling the kernel
on the go, and are happy to sacrifice performance to save power, but would
still like to keep your browser performance intact, uclamp makes it possible.

5. Limitations
==============

.. _uclamp-capping-fail:

5.1. Capping frequency with uclamp_max fails under certain conditions
---------------------------------------------------------------------

If task p0 is capped to run at 512:

::

        p0->uclamp[UCLAMP_MAX] = 512

and it shares the rq with p1, which is free to run at any performance point:

::

        p1->uclamp[UCLAMP_MAX] = 1024

then due to max aggregation the rq will be allowed to reach the max performance
point:

::

        rq->uclamp[UCLAMP_MAX] = max(512, 1024) = 1024

Assuming both p0 and p1 have UCLAMP_MIN = 0, the frequency selection for the
rq will depend on the actual utilization value of the tasks.

If p1 is a small task but p0 is a CPU intensive task, then because both are
running on the same rq, p1 will cause the frequency cap to be lifted from the
rq, although p1, which is allowed to run at any performance point, doesn't
actually need to run at that frequency.

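The scenario can be reproduced with a small model. As a simplification, it assumes that the rq utilization is the sum of the runnable tasks' util_avg and that this sum is clamped to the rq's aggregated [UCLAMP_MIN, UCLAMP_MAX] range before frequency selection. The helper names and the utilization values 800 (p0) and 50 (p1) are hypothetical.

.. code-block:: c

    /* Clamp @util to [lo, hi]. */
    static unsigned int clamp_util(unsigned int util, unsigned int lo,
                                   unsigned int hi)
    {
        if (util < lo)
            return lo;
        if (util > hi)
            return hi;
        return util;
    }

    /*
     * Utilization the rq requests for frequency selection when two tasks
     * with UCLAMP_MIN = 0 share it: the rq UCLAMP_MAX is the max of the
     * task values (max aggregation), and the summed utilization is clamped
     * to it.
     */
    static unsigned int rq_requested_util(unsigned int util0, unsigned int max0,
                                          unsigned int util1, unsigned int max1)
    {
        unsigned int rq_max = max0 > max1 ? max0 : max1;

        return clamp_util(util0 + util1, 0, rq_max);
    }

On its own, p0 with a util_avg of 800 is held at 512. Once the small p1 (util_avg 50, UCLAMP_MAX 1024) shares the rq, ``rq_requested_util(800, 512, 50, 1024)`` returns 850: the cap on p0 is effectively gone, even though p1 by itself would only need 50.
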
629 5.2. UCLAMP_MAX can break PELT (util_avg) sign    629 5.2. UCLAMP_MAX can break PELT (util_avg) signal
630 ----------------------------------------------    630 ------------------------------------------------
631                                                   631 
632 PELT assumes that frequency will always increa    632 PELT assumes that frequency will always increase as the signals grow to ensure
633 there's always some idle time on the CPU. But     633 there's always some idle time on the CPU. But with UCLAMP_MAX, this frequency
634 increase will be prevented which can lead to n    634 increase will be prevented which can lead to no idle time in some
635 circumstances. When there's no idle time, a ta    635 circumstances. When there's no idle time, a task will stuck in a busy loop,
636 which would result in util_avg being 1024.        636 which would result in util_avg being 1024.
637                                                   637 
638 Combing with issue described below, this can l    638 Combing with issue described below, this can lead to unwanted frequency spikes
639 when severely capped tasks share the rq with a    639 when severely capped tasks share the rq with a small non capped task.
640                                                   640 
As an example, if task p0, which has:

::

        p0->util_avg = 300
        p0->uclamp[UCLAMP_MAX] = 0

wakes up on an idle CPU, then it will run at the minimum frequency (Fmin) this
CPU is capable of. The maximum CPU frequency (Fmax) matters here as well,
since it designates the shortest computational time to finish the task's
work on this CPU.

::

        rq->uclamp[UCLAMP_MAX] = 0

If the ratio of Fmax/Fmin is 3, then the maximum value will be:

::

        300 * (Fmax/Fmin) = 900

which indicates the CPU will still see idle time since 900 is < 1024. The
_actual_ util_avg will not be 900 though, but somewhere between 300 and 900. As
long as there's idle time, p0->util_avg updates will be off by some margin,
but not proportionally to Fmax/Fmin.

::

        p0->util_avg = 300 + small_error

Now if the ratio of Fmax/Fmin is 4, the maximum value becomes:

::

        300 * (Fmax/Fmin) = 1200

which is higher than 1024 and indicates that the CPU has no idle time. When
this happens, the _actual_ util_avg will become:

::

        p0->util_avg = 1024

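The back-of-the-envelope bound used in the two cases above can be written down directly. This is an illustrative model only. `capped_util_bound` and `idle_time_left` are hypothetical helpers, and 300 is assumed to be the task's utilization when running at Fmax.

```python
SCHED_CAPACITY = 1024


def capped_util_bound(util_at_fmax: int, fmax_over_fmin: float) -> float:
    """Upper bound on the utilization a task appears to need when it is
    forced to run at Fmin: the same work takes Fmax/Fmin times longer.
    Mirrors the reasoning in the text; not what PELT literally computes."""
    return util_at_fmax * fmax_over_fmin


def idle_time_left(util_at_fmax: int, fmax_over_fmin: float) -> bool:
    # Idle time remains only while the bound stays below full capacity.
    return capped_util_bound(util_at_fmax, fmax_over_fmin) < SCHED_CAPACITY


for ratio in (3, 4):
    bound = capped_util_bound(300, ratio)
    if idle_time_left(300, ratio):
        # util_avg ends up somewhere between 300 and the bound.
        print(f"Fmax/Fmin={ratio}: bound={bound}, idle time left")
    else:
        # No idle time: the task runs back to back and util_avg -> 1024.
        print(f"Fmax/Fmin={ratio}: bound={bound}, no idle time")
```

The threshold is simply whether the stretched-out work still fits within the CPU's capacity of 1024.
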
If task p1, which has:

::

        p1->util_avg = 200
        p1->uclamp[UCLAMP_MAX] = 1024

wakes up on this CPU, then the effective UCLAMP_MAX for the CPU will be 1024
according to the max aggregation rule. But since the capped p0 task was running
and throttled severely, the rq->util_avg will saturate at 1024:

::

        p0->util_avg = 1024
        p1->util_avg = 200

        rq->util_avg = 1024
        rq->uclamp[UCLAMP_MAX] = 1024

This leads to a frequency spike, since if p0 weren't throttled we would get:

::

        p0->util_avg = 300
        p1->util_avg = 200

        rq->util_avg = 500

and run somewhere near the mid performance point of that CPU, not at the Fmax
we get.

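The effect of max aggregation on the request sent to schedutil can be sketched with a toy model. The model makes two assumptions. First, the rq utilization is the sum of its tasks' util_avg, saturating at 1024. Second, the rq UCLAMP_MAX is the maximum of the runnable tasks' UCLAMP_MAX values. `rq_request` is a hypothetical helper, not a kernel function.

```python
SCHED_CAPACITY = 1024


def rq_request(tasks):
    """Hypothetical helper: clamped utilization a run queue would ask
    schedutil for, given (util_avg, uclamp_max) pairs of its runnable tasks.
    Simplified model: rq util is the saturating sum of the tasks' util_avg,
    and the rq UCLAMP_MAX is the max over the tasks (max aggregation)."""
    rq_util = min(sum(util for util, _ in tasks), SCHED_CAPACITY)
    rq_uclamp_max = max(uclamp_max for _, uclamp_max in tasks)
    return min(rq_util, rq_uclamp_max)


# p0 alone: its inflated signal is hidden by UCLAMP_MAX = 0.
print(rq_request([(1024, 0)]))               # 0 -> Fmin
# p1 joins: the cap is lifted and p0's inflated util_avg leaks through.
print(rq_request([(1024, 0), (200, 1024)]))  # 1024 -> Fmax
# What we would get if p0's signal had not been inflated.
print(rq_request([(300, 0), (200, 1024)]))   # 500 -> mid performance point
```

The spike comes from combining two things. The cap is lifted as soon as an uncapped task is present, while p0's utilization signal still carries the inflation it accumulated while capped.
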
5.3. Schedutil response time issues
-----------------------------------

schedutil has three limitations:

        1. Hardware takes non-zero time to respond to any frequency change
           request. On some platforms this can be in the order of a few ms.
        2. Non fast-switch systems require a worker deadline thread to wake up
           and perform the frequency change, which adds measurable overhead.
        3. schedutil drops any requests made within rate_limit_us of the last
           frequency update.

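Limitation 3 can be illustrated with a toy rate limiter. It assumes that a request is applied only if at least rate_limit_us has elapsed since the last applied change, and is otherwise dropped rather than deferred. `RateLimitedGovernor` is a hypothetical class, not schedutil's actual code, and the frequencies are arbitrary values in MHz.

```python
class RateLimitedGovernor:
    """Toy model of a rate-limited frequency governor (an assumption, not
    schedutil's code): a request is applied only if at least rate_limit_us
    has elapsed since the last applied change; otherwise it is dropped."""

    def __init__(self, rate_limit_us: int):
        self.rate_limit_us = rate_limit_us
        self.last_update_us = None
        self.freq_mhz = None

    def request(self, now_us: int, freq_mhz: int) -> bool:
        if (self.last_update_us is not None
                and now_us - self.last_update_us < self.rate_limit_us):
            return False  # inside the window: the request is lost
        self.last_update_us = now_us
        self.freq_mhz = freq_mhz
        return True


gov = RateLimitedGovernor(rate_limit_us=1000)
gov.request(0, 500)       # applied
gov.request(300, 2000)    # dropped, e.g. a small task waking with a high boost
print(gov.freq_mhz)       # 500: the boost never reached the hardware
gov.request(1000, 2000)   # applied once the window has passed
print(gov.freq_mhz)       # 2000
```

Because dropped requests are not queued, a short critical task that wakes inside the window may finish before the frequency it asked for is ever applied.
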
If a relatively small task is doing a critical job and requires a certain
performance point when it wakes up and starts running, then all these
limitations will prevent it from getting what it wants in the time scale it
expects.

These limitations are not specific to uclamp, but they become more prevalent
with it, as we no longer gradually ramp up or down. We could easily be
jumping between frequencies depending on the order in which tasks wake up and
their respective uclamp values.

We regard that as a limitation of the capabilities of the underlying system
itself.

There is room to improve the behavior of schedutil rate_limit_us, but not much
to be done for limitations 1 and 2. They are considered hard limitations of
the system.