TOMOYO Linux Cross Reference
Linux/Documentation/scheduler/sched-util-clamp.rst

Differences between /Documentation/scheduler/sched-util-clamp.rst (Version linux-6.12-rc7) and /Documentation/scheduler/sched-util-clamp.rst (Version linux-5.0.21)


  1 .. SPDX-License-Identifier: GPL-2.0               
  2                                                   
  3 ====================                              
  4 Utilization Clamping                              
  5 ====================                              
  6                                                   
  7 1. Introduction                                   
  8 ===============                                   
  9                                                   
 10 Utilization clamping, also known as util clamp    
 11 feature that allows user space to help in mana    
 12 of tasks. It was introduced in the v5.3 release. T
 13 v5.4.                                             
 14                                                   
 15 Uclamp is a hinting mechanism that allows the     
 16 performance requirements and restrictions of t    
 17 scheduler to make a better decision. And when     
 18 used, util clamp will influence the CPU freque    
 19                                                   
 20 Since the scheduler and schedutil are both dri    
 21 util clamp acts on that to achieve its goal by    
 22 point; hence the name. That is, by clamping ut    
 23 system run at a certain performance point.        
 24                                                   
 25 The right way to view util clamp is as a mecha    
 26 performance constraints. It consists of two tu    
 27                                                   
 28         * UCLAMP_MIN, which sets the lower bou    
 29         * UCLAMP_MAX, which sets the upper bou    
 30                                                   
 31 These two bounds will ensure a task will opera    
 32 of the system. UCLAMP_MIN implies boosting a t    
 33 capping a task.                                   
 34                                                   
 35 One can tell the system (scheduler) that some     
 36 performance point to operate at to deliver the    
 37 can tell the system that some tasks should be     
 38 much resources and should not go above a speci    
 39 the uclamp values as performance points rather    
 40 abstraction from user space point of view.        
 41                                                   
 42 As an example, a game can use util clamp to fo    
 43 perceived Frames Per Second (FPS). It can dyna    
 44 performance point required by its display pipe    
 45 dropped. It can also dynamically 'prime' up th    
 46 coming few hundred milliseconds a computationa    
 47 happen.                                           
 48                                                   
 49 On mobile hardware where the capability of the    
 50 dynamic feedback loop offers a great flexibili    
 51 given the capabilities of any system.             
 52                                                   
 53 Of course a static configuration is possible t    
 54 on the system, application and the desired out    
 55                                                   
 56 Another example is in Android where tasks are     
 57 foreground, top-app, etc. Util clamp can be us    
 58 resources background tasks are consuming by ca    
 59 can run at. This constraint helps reserve reso    
 60 the ones belonging to the currently active app    
 61 helps in limiting how much power they consume.    
 62 heterogeneous systems (e.g. Arm big.LITTLE); t    
 63 background tasks to stay on the little cores w    
 64                                                   
 65         1. The big cores are free to run top-a    
 66            tasks are the tasks the user is cur    
 67            the most important tasks in the sys    
 68         2. They don't run on a power hungry co    
 69            are CPU intensive tasks.               
 70                                                   
 71 .. note::                                         
 72   **little cores**:                               
 73     CPUs with capacity < 1024                     
 74                                                   
 75   **big cores**:                                  
 76     CPUs with capacity = 1024                     
 77                                                   
 78 By making these uclamp performance requests, o    
 79 ensure system resources are used optimally to     
 80 experience.                                       
 81                                                   
 82 Another use case is to help with **overcoming     
 83 how scheduler utilization signal is calculated    
 84                                                   
 85 On the other hand, a busy task for instance th    
 86 performance point will suffer a delay of ~200m    
 87 scheduler to realize that. This is known to af    
 88 mobile devices where frames will drop due to s    
 89 higher frequency required for the tasks to fin    
 90 UCLAMP_MIN=1024 will ensure such tasks will al    
 91 level when they start running.                    
 92                                                   
 93 The overall visible effect goes beyond better     
 94 experience/performance and stretches to help a    
 95 performance/watt if used effectively.             
 96                                                   
 97 User space can form a feedback loop with the t    
 98 the device doesn't heat up to the point where     
 99                                                   
100 Both SCHED_NORMAL/OTHER and SCHED_FIFO/RR hono    
101                                                   
102 In the SCHED_FIFO/RR case, uclamp gives the op    
103 performance point rather than being tied to MA    
104 can be useful on general purpose systems that     
105                                                   
106 Note that by design RT tasks don't have per-ta    
107 run at a constant frequency to combat undeterm    
108                                                   
109 Note that using schedutil always implies a sin    
110 when an RT task wakes up. This cost is unchang    
111 helps picking what frequency to request instea    
112 MAX for all RT tasks.                             
113                                                   
114 See :ref:`section 3.4 <uclamp-default-values>`    
115 :ref:`3.4.1 <sched-util-clamp-min-rt-default>`    
116 default value.                                    
117                                                   
118 2. Design                                         
119 =========                                         
120                                                   
121 Util clamp is a property of every task in the     
122 its utilization signal; acting as a bias mecha    
123 decisions within the scheduler.                   
124                                                   
125 The actual utilization signal of a task is nev    
126 inspect PELT signals at any point of time you     
127 they are intact. Clamping happens only when ne    
128 and the scheduler needs to select a suitable C    
129                                                   
130 Since the goal of util clamp is to allow reque    
131 performance point for a task to run on, it mus    
132 frequency selection as well as task placement     
133 which have implications on the utilization val    
134 level, which brings us to the main design chal    
135                                                   
136 When a task wakes up on an rq, the utilization    
137 affected by the uclamp settings of all the tas    
138 a task requests to run at UCLAMP_MIN = 512, then
139 to respect this request as well as all othe
140 enqueued tasks.                                   
141                                                   
142 To be able to aggregate the util clamp value o    
143 rq, uclamp must do some housekeeping at every     
144 scheduler hot path. Hence care must be taken s    
145 significant impact on a lot of use cases and c    
146 practice.                                         
147                                                   
148 The way this is handled is by dividing the uti    
149 (struct uclamp_bucket) which allows us to redu    
150 task on the rq to only a subset of tasks on th    
151                                                   
152 When a task is enqueued, the counter in the ma    
153 and on dequeue it is decremented. This makes k    
154 uclamp value at rq level a lot easier.            
155                                                   
156 As tasks are enqueued and dequeued, we keep tr    
157 uclamp value of the rq. See :ref:`section 2.1     
158 how this works.                                   
159                                                   
160 Later at any path that wants to identify the e    
161 it will simply need to read this effective ucl    
162 moment of time it needs to take a decision.       
163                                                   
164 For task placement case, only Energy Aware and    
165 (EAS/CAS) make use of uclamp for now, which im    
166 heterogeneous systems only.                       
167 When a task wakes up, the scheduler will look     
168 value of every rq and compare it with the pote    
169 to be enqueued there. Favoring the rq that wil    
170 efficient combination.                            
171                                                   
172 Similarly in schedutil, when it needs to make     
173 at the current effective uclamp value of the r    
174 of tasks currently enqueued there and select t    
175 will satisfy constraints from requests.           
176                                                   
177 Other paths like setting overutilization state    
178 make use of uclamp as well. Such cases are con    
179 allow the 2 main use cases above and will not     
180 could change with implementation details.         
181                                                   
182 .. _uclamp-buckets:                               
183                                                   
184 2.1. Buckets                                      
185 ------------                                      
186                                                   
187 ::                                                
188                                                   
189                            [struct rq]            
190                                                   
191   (bottom)                                        
192                                                   
193     0                                             
194     |                                             
195     +-----------+-----------+-----------+----     
196     |  Bucket 0 |  Bucket 1 |  Bucket 2 |    .    
197     +-----------+-----------+-----------+----     
198        :           :                              
199        +- p0       +- p3                          
200        :                                          
201        +- p1                                      
202        :                                          
203        +- p2                                      
204                                                   
205                                                   
206 .. note::                                         
207   The diagram above is an illustration rather     
208   internal data structure.                        
209                                                   
210 To reduce the search space when trying to deci    
211 an rq as tasks are enqueued/dequeued, the whol    
212 into N buckets where N is configured at compil    
213 CONFIG_UCLAMP_BUCKETS_COUNT. By default it is     
214                                                   
215 The rq has a bucket for each uclamp_id tunable    
216                                                   
217 The range of each bucket is 1024/N. For exampl    
218 5 there will be 5 buckets, each of which will     
219                                                   
220 ::                                                
221                                                   
222         DELTA = round_closest(1024/5) = 204.8     
223                                                   
224         Bucket 0: [0:204]                         
225         Bucket 1: [205:409]                       
226         Bucket 2: [410:614]                       
227         Bucket 3: [615:819]                       
228         Bucket 4: [820:1024]                      
229                                                   
230 When a task p with following tunable parameter    
231                                                   
232 ::                                                
233                                                   
234         p->uclamp[UCLAMP_MIN] = 300               
235         p->uclamp[UCLAMP_MAX] = 1024              
236                                                   
237 is enqueued into the rq, bucket 1 will be incr    
238 4 will be incremented for UCLAMP_MAX to reflec    
239 this range.                                       
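
The mapping from a clamp value to a bucket can be sketched in user space as follows. This is a minimal illustration for N = 5, not the kernel's code; the macro and function names are chosen for this example, and the rounded bucket width of 205 is derived from the bucket ranges listed above.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024u  /* top of the uclamp range */
#define UCLAMP_BUCKETS       5u     /* example value of N */

/* Integer division rounded to the closest value. */
#define DIV_ROUND_CLOSEST(x, d) (((x) + (d) / 2) / (d))

/* Width of one bucket: round_closest(1024 / 5) = 205 */
#define UCLAMP_BUCKET_DELTA \
        DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)

/* Map a clamp value in [0:1024] to the index of the bucket that holds it. */
static unsigned int uclamp_bucket_id(unsigned int clamp_value)
{
        unsigned int id;

        assert(clamp_value <= SCHED_CAPACITY_SCALE);
        id = clamp_value / UCLAMP_BUCKET_DELTA;
        /*
         * For some N (e.g. 20) the value 1024 would index one past the
         * last bucket, so the result is limited to N - 1.
         */
        return id < UCLAMP_BUCKETS - 1 ? id : UCLAMP_BUCKETS - 1;
}
```

With these definitions, uclamp_bucket_id(300) yields 1 and uclamp_bucket_id(1024) yields 4, which matches the buckets incremented for task p above.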
240                                                   
241 The rq then keeps track of its current effecti    
242 uclamp_id.                                        
243                                                   
244 When a task p is enqueued, the rq value change    
245                                                   
246 ::                                                
247                                                   
248         // update bucket logic goes here          
249         rq->uclamp[UCLAMP_MIN] = max(rq->uclam    
250         // repeat for UCLAMP_MAX                  
251                                                   
252 Similarly, when p is dequeued the rq value cha    
253                                                   
254 ::                                                
255                                                   
256         // update bucket logic goes here          
257         rq->uclamp[UCLAMP_MIN] = search_top_bu    
258         // repeat for UCLAMP_MAX                  
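
The enqueue and dequeue updates can be sketched for a single uclamp_id as below. This is a simplified user-space model under stated assumptions: struct rq_clamp, its fields and the helper names are hypothetical, five buckets of width 205 are assumed, and the value used when all buckets are empty is passed in as idle_default rather than taken from any real system setting.

```c
#include <assert.h>

#define NR_BUCKETS   5u
#define BUCKET_DELTA 205u   /* round_closest(1024 / 5) */

/* Hypothetical per-rq state for one uclamp_id (e.g. UCLAMP_MIN). */
struct rq_clamp {
        unsigned int tasks[NR_BUCKETS]; /* enqueued tasks per bucket */
        unsigned int value[NR_BUCKETS]; /* highest value seen per bucket */
        unsigned int nr_running;        /* tasks enqueued on this rq */
        unsigned int effective;         /* current rq-level clamp value */
        unsigned int idle_default;      /* used when all buckets are empty */
};

static unsigned int bucket_id(unsigned int v)
{
        unsigned int id = v / BUCKET_DELTA;

        return id < NR_BUCKETS - 1 ? id : NR_BUCKETS - 1;
}

static void rq_clamp_enqueue(struct rq_clamp *rc, unsigned int v)
{
        unsigned int id = bucket_id(v);

        assert(v <= 1024u);
        rc->tasks[id]++;
        /* First task in the bucket, or a higher request: track it. */
        if (rc->tasks[id] == 1 || v > rc->value[id])
                rc->value[id] = v;

        if (rc->nr_running++ == 0)
                rc->effective = v;      /* rq leaves the idle default */
        else if (v > rc->effective)
                rc->effective = v;      /* max(rq value, task value) */
}

static void rq_clamp_dequeue(struct rq_clamp *rc, unsigned int v)
{
        unsigned int id = bucket_id(v);
        unsigned int i;

        assert(rc->tasks[id] > 0);
        rc->tasks[id]--;
        rc->nr_running--;

        /* Search from the top bucket down for the highest value in use. */
        for (i = NR_BUCKETS; i-- > 0; ) {
                if (rc->tasks[i]) {
                        rc->effective = rc->value[i];
                        return;
                }
        }
        rc->effective = rc->idle_default;   /* all buckets are empty */
}
```

Only the per-bucket counters and one value per bucket are touched on each operation, so the cost does not grow with the number of tasks on the rq. The price in this sketch is that a bucket keeps the highest value it has seen while it still holds tasks, so the rq value is an approximation within one bucket width.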
259                                                   
260 When all buckets are empty, the rq uclamp valu    
261 See :ref:`section 3.4 <uclamp-default-values>`    
262                                                   
263                                                   
264 2.2. Max aggregation                              
265 --------------------                              
266                                                   
267 Util clamp is tuned to honour the request for     
268 highest performance point.                        
269                                                   
270 When multiple tasks are attached to the same r    
271 the task that needs the highest performance po    
272 another task that doesn't need it or is disall    
273                                                   
274 For example, if there are multiple tasks attac    
275 values:                                           
276                                                   
277 ::                                                
278                                                   
279         p0->uclamp[UCLAMP_MIN] = 300              
280         p0->uclamp[UCLAMP_MAX] = 900              
281                                                   
282         p1->uclamp[UCLAMP_MIN] = 500              
283         p1->uclamp[UCLAMP_MAX] = 500              
284                                                   
285 then assuming both p0 and p1 are enqueued to t    
286 and UCLAMP_MAX become:                            
287                                                   
288 ::                                                
289                                                   
290         rq->uclamp[UCLAMP_MIN] = max(300, 500)    
291         rq->uclamp[UCLAMP_MAX] = max(900, 500)    
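
The same computation, written as a small self-contained sketch. The struct task type and the rq_max_aggregate() helper are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/* Hypothetical stand-in for the per-task clamp requests. */
struct task {
        unsigned int uclamp[UCLAMP_CNT];
};

/* rq-level value for one clamp id: the maximum over all enqueued tasks. */
static unsigned int rq_max_aggregate(const struct task *tasks, size_t n,
                                     enum uclamp_id id)
{
        unsigned int max = 0;
        size_t i;

        assert(n > 0);
        for (i = 0; i < n; i++)
                if (tasks[i].uclamp[id] > max)
                        max = tasks[i].uclamp[id];
        return max;
}
```

For p0 and p1 above this gives 500 for UCLAMP_MIN and 900 for UCLAMP_MAX. Note that the rq-level UCLAMP_MAX is 900 while p0 is enqueued, even though p1 asked to be capped at 500.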
292                                                   
293 As we shall see in :ref:`section 5.1 <uclamp-c    
294 aggregation is the cause of one of limitations    
295 particular for UCLAMP_MAX hint when user space    
296                                                   
297 2.3. Hierarchical aggregation                     
298 -----------------------------                     
299                                                   
300 As stated earlier, util clamp is a property of    
301 the actual applied (effective) value can be in    
302 request made by the task or another actor on i    
303                                                   
304 The effective util clamp value of any task is     
305                                                   
306   1. By the uclamp settings defined by the cgr    
307      to, if any.                                  
308   2. The restricted value in (1) is then furth    
309      uclamp settings.                             
310                                                   
311 :ref:`Section 3 <uclamp-interfaces>` discusses    
312 further on that.                                  
313                                                   
314 For now suffice to say that if a task makes a     
315 value will have to adhere to some restrictions    
316 wide settings.                                    
317                                                   
318 The system will still accept the request even     
319 constraints, but as soon as the task moves to     
320 modifies the system settings, the request will    
321 within new constraints.                           
322                                                   
323 In other words, this aggregation will not caus    
324 its uclamp values, but rather the system may n    
325 based on those factors.                           
326                                                   
327 2.4. Range                                        
328 ----------                                        
329                                                   
330 Uclamp performance request has the range of 0     
331                                                   
332 For the cgroup interface, a percentage is used (that
333 Just like other cgroup interfaces, you can use    
334                                                   
335 .. _uclamp-interfaces:                            
336                                                   
337 3. Interfaces                                     
338 =============                                     
339                                                   
340 3.1. Per task interface                           
341 -----------------------                           
342                                                   
343 The sched_setattr() syscall was extended to accept
344                                                   
345 * sched_util_min: requests the minimum perform    
346   at when this task is running. Or lower perfo    
347 * sched_util_max: requests the maximum perform    
348   at when this task is running. Or upper perfo    
349                                                   
350 For example, the following scenario has 40% t
351                                                   
352 ::                                                
353                                                   
354         attr->sched_util_min = 40% * 1024;        
355         attr->sched_util_max = 80% * 1024;        
356                                                   
357 When task @p is running, **the scheduler shoul    
358 starts at 40% performance level**. If the task    
359 that its actual utilization goes above 80%, th    
360 level, will be capped.                            
361                                                   
362 The special value -1 is used to reset the ucla    
363 default.                                          
364                                                   
365 Note that resetting the uclamp value to system    
366 as manually setting uclamp value to system def    
367 important because as we shall see in system in    
368 RT could be changed. SCHED_NORMAL/OTHER might     
369 future.                                           
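
A hedged sketch of how a program might issue the 40% to 80% request from user space. It assumes a Linux system whose headers define SYS_sched_setattr and a kernel built with uclamp support. The struct is a local copy of the UAPI layout under a different name, the flag macros are local names for the corresponding SCHED_FLAG_* values, and pct_to_util() and set_util_clamp() are helpers invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copy of the sched_attr UAPI layout, including the uclamp fields. */
struct uclamp_sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
        uint32_t sched_util_min;
        uint32_t sched_util_max;
};

/* Local names for SCHED_FLAG_* values from the UAPI headers. */
#define FLAG_KEEP_POLICY    0x08
#define FLAG_KEEP_PARAMS    0x10
#define FLAG_UTIL_CLAMP_MIN 0x20
#define FLAG_UTIL_CLAMP_MAX 0x40

/* Convert a percentage to the 0..1024 range, rounding down. */
uint32_t pct_to_util(unsigned int pct)
{
        assert(pct <= 100);
        return pct * 1024u / 100u;
}

/*
 * Request [min_pct, max_pct] for thread @pid (0 means the caller) while
 * leaving its policy and other parameters untouched.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int set_util_clamp(pid_t pid, unsigned int min_pct, unsigned int max_pct)
{
        struct uclamp_sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_flags = FLAG_KEEP_POLICY | FLAG_KEEP_PARAMS |
                           FLAG_UTIL_CLAMP_MIN | FLAG_UTIL_CLAMP_MAX;
        attr.sched_util_min = pct_to_util(min_pct);
        attr.sched_util_max = pct_to_util(max_pct);

        return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

Calling set_util_clamp(0, 40, 80) would request sched_util_min = 409 and sched_util_max = 819 for the calling thread. The call can fail, for instance when the kernel was built without uclamp support, so the return value should be checked.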
370                                                   
371 3.2. cgroup interface                             
372 ---------------------                             
373                                                   
374 There are two uclamp related values in the CPU    
375                                                   
376 * cpu.uclamp.min                                  
377 * cpu.uclamp.max                                  
378                                                   
379 When a task is attached to a CPU controller, i    
380 as follows:                                       
381                                                   
382 * cpu.uclamp.min is a protection as described     
383   v2 documentation <cgroupv2-protections-distr    
384                                                   
385   If a task uclamp_min value is lower than cpu    
386   inherit the cgroup cpu.uclamp.min value.        
387                                                   
388   In a cgroup hierarchy, effective cpu.uclamp.    
389   parent).                                        
390                                                   
391 * cpu.uclamp.max is a limit as described in :r    
392   documentation <cgroupv2-limits-distributor>`    
393                                                   
394   If a task uclamp_max value is higher than cp    
395   inherit the cgroup cpu.uclamp.max value.        
396                                                   
397   In a cgroup hierarchy, effective cpu.uclamp.    
398   parent).                                        
399                                                   
400 For example, given following parameters:          
401                                                   
402 ::                                                
403                                                   
404         p0->uclamp[UCLAMP_MIN] = // system def    
405         p0->uclamp[UCLAMP_MAX] = // system def    
406                                                   
407         p1->uclamp[UCLAMP_MIN] = 40% * 1024;      
408         p1->uclamp[UCLAMP_MAX] = 50% * 1024;      
409                                                   
410         cgroup0->cpu.uclamp.min = 20% * 1024;     
411         cgroup0->cpu.uclamp.max = 60% * 1024;     
412                                                   
413         cgroup1->cpu.uclamp.min = 60% * 1024;     
414         cgroup1->cpu.uclamp.max = 100% * 1024;    
415                                                   
416 when p0 and p1 are attached to cgroup0, the va    
417                                                   
418 ::                                                
419                                                   
420         p0->uclamp[UCLAMP_MIN] = cgroup0->cpu.    
421         p0->uclamp[UCLAMP_MAX] = cgroup0->cpu.    
422                                                   
423         p1->uclamp[UCLAMP_MIN] = 40% * 1024; /    
424         p1->uclamp[UCLAMP_MAX] = 50% * 1024; /    
425                                                   
426 when p0 and p1 are attached to cgroup1, these     
427                                                   
428 ::                                                
429                                                   
430         p0->uclamp[UCLAMP_MIN] = cgroup1->cpu.    
431         p0->uclamp[UCLAMP_MAX] = cgroup1->cpu.    
432                                                   
433         p1->uclamp[UCLAMP_MIN] = cgroup1->cpu.    
434         p1->uclamp[UCLAMP_MAX] = 50% * 1024; /    
435                                                   
436 Note that cgroup interfaces allow cpu.uclamp.
437 cpu.uclamp.min. Other interfaces don't allow t    
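
The two rules above can be sketched as a small helper and applied to p0 and p1. This follows the rules as worded in this section and treats each bound independently; it is not claimed to match the kernel's implementation. struct clamp, pct() and effective_in_cgroup() are hypothetical names, and p0 is assumed to be a SCHED_NORMAL task with the default values of 0 and 1024.

```c
#include <assert.h>

/* Hypothetical pair of clamp values in the 0..1024 range. */
struct clamp {
        unsigned int min;
        unsigned int max;
};

/* Convert a percentage to the 0..1024 range, rounding down. */
static unsigned int pct(unsigned int p)
{
        assert(p <= 100);
        return p * 1024u / 100u;
}

/*
 * Effective task values inside a cgroup: cpu.uclamp.min raises a lower
 * task request, cpu.uclamp.max lowers a higher one.
 */
static struct clamp effective_in_cgroup(struct clamp task, struct clamp cg)
{
        struct clamp eff;

        eff.min = task.min < cg.min ? cg.min : task.min;
        eff.max = task.max > cg.max ? cg.max : task.max;
        return eff;
}
```

With cgroup0 = {pct(20), pct(60)} and cgroup1 = {pct(60), pct(100)}, p1 = {pct(40), pct(50)} keeps both of its values in cgroup0, while in cgroup1 its minimum is raised to 614 and its maximum stays at 512.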
438                                                   
439 3.3. System interface                             
440 ---------------------                             
441                                                   
442 3.3.1 sched_util_clamp_min                        
443 --------------------------                        
444                                                   
445 System wide limit of allowed UCLAMP_MIN range.    
446 which means that permitted effective UCLAMP_MI    
447 By changing it to 512 for example the range re    
448 to restrict how much boosting tasks are allowe    
449                                                   
450 Requests from tasks to go above this knob valu    
451 they won't be satisfied until it is more than     
452                                                   
453 The value must be smaller than or equal to sch    
454                                                   
455 3.3.2 sched_util_clamp_max                        
456 --------------------------                        
457                                                   
458 System wide limit of allowed UCLAMP_MAX range.    
459 which means that permitted effective UCLAMP_MA    
460                                                   
461 By changing it to 512 for example the effectiv    
462 [0:512]. This means that no task can run ab
463 rqs are restricted too. IOW, the whole system     
464 capacity.                                         
465                                                   
466 This is useful to restrict the overall maximum    
467 For example, it can be handy to limit performa    
468 or when the system wants to limit access to mo    
469 levels when it's in idle state or screen is of    
470                                                   
471 Requests from tasks to go above this knob valu    
472 won't be satisfied until it is more than p->uc    
473                                                   
474 The value must be greater than or equal to sch    
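
The effect of both system wide knobs on a single request can be modelled as below. This is an illustrative sketch only: effective_under_knob() is a hypothetical helper, and it models just the restriction step, with the task's stored request left unchanged.

```c
#include <assert.h>

/*
 * Value the system honours for a stored task request, given the current
 * setting of the matching system wide knob. The request itself is kept,
 * so raising the knob later lets the original request take effect.
 */
static unsigned int effective_under_knob(unsigned int request,
                                         unsigned int knob)
{
        assert(request <= 1024u && knob <= 1024u);
        return request < knob ? request : knob;
}
```

For example, a request of 800 is honoured as 512 while the knob is set to 512, and as 800 again once the knob is raised back to 1024.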
475                                                   
476 .. _uclamp-default-values:                        
477                                                   
478 3.4. Default values                               
479 -------------------                               
480                                                   
481 By default all SCHED_NORMAL/SCHED_OTHER tasks     
482                                                   
483 ::                                                
484                                                   
485         p_fair->uclamp[UCLAMP_MIN] = 0            
486         p_fair->uclamp[UCLAMP_MAX] = 1024         
487                                                   
488 That is, by default they're boosted to run at     
489 changed at boot or runtime. No argument was ma    
490 provide this, but can be added in the future.     
491                                                   
492 For SCHED_FIFO/SCHED_RR tasks:                    
493                                                   
494 ::                                                
495                                                   
496         p_rt->uclamp[UCLAMP_MIN] = 1024           
497         p_rt->uclamp[UCLAMP_MAX] = 1024           
498                                                   
499 That is by default they're boosted to run at t    
500 the system which retains the historical behavi    
501                                                   
502 RT tasks default uclamp_min value can be modif    
503 sysctl. See below section.                        
504                                                   
505 .. _sched-util-clamp-min-rt-default:              
506                                                   
507 3.4.1 sched_util_clamp_min_rt_default             
508 -------------------------------------             
509                                                   
510 Running RT tasks at maximum performance point     
511 devices and not necessary. To allow system dev    
512 guarantees for these tasks without pushing it     
513 performance point, this sysctl knob allows tun    
514 address the system requirement without burning    
515 performance point all the time.                   
516                                                   
517 Application developers are encouraged to use th
518 to ensure they are performance and power aware    
519 to 0 by system designers and leave the task of    
520 requirements to the apps.                         
521                                                   
522 4. How to use util clamp                          
523 ========================                          
524                                                   
525 Util clamp promotes the concept of user space     
526 management. At the scheduler level there is no    
527 decision. However, with util clamp user space     
528 better decision about task placement and frequ    
529                                                   
530 Best results are achieved by not making any as    
531 application is running on and to use it in con    
532 dynamically monitor and adjust. Ultimately thi    
533 experience at a better perf/watt.                 
534                                                   
535 For some systems and use cases, static setup w    
536 Portability will be a problem in this case. Ho    
537 200 or 1024 is different for each system. Unle    
538 system, static setup should be avoided.           
539                                                   
540 There are enough possibilities to create a who    
541 or self contained app that makes use of it dir    
542                                                   
543 4.1. Boost important and DVFS-latency-sensitiv    
544 ----------------------------------------------    
545                                                   
546 A GUI task might not be busy enough to warrant drivin
547 wakes up. However, it requires to finish its w    
548 to deliver the desired user experience. The ri    
549 wakeup will be system dependent. On some under    
550 on other overpowered ones it will be low or 0.    
551                                                   
552 This task can increase its UCLAMP_MIN value ev    
553 to ensure on next wake up it runs at a higher     
554 to approach the lowest UCLAMP_MIN value that a    
555 particular system to achieve the best possible    
556                                                   
557 On heterogeneous systems, it might be importan    
558 a faster CPU.                                     
559                                                   
560 **Generally it is advised to perceive the inpu    
561 which will imply both task placement and frequ    
562                                                   
563 4.2. Cap background tasks                         
564 -------------------------                         
565                                                   
566 As explained for the Android case in the introdu
567 UCLAMP_MAX for some background tasks that don'    
568 could end up being busy and consume unnecessar    
569                                                   
570 4.3. Powersave mode                               
571 -------------------                               
572                                                   
573 The sched_util_clamp_max system-wide interface can
574 operating at the higher performance points whi    
575 inefficient.                                      
576                                                   
577 This is not unique to uclamp as one can achiev    
578 frequency of the cpufreq governor. It can be c    
579 alternative interface.                            
580                                                   
581 4.4. Per-app performance restriction              
582 ------------------------------------              
583                                                   
584 Middleware/Utility can provide the user an opt    
585 app every time it is executed to guarantee a m    
586 limit it from draining system power at the cos    
587 these apps.                                       
588                                                   
589 If you want to prevent your laptop from heatin    
590 compiling the kernel and are happy to sacrifice pe
591 still would like to keep your browser performa    
592 possible.                                         
593                                                   
594 5. Limitations                                    
595 ==============                                    
596                                                   
597 .. _uclamp-capping-fail:                          
598                                                   
599 5.1. Capping frequency with uclamp_max fails u    
600 ----------------------------------------------    
601                                                   
602 If task p0 is capped to run at 512:               
603                                                   
604 ::                                                
605                                                   
606         p0->uclamp[UCLAMP_MAX] = 512              
607                                                   
608 and it shares the rq with p1 which is free to     
609                                                   
610 ::                                                
611                                                   
612         p1->uclamp[UCLAMP_MAX] = 1024             
613                                                   
614 then due to max aggregation the rq will be all    
615 point:                                            
616                                                   
617 ::                                                
618                                                   
619         rq->uclamp[UCLAMP_MAX] = max(512, 1024    
620                                                   
621 Assuming both p0 and p1 have UCLAMP_MIN = 0, t    
622 the rq will depend on the actual utilization v    
623                                                   
624 If p1 is a small task but p0 is a CPU intensiv    
625 both are running at the same rq, p1 will cause    
626 from the rq although p1, which is allowed to r    
627 doesn't actually need to run at that frequency    
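
The effect of max aggregation can be reproduced with a few lines of Python.
This is a simplified model, not the kernel implementation: the helper names and
the utilization values (800 and 50) are assumptions chosen for illustration,
while the caps of 512 and 1024 come from the example above.

```python
def rq_uclamp_max(task_caps):
    """Max-aggregate UCLAMP_MAX over all runnable tasks on a runqueue."""
    return max(task_caps)


def clamped_rq_util(task_utils, task_caps):
    """Clamp the summed utilization with the aggregated UCLAMP_MAX."""
    total = min(sum(task_utils), 1024)  # utilization saturates at 1024
    return min(total, rq_uclamp_max(task_caps))


caps = {"p0": 512, "p1": 1024}
utils = {"p0": 800, "p1": 50}  # hypothetical: p0 is busy, p1 is small

print(rq_uclamp_max(caps.values()))                    # 1024
print(clamped_rq_util(utils.values(), caps.values()))  # 850
print(clamped_rq_util([800], [512]))                   # 512, p0 alone on the rq
```

In this model p0 alone is held at 512, but as soon as the small p1 is runnable
on the same rq the request rises to 850, mostly on behalf of p0.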
628                                                   
629 5.2. UCLAMP_MAX can break PELT (util_avg) sign    
630 ----------------------------------------------    
631                                                   
632 PELT assumes that frequency will always increa    
633 there's always some idle time on the CPU. But     
634 increase will be prevented which can lead to n    
635 circumstances. When there's no idle time, a ta    
636 which would result in util_avg being 1024.        
637                                                   
638 Combined with the issue described above, this can l
639 when severely capped tasks share the rq with a    
640                                                   
641 As an example, if task p0, which has:
642                                                   
643 ::                                                
644                                                   
645         p0->util_avg = 300                        
646         p0->uclamp[UCLAMP_MAX] = 0                
647                                                   
648 wakes up on an idle CPU, then it will run at m    
649 CPU is capable of. The max CPU frequency (Fmax    
650 since it designates the shortest computational    
651 work on this CPU.                                 
652                                                   
653 ::                                                
654                                                   
655         rq->uclamp[UCLAMP_MAX] = 0                
656                                                   
657 If the ratio of Fmax/Fmin is 3, then maximum v    
658                                                   
659 ::                                                
660                                                   
661         300 * (Fmax/Fmin) = 900                   
662                                                   
663 which indicates the CPU will still see idle ti    
664 _actual_ util_avg will not be 900 though, but     
665 long as there's idle time, p->util_avg updates    
666 but not proportional to Fmax/Fmin.                
667                                                   
668 ::                                                
669                                                   
670         p0->util_avg = 300 + small_error          
671                                                   
672 Now if the ratio of Fmax/Fmin is 4, the maximu    
673                                                   
674 ::                                                
675                                                   
676         300 * (Fmax/Fmin) = 1200                  
677                                                   
678 which is higher than 1024 and indicates that t    
679 this happens, then the _actual_ util_avg will     
680                                                   
681 ::                                                
682                                                   
683         p0->util_avg = 1024                       
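
The two cases above can be checked with a short calculation. The sketch below
is a rough model only: ``util_when_capped_to_fmin`` is a hypothetical helper,
and it ignores the small error mentioned above by reporting the original value
whenever idle time remains.

```python
def util_when_capped_to_fmin(util_at_fmax, fmax_over_fmin):
    """Model util_avg of a task that is forced to run at Fmin.

    Returns (scaled, has_idle_time, observed_util_avg).
    """
    # Running time grows by Fmax/Fmin when the task is held at Fmin.
    scaled = util_at_fmax * fmax_over_fmin
    # Below 1024 the CPU still sees idle time and the signal stays sane.
    has_idle_time = scaled < 1024
    # Without idle time the task looks always-running, so util_avg hits 1024.
    observed = util_at_fmax if has_idle_time else 1024
    return scaled, has_idle_time, observed


print(util_when_capped_to_fmin(300, 3))  # (900, True, 300)
print(util_when_capped_to_fmin(300, 4))  # (1200, False, 1024)
```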
684                                                   
685 If task p1 wakes up on this CPU, which has:
686                                                   
687 ::                                                
688                                                   
689         p1->util_avg = 200                        
690         p1->uclamp[UCLAMP_MAX] = 1024             
691                                                   
692 then the effective UCLAMP_MAX for the CPU will    
693 aggregation rule. But since the capped p0 task    
694 severely, then the rq->util_avg will be:          
695                                                   
696 ::                                                
697                                                   
698         p0->util_avg = 1024                       
699         p1->util_avg = 200                        
700                                                   
701         rq->util_avg = 1024                       
702         rq->uclamp[UCLAMP_MAX] = 1024             
703                                                   
704 This leads to a frequency spike, since if p0 wa
705                                                   
706 ::                                                
707                                                   
708         p0->util_avg = 300                        
709         p1->util_avg = 200                        
710                                                   
711         rq->util_avg = 500                        
712                                                   
713 and run somewhere near mid performance point o    
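
The difference between the two outcomes can be put side by side. This is a
simplified sketch: ``frequency_request`` is a hypothetical helper that models
the request as the summed task utilization, saturated at 1024 and limited by
the aggregated UCLAMP_MAX. It is not the kernel's actual code path.

```python
def frequency_request(task_utils, task_caps):
    """Utilization value the frequency selection would be based on."""
    rq_util = min(sum(task_utils), 1024)  # rq->util_avg saturates at 1024
    rq_cap = max(task_caps)               # max aggregation of UCLAMP_MAX
    return min(rq_util, rq_cap)


caps = [0, 1024]  # p0 capped to 0, p1 uncapped

# p0's util_avg was inflated to 1024 while it ran capped without idle time.
print(frequency_request([1024, 200], caps))  # 1024

# Had p0's util_avg stayed at 300, the request would be moderate.
print(frequency_request([300, 200], caps))   # 500
```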
714                                                   
715 5.3. Schedutil response time issues               
716 -----------------------------------               
717                                                   
718 schedutil has three limitations:                  
719                                                   
720         1. Hardware takes non-zero time to res    
721            request. On some platforms it can be i
722         2. Non fast-switch systems require a w    
723            and perform the frequency change, w    
724         3. schedutil rate_limit_us drops any r    
725            window.                                
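
The effect of the third limitation can be illustrated with a small simulation.
This is a sketch under assumptions, not schedutil's code: ``apply_rate_limit``
is a hypothetical helper, and the timestamps and the 2000 us window are made-up
values.

```python
def apply_rate_limit(request_times_us, rate_limit_us):
    """Return the timestamps of the requests that are not dropped.

    A request is dropped when it arrives less than rate_limit_us after the
    last request that was accepted.
    """
    applied = []
    last = None
    for t in request_times_us:
        if last is None or t - last >= rate_limit_us:
            applied.append(t)
            last = t
    return applied


# Hypothetical frequency change requests, in microseconds.
requests = [0, 500, 1500, 2100, 4200]
print(apply_rate_limit(requests, rate_limit_us=2000))  # [0, 2100, 4200]
```

A task that wakes up at 500 us and asks for a higher performance point would,
in this model, not see any change until the request made at 2100 us.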
726                                                   
727 If a relatively small task is doing critical j    
728 performance point when it wakes up and starts     
729 limitations will prevent it from getting what     
730 expects.                                          
731                                                   
732 This limitation is not only impactful when usi    
733 prevalent as we no longer gradually ramp up or    
734 jumping between frequencies depending on the o    
735 respective uclamp values.                         
736                                                   
737 We regard that as a limitation of the capabili    
738 itself.                                           
739                                                   
740 There is room to improve the behavior of sched    
741 to be done for 1 or 2. They are considered har    
                                                      
