Linux/Documentation/scheduler/sched-deadline.rst (linux-6.11.5)

========================
Deadline Task Scheduling
========================

.. CONTENTS

    0. WARNING
    1. Overview
    2. Scheduling algorithm
      2.1 Main algorithm
      2.2 Bandwidth reclaiming
      2.3 Energy-aware scheduling
    3. Scheduling Real-Time Tasks
      3.1 Definitions
      3.2 Schedulability Analysis for Uniprocessor Systems
      3.3 Schedulability Analysis for Multiprocessor Systems
      3.4 Relationship with SCHED_DEADLINE Parameters
    4. Bandwidth management
      4.1 System-wide settings
      4.2 Task interface
      4.3 Default behavior
      4.4 Behavior of sched_yield()
    5. Tasks CPU affinity
      5.1 SCHED_DEADLINE and cpusets HOWTO
    6. Future plans
    A. Test suite
    B. Minimal main()


0. WARNING
==========

 Fiddling with these settings can result in an unpredictable or even unstable
 system behavior. As for -rt (group) scheduling, it is assumed that root users
 know what they're doing.


1. Overview
===========

 The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
 basically an implementation of the Earliest Deadline First (EDF) scheduling
 algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
 that makes it possible to isolate the behavior of tasks between each other.


2. Scheduling algorithm
=======================

2.1 Main algorithm
------------------

 SCHED_DEADLINE [18] uses three parameters, named "runtime", "period", and
 "deadline", to schedule tasks. A SCHED_DEADLINE task should receive
 "runtime" microseconds of execution time every "period" microseconds, and
 these "runtime" microseconds are available within "deadline" microseconds
 from the beginning of the period.  In order to implement this behavior,
 every time the task wakes up, the scheduler computes a "scheduling deadline"
 consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
 scheduled using EDF[1] on these scheduling deadlines (the task with the
 earliest scheduling deadline is selected for execution). Notice that the
 task actually receives "runtime" time units within "deadline" if a proper
 "admission control" strategy (see Section "4. Bandwidth management") is used
 (clearly, if the system is overloaded this guarantee cannot be respected).

 Summing up, the CBS[2,3] algorithm assigns scheduling deadlines to tasks so
 that each task runs for at most its runtime every period, avoiding any
 interference between different tasks (bandwidth isolation), while the EDF[1]
 algorithm selects the task with the earliest scheduling deadline as the one
 to be executed next. Thanks to this feature, tasks that do not strictly comply
 with the "traditional" real-time task model (see Section 3) can effectively
 use the new policy.

 In more detail, the CBS algorithm assigns scheduling deadlines to
 tasks in the following way:

  - Each SCHED_DEADLINE task is characterized by the "runtime",
    "deadline", and "period" parameters;

  - The state of the task is described by a "scheduling deadline", and
    a "remaining runtime". These two parameters are initially set to 0;

  - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
    the scheduler checks if::

                 remaining runtime                  runtime
        ----------------------------------    >    ---------
        scheduling deadline - current time           period

    then, if the scheduling deadline is smaller than the current time, or
    this condition is verified, the scheduling deadline and the
    remaining runtime are re-initialized as

         scheduling deadline = current time + deadline
         remaining runtime = runtime

    otherwise, the scheduling deadline and the remaining runtime are
    left unchanged;

  - When a SCHED_DEADLINE task executes for an amount of time t, its
    remaining runtime is decreased as::

         remaining runtime = remaining runtime - t

    (technically, the runtime is decreased at every tick, or when the
    task is descheduled / preempted);

  - When the remaining runtime becomes less than or equal to 0, the task is
    said to be "throttled" (also known as "depleted" in early deadline
    scheduling literature) and cannot be scheduled until its scheduling
    deadline. The "replenishment time" for this task (see next item) is set
    to be equal to the current value of the scheduling deadline;

  - When the current time is equal to the replenishment time of a
    throttled task, the scheduling deadline and the remaining runtime are
    updated as::

         scheduling deadline = scheduling deadline + period
         remaining runtime = remaining runtime + runtime

 The SCHED_FLAG_DL_OVERRUN flag in sched_attr's sched_flags field allows a task
 to get informed about runtime overruns through the delivery of SIGXCPU
 signals.
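
 The state-update rules above can be sketched in plain Python (an illustrative
 model only, not the kernel implementation; ``DlTask`` and the three handlers
 are hypothetical names, and times are in abstract units)::

    # Illustrative sketch of the CBS rules described above.
    from dataclasses import dataclass

    @dataclass
    class DlTask:
        runtime: float             # reserved runtime per period
        deadline: float            # relative deadline
        period: float
        scheduling_deadline: float = 0.0
        remaining_runtime: float = 0.0

    def on_wakeup(task: DlTask, now: float) -> None:
        """Apply the CBS wake-up rule."""
        expired = task.scheduling_deadline <= now
        # remaining_runtime / (scheduling_deadline - now) > runtime / period,
        # rewritten without division so a zero denominator cannot occur:
        too_much = (task.remaining_runtime * task.period >
                    (task.scheduling_deadline - now) * task.runtime)
        if expired or too_much:
            task.scheduling_deadline = now + task.deadline
            task.remaining_runtime = task.runtime

    def on_run(task: DlTask, delta: float) -> bool:
        """Account executed time; return True if the task is now throttled."""
        task.remaining_runtime -= delta
        return task.remaining_runtime <= 0

    def on_replenishment(task: DlTask) -> None:
        """At the replenishment time, push both parameters forward."""
        task.scheduling_deadline += task.period
        task.remaining_runtime += task.runtime

    t = DlTask(runtime=10, deadline=30, period=100)
    on_wakeup(t, now=0)         # fresh task: deadline expired, re-initialize
    assert t.scheduling_deadline == 30 and t.remaining_runtime == 10
    assert on_run(t, 10)        # whole runtime consumed: throttled
    on_replenishment(t)         # at time 30 both parameters move forward
    assert t.scheduling_deadline == 130 and t.remaining_runtime == 10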


2.2 Bandwidth reclaiming
------------------------

 Bandwidth reclaiming for deadline tasks is based on the GRUB (Greedy
 Reclamation of Unused Bandwidth) algorithm [15, 16, 17] and it is enabled
 when flag SCHED_FLAG_RECLAIM is set.

 The following diagram illustrates the state names for the GRUB algorithm::

                             ------------
                 (d)        |   Active   |
              ------------->|            |
              |             | Contending |
              |              ------------
              |                A      |
          ----------           |      |
         |          |          |      |
         | Inactive |          |(b)   | (a)
         |          |          |      |
          ----------           |      |
              A                |      V
              |              ------------
              |             |   Active   |
              --------------|     Non    |
                 (c)        | Contending |
                             ------------

 A task can be in one of the following states:

  - ActiveContending: if it is ready for execution (or executing);

  - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
    time;

  - Inactive: if it is blocked and has surpassed the 0-lag time.

 State transitions:

  (a) When a task blocks, it does not become immediately inactive since its
      bandwidth cannot be immediately reclaimed without breaking the
      real-time guarantees. It therefore enters a transitional state called
      ActiveNonContending. The scheduler arms the "inactive timer" to fire at
      the 0-lag time, when the task's bandwidth can be reclaimed without
      breaking the real-time guarantees.

      The 0-lag time for a task entering the ActiveNonContending state is
      computed as::

                        (runtime * dl_period)
             deadline - ---------------------
                             dl_runtime

      where runtime is the remaining runtime, while dl_runtime and dl_period
      are the reservation parameters.

  (b) If the task wakes up before the inactive timer fires, the task re-enters
      the ActiveContending state and the "inactive timer" is canceled.
      In addition, if the task wakes up on a different runqueue, then
      the task's utilization must be removed from the previous runqueue's
      active utilization and must be added to the new runqueue's active
      utilization.
      In order to avoid races between a task waking up on a runqueue while the
      "inactive timer" is running on a different CPU, the "dl_non_contending"
      flag is used to indicate that a task is not on a runqueue but is active
      (so, the flag is set when the task blocks and is cleared when the
      "inactive timer" fires or when the task wakes up).

  (c) When the "inactive timer" fires, the task enters the Inactive state and
      its utilization is removed from the runqueue's active utilization.

  (d) When an inactive task wakes up, it enters the ActiveContending state and
      its utilization is added to the active utilization of the runqueue where
      it has been enqueued.
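
 The 0-lag formula in transition (a) can be checked numerically. Using the
 reservation parameters of the two-task example later in this section
 (dl_runtime = 4, dl_period = 8), a task that blocks with a remaining runtime
 of 2 and a scheduling deadline of 8 reaches its 0-lag time at t = 4
 (``zero_lag_time`` is a hypothetical helper name)::

    def zero_lag_time(deadline, remaining_runtime, dl_runtime, dl_period):
        # 0-lag = deadline - (remaining runtime * dl_period) / dl_runtime
        return deadline - (remaining_runtime * dl_period) / dl_runtime

    # Blocks at t=2 with 2 units of runtime left and scheduling deadline 8:
    # its bandwidth can be reclaimed from t=4 on.
    assert zero_lag_time(8, 2, 4, 8) == 4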

 For each runqueue, the GRUB algorithm keeps track of the following bandwidths:

  - Active bandwidth (running_bw): this is the sum of the bandwidths of the
    tasks in active state (i.e., ActiveContending or ActiveNonContending);

  - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
    runqueue, including the tasks in Inactive state.

  - Maximum usable bandwidth (max_bw): This is the maximum bandwidth usable by
    deadline tasks and is currently set to the RT capacity.


 The algorithm reclaims the bandwidth of the tasks in Inactive state.
 It does so by decrementing the runtime of the executing task Ti at a pace
 equal to

           dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt

 where:

  - Ui is the bandwidth of task Ti;
  - Umax is the maximum reclaimable utilization (subjected to RT throttling
    limits);
  - Uinact is the (per runqueue) inactive utilization, computed as
    (this_bw - running_bw);
  - Uextra is the (per runqueue) extra reclaimable utilization
    (subjected to RT throttling limits).
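
 The pace formula can be replayed with the numbers of the example below (a
 sketch that assumes Umax = 1, i.e., it ignores the RT-throttling limits;
 ``grub_pace`` is a hypothetical helper name)::

    def grub_pace(u_i, u_max, u_inact, u_extra):
        """Runtime decrease per unit of wall-clock time under GRUB:
           dq = -(max{Ui, (Umax - Uinact - Uextra)} / Umax) dt"""
        return -(max(u_i, u_max - u_inact - u_extra) / u_max)

    # Two tasks with bandwidth 0.5 each and Umax = 1:
    assert grub_pace(0.5, 1.0, 0.0, 0.0) == -1.0  # no inactive bandwidth
    assert grub_pace(0.5, 1.0, 0.5, 0.0) == -0.5  # Uinact = 0.5: reclaiming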


 Let's now see a trivial example of two deadline tasks with runtime equal
 to 4 and period equal to 8 (i.e., bandwidth equal to 0.5)::

         A            Task T1
         |
         |                               |
         |                               |
         |--------                       |-------
         |       |                       V
         |---|---|---|---|---|---|---|---|--------->t
         0   1   2   3   4   5   6   7   8


         A            Task T2
         |
         |                               |
         |                               |
         |       ------------------------|
         |       |                       V
         |---|---|---|---|---|---|---|---|--------->t
         0   1   2   3   4   5   6   7   8


         A            running_bw
         |
       1 -----------------               ----------
         |               |               |
      0.5-               -----------------
         |                               |
         |---|---|---|---|---|---|---|---|--------->t
         0   1   2   3   4   5   6   7   8


  - Time t = 0:

    Both tasks are ready for execution and in the ActiveContending state.
    Suppose Task T1 is the first task to start execution.
    Since there are no inactive tasks, its runtime is decreased as dq = -1 dt.

  - Time t = 2:

    Suppose that task T1 blocks.
    Task T1 therefore enters the ActiveNonContending state. Since its remaining
    runtime is equal to 2, its 0-lag time is equal to t = 4.
    Task T2 starts executing, with runtime still decreased as dq = -1 dt since
    there are no inactive tasks.

  - Time t = 4:

    This is the 0-lag time for Task T1. Since it did not wake up in the
    meantime, it enters the Inactive state. Its bandwidth is removed from
    running_bw.
    Task T2 continues its execution. However, its runtime is now decreased as
    dq = -0.5 dt because Uinact = 0.5.
    Task T2 therefore reclaims the bandwidth unused by Task T1.

  - Time t = 8:

    Task T1 wakes up. It enters the ActiveContending state again, and the
    running_bw is incremented.


2.3 Energy-aware scheduling
---------------------------

 When cpufreq's schedutil governor is selected, SCHED_DEADLINE implements the
 GRUB-PA [19] algorithm, reducing the CPU operating frequency to the minimum
 value that still allows meeting the deadlines. This behavior is currently
 implemented only for ARM architectures.

 Particular care must be taken in case the time needed for changing frequency
 is of the same order of magnitude as the reservation period. In such cases,
 setting a fixed CPU frequency results in a lower amount of deadline misses.


3. Scheduling Real-Time Tasks
=============================


 ..  BIG FAT WARNING ******************************************************

 .. warning::

   This section contains a (not-thorough) summary of classical deadline
   scheduling theory, and how it applies to SCHED_DEADLINE.
   The reader can "safely" skip to Section 4 if only interested in seeing
   how the scheduling policy can be used. Anyway, we strongly recommend
   to come back here and continue reading (once the urge for testing is
   satisfied :P) to be sure of fully understanding all technical details.

 .. ************************************************************************

 There are no limitations on what kind of task can exploit this new
 scheduling discipline, even if it must be said that it is particularly
 suited for periodic or sporadic real-time tasks that need guarantees on their
 timing behavior, e.g., multimedia, streaming, control applications, etc.

3.1 Definitions
------------------------

 A typical real-time task is composed of a repetition of computation phases
 (task instances, or jobs) which are activated in a periodic or sporadic
 fashion.
 Each job J_j (where J_j is the j^th job of the task) is characterized by an
 arrival time r_j (the time when the job starts), an amount of computation
 time c_j needed to finish the job, and a job absolute deadline d_j, which
 is the time within which the job should be finished. The maximum execution
 time max{c_j} is called "Worst Case Execution Time" (WCET) for the task.
 A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
 sporadic with minimum inter-arrival time P if r_{j+1} >= r_j + P. Finally,
 d_j = r_j + D, where D is the task's relative deadline.
 Summing up, a real-time task can be described as

        Task = (WCET, D, P)

 The utilization of a real-time task is defined as the ratio between its
 WCET and its period (or minimum inter-arrival time), and represents
 the fraction of CPU time needed to execute the task.

 If the total utilization U=sum(WCET_i/P_i) is larger than M (with M equal
 to the number of CPUs), then the scheduler is unable to respect all the
 deadlines.
 Note that total utilization is defined as the sum of the utilizations
 WCET_i/P_i over all the real-time tasks in the system. When considering
 multiple real-time tasks, the parameters of the i-th task are indicated
 with the "_i" suffix.
 Moreover, if the total utilization is larger than M, then we risk starving
 non-real-time tasks by real-time tasks.
 If, instead, the total utilization is smaller than M, then non real-time
 tasks will not be starved and the system might be able to respect all the
 deadlines.
 As a matter of fact, in this case it is possible to provide an upper bound
 for tardiness (defined as the maximum between 0 and the difference
 between the finishing time of a job and its absolute deadline).
 More precisely, it can be proven that using a global EDF scheduler the
 maximum tardiness of each task is smaller or equal than

        ((M − 1) · WCET_max − WCET_min)/(M − (M − 2) · U_max) + WCET_max

 where WCET_max = max{WCET_i} is the maximum WCET, WCET_min=min{WCET_i}
 is the minimum WCET, and U_max = max{WCET_i/P_i} is the maximum
 utilization[12].
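
 As a worked instance of this bound (the task-set numbers below are
 hypothetical, chosen only for illustration)::

    def tardiness_bound(m, wcet_max, wcet_min, u_max):
        # ((M - 1) * WCET_max - WCET_min) / (M - (M - 2) * U_max) + WCET_max
        return ((m - 1) * wcet_max - wcet_min) / (m - (m - 2) * u_max) + wcet_max

    # 4 CPUs, WCETs between 2ms and 10ms, maximum per-task utilization 0.5:
    b = tardiness_bound(4, 10.0, 2.0, 0.5)
    assert abs(b - (28 / 3 + 10.0)) < 1e-9   # about 19.33 ms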

3.2 Schedulability Analysis for Uniprocessor Systems
----------------------------------------------------

 If M=1 (uniprocessor system), or in case of partitioned scheduling (each
 real-time task is statically assigned to one and only one CPU), it is
 possible to formally check if all the deadlines are respected.
 If D_i = P_i for all tasks, then EDF is able to respect all the deadlines
 of all the tasks executing on a CPU if and only if the total utilization
 of the tasks running on such a CPU is smaller or equal than 1.
 If D_i != P_i for some task, then it is possible to define the density of
 a task as WCET_i/min{D_i,P_i}, and EDF is able to respect all the deadlines
 of all the tasks running on a CPU if the sum of the densities of the tasks
 running on such a CPU is smaller or equal than 1:

        sum(WCET_i / min{D_i, P_i}) <= 1

 It is important to notice that this condition is only sufficient, and not
 necessary: there are task sets that are schedulable, but do not respect the
 condition. For example, consider the task set {Task_1,Task_2} composed by
 Task_1=(50ms,50ms,100ms) and Task_2=(10ms,100ms,100ms).
 EDF is clearly able to schedule the two tasks without missing any deadline
 (Task_1 is scheduled as soon as it is released, and finishes just in time
 to respect its deadline; Task_2 is scheduled immediately after Task_1, hence
 its response time cannot be larger than 50ms + 10ms = 60ms) even if

        50 / min{50,100} + 10 / min{100, 100} = 1.1

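 The density computation in the example above can be replayed numerically
 (``density_sum`` is a hypothetical helper name)::

    def density_sum(tasks):
        # tasks: list of (WCET, D, P); the density of a task is WCET / min(D, P)
        return sum(wcet / min(d, p) for (wcet, d, p) in tasks)

    # Task_1=(50ms,50ms,100ms) and Task_2=(10ms,100ms,100ms) from the text:
    s = density_sum([(50, 50, 100), (10, 100, 100)])
    assert abs(s - 1.1) < 1e-9   # > 1, yet the set is schedulable: the
                                 # condition is sufficient, not necessary
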
 Of course it is possible to test the exact schedulability of tasks with
 D_i != P_i (checking a condition that is both sufficient and necessary),
 but this cannot be done by comparing the total utilization or density with
 a constant. Instead, the so called "processor demand" approach can be used,
 computing the total amount of CPU time h(t) needed by all the tasks to
 respect all of their deadlines in a time interval of size t, and comparing
 such a time with the interval size t. If h(t) is smaller than t (that is,
 the amount of time needed by the tasks in a time interval of size t is
 smaller than the size of the interval) for all the possible values of t, then
 EDF is able to schedule the tasks respecting all of their deadlines. Since
 performing this check for all possible values of t is impossible, it has been
 proven[4,5,6] that it is sufficient to perform the test for values of t
 between 0 and a maximum value L. The cited papers contain all of the
 mathematical details and explain how to compute h(t) and L.
 In any case, this kind of analysis is too complex as well as too
 time-consuming to be performed on-line. Hence, as explained in Section
 4 Linux uses an admission test based on the tasks' utilizations.
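
 The processor demand test can be sketched as follows. This is illustrative
 only: it checks h(t) <= t at the absolute deadlines up to an arbitrary
 horizon instead of computing the proper bound L from [4,5,6]::

    import math

    def h(t, tasks):
        """Processor demand in [0, t] for sporadic tasks (WCET, D, P):
           h(t) = sum_i max(0, floor((t - D_i) / P_i) + 1) * WCET_i"""
        return sum(max(0, math.floor((t - d) / p) + 1) * wcet
                   for (wcet, d, p) in tasks)

    def edf_schedulable_up_to(tasks, horizon):
        # Check h(t) <= t at every absolute deadline up to the horizon.
        points = {k * p + d for (_, d, p) in tasks
                  for k in range(int(horizon // p) + 1)}
        return all(h(t, tasks) <= t for t in points if t <= horizon)

    tasks = [(50, 50, 100), (10, 100, 100)]   # the task set from the text
    assert edf_schedulable_up_to(tasks, 1000)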

3.3 Schedulability Analysis for Multiprocessor Systems
------------------------------------------------------

 On multiprocessor systems with global EDF scheduling (non partitioned
 systems), a sufficient test for schedulability can not be based on the
 utilizations or densities: it can be shown that even if D_i = P_i task
 sets with utilizations slightly larger than 1 can miss deadlines regardless
 of the number of CPUs.

 Consider a set {Task_1,...Task_{M+1}} of M+1 tasks on a system with M
 CPUs, with the first task Task_1=(P,P,P) having period, relative deadline
 and WCET equal to P. The remaining M tasks Task_i=(e,P-1,P-1) have an
 arbitrarily small worst case execution time (indicated as "e" here) and a
 period smaller than the one of the first task. Hence, if all the tasks
 activate at the same time t, global EDF schedules these M tasks first
 (because their absolute deadlines are equal to t + P - 1, hence they are
 smaller than the absolute deadline of Task_1, which is t + P). As a
 result, Task_1 can be scheduled only at time t + e, and will finish at
 time t + e + P, after its absolute deadline. The total utilization of the
 task set is U = M · e / (P - 1) + P / P = M · e / (P - 1) + 1, and for small
 values of e this can become very close to 1. This is known as "Dhall's
 effect"[7]. Note: the example in the original paper by Dhall has been
 slightly simplified here (for example, Dhall more correctly computed
 lim_{e->0}U).

 More complex schedulability tests for global EDF have been developed in
 real-time literature[8,9], but they are not based on a simple comparison
 between total utilization (or density) and a fixed constant. If all tasks
 have D_i = P_i, a sufficient schedulability condition can be expressed in
 a simple way:

        sum(WCET_i / P_i) <= M - (M - 1) · U_max

 where U_max = max{WCET_i / P_i}[10]. Notice that for U_max = 1,
 M - (M - 1) · U_max becomes M - M + 1 = 1 and this schedulability condition
 just confirms Dhall's effect. A more complete survey of the literature
 about schedulability tests for multi-processor real-time scheduling can be
 found in [11].
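
 The sufficient condition above can be evaluated directly (a sketch;
 ``gedf_sufficient`` is a hypothetical helper name). The second task set
 below is Dhall-like: a single task with U_max = 1 collapses the bound to 1::

    def gedf_sufficient(tasks, m):
        # Tasks with D_i = P_i, given as (WCET, P):
        # sum(WCET_i / P_i) <= M - (M - 1) * U_max   [10]
        u = [wcet / p for (wcet, p) in tasks]
        return sum(u) <= m - (m - 1) * max(u)

    # Four light tasks on M=2 CPUs: total 1.0 <= 2 - 1*0.25 = 1.75
    assert gedf_sufficient([(25, 100)] * 4, 2)
    # One heavy task (U=1) plus a light one: bound collapses to 1
    assert not gedf_sufficient([(100, 100), (1, 99)], 2)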

 As seen, enforcing that the total utilization is smaller than M does not
 guarantee that global EDF schedules the tasks without missing any deadline
 (in other words, global EDF is not an optimal scheduling algorithm). However,
 a total utilization smaller than M is enough to guarantee that non real-time
 tasks are not starved and that the tardiness of real-time tasks has an upper
 bound[12] (as previously noted). Different bounds on the maximum tardiness
 experienced by real-time tasks have been developed in various papers[13,14],
 but the theoretical result that is important for SCHED_DEADLINE is that if
 the total utilization is smaller or equal than M then the response times of
 the tasks are limited.

3.4 Relationship with SCHED_DEADLINE Parameters
-----------------------------------------------

 Finally, it is important to understand the relationship between the
 SCHED_DEADLINE scheduling parameters described in Section 2 (runtime,
 deadline and period) and the real-time task parameters (WCET, D, P)
 described in this section. Note that a task's temporal constraints are
 represented by its absolute deadlines d_j = r_j + D described above, while
 SCHED_DEADLINE schedules the tasks according to scheduling deadlines (see
 Section 2).
 If an admission test is used to guarantee that the scheduling deadlines
 are respected, then SCHED_DEADLINE can be used to schedule real-time tasks
 guaranteeing that all the jobs' deadlines of a task are respected.
 In order to do this, a task must be scheduled by setting:

  - runtime >= WCET
  - deadline = D
  - period <= P

 IOW, if runtime >= WCET and if period is <= P, then the scheduling deadlines
 and the absolute deadlines (d_j) coincide, so a proper admission control
 allows one to respect the jobs' absolute deadlines for this task (this is
 what is called "hard schedulability property" and is an extension of Lemma 1
 of [2]).
 Notice that if runtime > deadline the admission control will surely reject
 this task, as it is not possible to respect its temporal constraints.

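 A minimal sketch of this mapping (illustrative only; ``dl_params_for`` is a
 hypothetical helper name, and runtime = WCET, period = P is just one valid
 choice among those allowed by the rules above)::

    def dl_params_for(wcet, d, p):
        """Pick SCHED_DEADLINE parameters for a real-time task (WCET, D, P):
           runtime >= WCET, deadline = D, period <= P."""
        runtime, deadline, period = wcet, d, p
        if runtime > deadline:
            # such a task would be rejected by the admission control
            raise ValueError("runtime > deadline cannot be admitted")
        return runtime, deadline, period

    # A job of up to 10 time units with a relative deadline of 30, arriving
    # at most every 100 time units:
    assert dl_params_for(10, 30, 100) == (10, 30, 100)

 In a real program, the chosen values would then be handed to the kernel via
 sched_setattr(), as described in Section 4.
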
 References:

  1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
      ming in a hard-real-time environment. Journal of the Association for
      Computing Machinery, 20(1), 1973.
  2 - L. Abeni, G. Buttazzo. Integrating Multimedia Applications in Hard
      Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
      Symposium, 1998. http://retis.sssup.it/~
  3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
      Technical Report. http://disi.unitn.it/~
  4 - J. Y. Leung and M.L. Merril. A Note on Preemptive Scheduling of
      Periodic, Real-Time Tasks. Information Processing Letters, vol. 11,
      no. 3, pp. 115-118, 1980.
  5 - S. K. Baruah, A. K. Mok and L. E. Rosier. Preemptively Scheduling
      Hard-Real-Time Sporadic Tasks on One Processor. Proceedings of the
      11th IEEE Real-time Systems Symposium, 1990.
  6 - S. K. Baruah, L. E. Rosier and R. R. Howell. Algorithms and Complexity
      Concerning the Preemptive Scheduling of Periodic Real-Time tasks on
      One Processor. Real-Time Systems Journal, vol. 4, no. 2, pp 301-324,
      1990.
  7 - S. J. Dhall and C. L. Liu. On a real-time scheduling problem. Operations
      research, vol. 26, no. 1, pp 127-140, 1978.
  8 - T. Baker. Multiprocessor EDF and Deadline Monotonic Schedulability
      Analysis. Proceedings of the 24th IEEE Real-Time Systems Symposium, 2003.
  9 - T. Baker. An Analysis of EDF Schedulability on a Multiprocessor.
      IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8,
      pp 760-768, 2005.
  10 - J. Goossens, S. Funk and S. Baruah, Priority-Driven Scheduling of
       Periodic Task Systems on Multiprocessors. Real-Time Systems Journal,
       vol. 25, no. 2–3, pp. 187–205, 2003.
  11 - R. Davis and A. Burns. A Survey of Hard Real-Time Scheduling for
       Multiprocessor Systems. ACM Computing Surveys, vol. 43, no. 4, 2011.
       http://www-users.cs.york.ac.uk/~robdavi
  12 - U. C. Devi and J. H. Anderson. Tardiness Bounds under Global EDF
       Scheduling on a Multiprocessor. Real-Time Systems Journal, vol. 38,
       no. 2, pp 133-189, 2008.
  13 - P. Valente and G. Lipari. An Upper Bound to the Lateness of Soft
       Real-Time Tasks Scheduled by EDF on Multiprocessors. Proceedings of
       the 26th IEEE Real-Time Systems Symposium, 2005.
  14 - J. Erickson, U. Devi and S. Baruah. Improved tardiness bounds for
       Global EDF. Proceedings of the 22nd Euromicro Conference on
       Real-Time Systems, 2010.
  15 - G. Lipari, S. Baruah, Greedy reclamation of unused bandwidth in
       constant-bandwidth servers, 12th IEEE Euromicro Conference on
       Real-Time Systems, 2000.
  16 - L. Abeni, J. Lelli, C. Scordino, L. Palopoli, Greedy CPU reclaiming for
       SCHED DEADLINE. In Proceedings of the Real-Time Linux Workshop (RTLWS),
       Dusseldorf, Germany, 2014.
  17 - L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming:
       parallel or sequential?. In Proceedings of the 31st Annual ACM
       Symposium on Applied Computing, 2016.
  18 - J. Lelli, C. Scordino, L. Abeni, D. Faggioli, Deadline scheduling in
       the Linux kernel, Software: Practice and Experience, 46(6): 821-839,
       June 2016.
  19 - C. Scordino, L. Abeni, J. Lelli, Energy-Aware Real-Time Scheduling in
       the Linux Kernel, 33rd ACM/SIGAPP Symposium On Applied Computing (SAC
       2018), Pau, France, April 2018.


4. Bandwidth management
=======================

 As previously mentioned, in order for -deadline scheduling to be
 effective and useful (that is, to be able to provide "runtime" time units
 within "deadline"), it is important to have some method to keep the
 allocation of the available fractions of CPU time to the various tasks
 under control.
 This is usually called "admission control" and if it is not performed, then
 no guarantee can be given on the actual scheduling of the -deadline tasks.

 As already stated in Section 3, a necessary condition to be respected to
 correctly schedule a set of real-time tasks is that the total utilization
 is smaller than M. When talking about -deadline tasks, this requires that
 the sum of the ratio between runtime and period for all tasks is smaller
 than M. Notice that the ratio runtime/period is equivalent to the utilization
 of a "traditional" real-time task, and is also often referred to as
 "bandwidth".
 The interface used to control the CPU bandwidth that can be allocated
 to -deadline tasks is similar to the one already used for -rt
 tasks with real-time group scheduling (a.k.a. RT-throttling - see
 Documentation/scheduler/sched-rt-group.rst), and is based on readable/
 writable control files located in procfs (for system wide settings).
 Notice that per-group settings (controlled through cgroupfs) are still not
 defined for -deadline tasks, because more discussion is needed in order to
 figure out how we want to manage SCHED_DEADLINE bandwidth at the task group
 level.

 A main difference between deadline bandwidth management and RT-throttling
 is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
 and thus we don't need a higher level throttling mechanism to enforce the
 desired bandwidth. In other words, this means that interface parameters are
 only used at admission control time (i.e., when the user calls
 sched_setattr()). Scheduling is then performed considering actual tasks'
 parameters, so that CPU bandwidth is allocated to SCHED_DEADLINE tasks
 respecting their needs in terms of granularity. Therefore, using this simple
 interface we can put a cap on total utilization of -deadline tasks (i.e.,
 \Sum (runtime_i / period_i) < global_dl_utilization_cap).

4.1 System wide settings
------------------------

 The system wide settings are configured under the /proc virtual file system.

 For now the -rt knobs are used for -deadline admission control and the
 -deadline runtime is accounted against the -rt runtime. We realize that this
 isn't entirely desirable; however, it is better to have a small interface for
 now, and be able to change it easily later. The ideal situation (see 5.) is
 to run -rt tasks from a -deadline server; in which case the -rt bandwidth is
 a direct subset of dl_bw.

 This means that, for a root_domain comprising M CPUs, -deadline tasks
 can be created while the sum of their bandwidths stays below:

   M * (sched_rt_runtime_us / sched_rt_period_us)

 It is also possible to disable this bandwidth management logic, and
 be thus free of oversubscribing the system up to any arbitrary level.
 This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.


4.2 Task interface
------------------

 Specifying a periodic/sporadic task that executes for a given amount of
 runtime at each instance, and that is scheduled according to the urgency of
 its own timing constraints needs, in general, a way of declaring:

  - a (maximum/typical) instance execution time,
  - a minimum interval between consecutive instances,
  - a time constraint by which each instance must be completed.

 Therefore:

  * a new struct sched_attr, containing all the necessary fields, is
    provided;
  * the new scheduling related syscalls that manipulate it, i.e.,
    sched_setattr() and sched_getattr(), are implemented.

 For debugging purposes, the leftover runtime and absolute deadline of a
 SCHED_DEADLINE task can be retrieved through /proc/<pid>/sched (entries
 dl.runtime and dl.deadline, both values in ns). A programmatic way to
 retrieve these values from production code is under discussion.


4.3 Default behavior
---------------------

 The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
 950000. With rt_period equal to 1000000, by default, it means that -deadline
 tasks can use at most 95%, multiplied by the number of CPUs that compose the
 root_domain, for each root_domain.
 This means that non -deadline tasks will receive at least 5% of the CPU time,
 and that -deadline tasks will receive their runtime with a guaranteed
 worst-case delay with respect to the "deadline" parameter. If
 "deadline" = "period" and the cpuset mechanism is used to implement
 partitioned scheduling (see Section 5), then this simple setting of the
 bandwidth management is able to deterministically guarantee that -deadline
 tasks will receive their runtime in a period.

 Finally, notice that in order not to jeopardize the admission control a
 -deadline task cannot fork.


4.4 Behavior of sched_yield()
-----------------------------

 When a SCHED_DEADLINE task calls sched_yield(), it gives up its remaining
 runtime and is immediately throttled, until the next period, when its
 runtime will be replenished (a special flag dl_yielded is set and used to
 correctly handle throttling and runtime replenishment after a call to
 sched_yield()).

 This behavior of sched_yield() allows the task to wake up exactly at the
 beginning of the next period. Also, this may be useful in the future with
 bandwidth reclaiming mechanisms, where sched_yield() will make the leftover
 runtime available for reclamation by other SCHED_DEADLINE tasks.


5. Tasks CPU affinity
=====================

 -deadline tasks cannot have an affinity mask smaller than the entire
 root_domain they are created on. However, affinities can be specified
 through the cpuset facility (Documentation/admin-guide/cgroup-v1/cpusets.rst).

5.1 SCHED_DEADLINE and cpusets HOWTO
------------------------------------

 An example of a simple configuration (pin a -deadline task to CPU0)
 follows (rt-app is used to create a -deadline task)::

   mkdir /dev/cpuset
   mount -t cgroup -o cpuset cpuset /dev/cpuset
   cd /dev/cpuset
   mkdir cpu0
   echo 0 > cpu0/cpuset.cpus
   echo 0 > cpu0/cpuset.mems
   echo 1 > cpuset.cpu_exclusive
   echo 0 > cpuset.sched_load_balance
   echo 1 > cpu0/cpuset.cpu_exclusive
   echo 1 > cpu0/cpuset.mem_exclusive
   echo $$ > cpu0/tasks
   rt-app -t 100000:10000:d:0 -D5  # it is now actually superfluous
                                   # to set task affinity

6. Future plans
===============

 Still missing:

  - programmatic way to retrieve current runtime and absolute deadline
  - refinements to deadline inheritance, especially regarding the possibility
    of retaining bandwidth isolation among non-interacting tasks. This is
    being studied from both theoretical and practical points of view, and
    hopefully we should be able to produce some demonstrative code soon;
  - (c)group based bandwidth management, and maybe scheduling;
  - access control for non-root users (and related security concerns to
    address), which is the best way to allow unprivileged use of the
    mechanisms and how to prevent non-root users from "cheating" the system?

 As already discussed, we are also planning to merge this work with the EDF
 throttling patches [https://lore.kernel.org/r/cover.1266931410.git.fabio@helm.retis]
 but we are still in the preliminary phases of the merge and we really seek
 feedback that would help us decide on the direction it should take.

Appendix A. Test suite
======================

 The SCHED_DEADLINE policy can be easily tested using two applications that
 are part of a wider Linux Scheduler validation suite. The suite is
 available as a GitHub repository: https://github.com/scheduler-tools.

 The first testing application is called rt-app and can be used to
 start multiple threads with specific parameters. rt-app supports
 SCHED_{OTHER,FIFO,RR,DEADLINE} scheduling policies and their related
 parameters (e.g., niceness, priority, runtime/deadline/period). rt-app
 is a valuable tool, as it can be used to synthetically recreate certain
 workloads (maybe mimicking real use-cases) and evaluate how the scheduler
 behaves under such workloads. In this way, results are easily reproducible.
 rt-app is available at: https://github.com/scheduler-tools/rt-app.

 Thread parameters can be specified from the command line, with something like
 this::

  # rt-app -t 100000:10000:d -t 150000:20000:f:10 -D5

 The above creates 2 threads. The first one, scheduled by SCHED_DEADLINE,
 executes for 10ms every 100ms. The second one, scheduled at SCHED_FIFO
 priority 10, executes for 20ms every 150ms. The test will run for a total
 of 5 seconds.

 More interestingly, configurations can be described with a json file that
 can be passed as input to rt-app with something like this::

  # rt-app my_config.json

 The parameters that can be specified with the second method are a superset
 of the command line options. Please refer to rt-app documentation for more
 details (`<rt-app-sources>/doc/*.json`).

 The second testing application is a modification of schedtool, called
 schedtool-dl, which can be used to setup SCHED_DEADLINE parameters for a
 certain pid/application. schedtool-dl is available at:
 https://github.com/scheduler-tools/schedtool-dl.git.

 The usage is straightforward::

  # schedtool -E -t 10000000:100000000 -e ./my_cpuhog_app

 With this, my_cpuhog_app is put to run inside a SCHED_DEADLINE reservation
 of 10ms every 100ms (note that parameters are expressed in microseconds).
 You can also use schedtool to create a reservation for an already running
 application, given that you know its pid::

  # schedtool -E -t 10000000:100000000 my_app_pid

Appendix B. Minimal main()
==========================

 We provide in what follows a simple (ugly) self-contained code snippet
 showing how SCHED_DEADLINE reservations can be created by a real-time
 application developer::

   #define _GNU_SOURCE
   #include <unistd.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <time.h>
   #include <linux/unistd.h>
   #include <linux/kernel.h>
   #include <linux/types.h>
   #include <sys/syscall.h>
   #include <pthread.h>

   #define gettid() syscall(__NR_gettid)

   #define SCHED_DEADLINE       6

   /* XXX use the proper syscall numbers */
   #ifdef __x86_64__
   #define __NR_sched_setattr           314
   #define __NR_sched_getattr           315
   #endif

   #ifdef __i386__
   #define __NR_sched_setattr           351
   #define __NR_sched_getattr           352
   #endif

   #ifdef __arm__
   #define __NR_sched_setattr           380
   #define __NR_sched_getattr           381
   #endif

   static volatile int done;

   struct sched_attr {
        __u32 size;

        __u32 sched_policy;
        __u64 sched_flags;

        /* SCHED_NORMAL, SCHED_BATCH */
        __s32 sched_nice;

        /* SCHED_FIFO, SCHED_RR */
        __u32 sched_priority;

        /* SCHED_DEADLINE (nsec) */
        __u64 sched_runtime;
        __u64 sched_deadline;
        __u64 sched_period;
   };

   int sched_setattr(pid_t pid,
                     const struct sched_attr *attr,
                     unsigned int flags)
   {
        return syscall(__NR_sched_setattr, pid, attr, flags);
   }

   int sched_getattr(pid_t pid,
                     struct sched_attr *attr,
                     unsigned int size,
                     unsigned int flags)
   {
        return syscall(__NR_sched_getattr, pid, attr, size, flags);
   }

   void *run_deadline(void *data)
   {
        struct sched_attr attr;
        int x = 0;
        int ret;
        unsigned int flags = 0;

        printf("deadline thread started [%ld]\n", gettid());

        attr.size = sizeof(attr);
        attr.sched_flags = 0;
        attr.sched_nice = 0;
        attr.sched_priority = 0;

        /* This creates a 10ms/30ms reservation */
        attr.sched_policy = SCHED_DEADLINE;
        attr.sched_runtime = 10 * 1000 * 1000;
        attr.sched_period = attr.sched_deadline = 30 * 1000 * 1000;

        ret = sched_setattr(0, &attr, flags);
        if (ret < 0) {
                done = 0;
                perror("sched_setattr");
                exit(-1);
        }

        while (!done) {
                x++;
        }

        printf("deadline thread dies [%ld]\n", gettid());
        return NULL;
   }

   int main (int argc, char **argv)
   {
        pthread_t thread;

        printf("main thread [%ld]\n", gettid());

        pthread_create(&thread, NULL, run_deadline, NULL);

        sleep(10);

        done = 1;
        pthread_join(thread, NULL);

        printf("main dies [%ld]\n", gettid());
        return 0;
   }
