
TOMOYO Linux Cross Reference
Linux/Documentation/scheduler/sched-nice-design.rst


Differences between /Documentation/scheduler/sched-nice-design.rst (Version linux-6.12-rc7) and /Documentation/scheduler/sched-nice-design.rst (Version linux-5.8.18)


=====================
Scheduler Nice Design
=====================

This document explains the thinking about the revamped and streamlined
nice-levels implementation in the new Linux scheduler.

Nice levels were always pretty weak under Linux and people continuously
pestered us to make nice +19 tasks use up much less CPU time.

Unfortunately that was not that easy to implement under the old
scheduler (otherwise we'd have done it long ago), because nice level
support was historically coupled to timeslice length, and timeslice
units were driven by the HZ tick, so the smallest timeslice was 1/HZ.

In the O(1) scheduler (in 2003) we changed negative nice levels to be
much stronger than they were before in 2.4 (and people were happy about
that change), and we also intentionally calibrated the linear timeslice
rule so that nice +19 level would be _exactly_ 1 jiffy. To better
understand it, the timeslice graph went like this (cheesy ASCII art
alert!)::

                   A
             \     | [timeslice length]
              \    |
               \   |
                \  |
                 \ |
                  \|___100msecs
                   |^ . _
                   |      ^ . _
                   |            ^ . _
 -*----------------------------------*-----> [nice level]
 -20               |                +19
                   |
                   |

So if someone really wanted to renice tasks, +19 would give a much
bigger hit than the normal linear rule would. (The solution of
changing the ABI to extend priorities was discarded early on.)

This approach worked to some degree for some time, but later on with
HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage, which
we felt to be a bit excessive. Excessive _not_ because it's too small a
CPU utilization, but because it causes too-frequent (once per
millisecond) rescheduling, which would thrash the cache, etc. (Remember,
this was long ago when hardware was weaker and caches were smaller, and
people were running number-crunching apps at nice +19.)

So for HZ=1000 we changed nice +19 to 5 msecs, because that felt like the
right minimal granularity, and this translates to 5% CPU utilization.
But the fundamental HZ-sensitive property for nice +19 still remained.
We never got a single complaint about nice +19 being too _weak_ in
terms of CPU utilization; we only got complaints about it (still) being
too _strong_ :-)

To sum it up: we always wanted to make nice levels more consistent, but
within the constraints of HZ and jiffies and their nasty design-level
coupling to timeslices and granularity, it was not really viable.

The second (less frequent but still periodically occurring) complaint
about Linux's nice level support was its asymmetry around the origin
(which you can see demonstrated in the picture above), or more
accurately: the fact that nice level behavior depended on the _absolute_
nice level as well, while the nice API itself is fundamentally
"relative"::

   int nice(int inc);

   asmlinkage long sys_nice(int increment);

(The first one is the glibc API, the second one is the syscall API.)
Note that ``inc`` is relative to the current nice level. Command-line
tools like nice(1) mirror this relative API.

With the old scheduler, if you, for example, started one task niced by +1
and another niced by +2, the CPU split between the two tasks would
depend on the nice level of the parent shell: if it was at nice -10, the
CPU split was different than if it was at +5 or +10.

A third complaint against Linux's nice level support was that negative
nice levels were not 'punchy enough', so lots of people had to resort to
running audio (and other multimedia) apps under RT priorities such as
SCHED_FIFO. But this caused other problems: SCHED_FIFO is not
starvation-proof, and a buggy SCHED_FIFO app can also lock up the system
for good.

The new scheduler in v2.6.23 addresses all three types of complaints:

To address the first complaint (of nice levels not being "punchy"
enough), the scheduler was decoupled from 'time slice' and HZ concepts
(and granularity was made a separate concept from nice levels), and thus
it was possible to implement better and more consistent nice +19
support: with the new scheduler, nice +19 tasks get a HZ-independent
1.5%, instead of the variable 3%-5%-9% range they got in the old
scheduler.

To address the second complaint (of nice levels not being consistent),
the new scheduler makes nice(1) have the same CPU utilization effect on
tasks, regardless of their absolute nice levels. So on the new
scheduler, running a nice +10 and a nice +11 task has the same CPU
utilization "split" between them as running a nice -5 and a nice -4
task (one will get 55% of the CPU, the other 45%). That is why nice
levels were changed to be "multiplicative" (or exponential): that way
it does not matter which nice level you start out from, the 'relative
result' will always be the same.

The third complaint (of negative nice levels not being "punchy" enough
and forcing audio apps to run under the more dangerous SCHED_FIFO
scheduling policy) is addressed by the new scheduler almost
automatically: stronger negative nice levels are an automatic
side-effect of the recalibrated dynamic range of nice levels.

                                                      
