~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/Documentation/timers/hrtimers.rst

Version: ~ [ linux-6.12-rc7 ] ~ [ linux-6.11.7 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.60 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.116 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.171 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.229 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.285 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.323 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.12 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

Diff markup

Differences between /Documentation/timers/hrtimers.rst (Version linux-6.12-rc7) and /Documentation/timers/hrtimers.rst (Version linux-6.1.116)


  1 ==============================================      1 ======================================================
  2 hrtimers - subsystem for high-resolution kerne      2 hrtimers - subsystem for high-resolution kernel timers
  3 ==============================================      3 ======================================================
  4                                                     4 
  5 This patch introduces a new subsystem for high      5 This patch introduces a new subsystem for high-resolution kernel timers.
  6                                                     6 
  7 One might ask the question: we already have a       7 One might ask the question: we already have a timer subsystem
  8 (kernel/timers.c), why do we need two timer su      8 (kernel/timers.c), why do we need two timer subsystems? After a lot of
  9 back and forth trying to integrate high-resolu      9 back and forth trying to integrate high-resolution and high-precision
 10 features into the existing timer framework, an     10 features into the existing timer framework, and after testing various
 11 such high-resolution timer implementations in      11 such high-resolution timer implementations in practice, we came to the
 12 conclusion that the timer wheel code is fundam     12 conclusion that the timer wheel code is fundamentally not suitable for
 13 such an approach. We initially didn't believe      13 such an approach. We initially didn't believe this ('there must be a way
 14 to solve this'), and spent a considerable effo     14 to solve this'), and spent a considerable effort trying to integrate
 15 things into the timer wheel, but we failed. In     15 things into the timer wheel, but we failed. In hindsight, there are
 16 several reasons why such integration is hard/i     16 several reasons why such integration is hard/impossible:
 17                                                    17 
 18 - the forced handling of low-resolution and hi     18 - the forced handling of low-resolution and high-resolution timers in
 19   the same way leads to a lot of compromises,      19   the same way leads to a lot of compromises, macro magic and #ifdef
 20   mess. The timers.c code is very "tightly cod     20   mess. The timers.c code is very "tightly coded" around jiffies and
 21   32-bitness assumptions, and has been honed a     21   32-bitness assumptions, and has been honed and micro-optimized for a
 22   relatively narrow use case (jiffies in a rel     22   relatively narrow use case (jiffies in a relatively narrow HZ range)
 23   for many years - and thus even small extensi     23   for many years - and thus even small extensions to it easily break
 24   the wheel concept, leading to even worse com     24   the wheel concept, leading to even worse compromises. The timer wheel
 25   code is very good and tight code, there's ze     25   code is very good and tight code, there's zero problems with it in its
 26   current usage - but it is simply not suitabl     26   current usage - but it is simply not suitable to be extended for
 27   high-res timers.                                 27   high-res timers.
 28                                                    28 
 29 - the unpredictable [O(N)] overhead of cascadi     29 - the unpredictable [O(N)] overhead of cascading leads to delays which
 30   necessitate a more complex handling of high      30   necessitate a more complex handling of high resolution timers, which
 31   in turn decreases robustness. Such a design      31   in turn decreases robustness. Such a design still leads to rather large
 32   timing inaccuracies. Cascading is a fundamen     32   timing inaccuracies. Cascading is a fundamental property of the timer
 33   wheel concept, it cannot be 'designed out' w     33   wheel concept, it cannot be 'designed out' without inevitably
 34   degrading other portions of the timers.c cod     34   degrading other portions of the timers.c code in an unacceptable way.
 35                                                    35 
 36 - the implementation of the current posix-time     36 - the implementation of the current posix-timer subsystem on top of
 37   the timer wheel has already introduced a qui     37   the timer wheel has already introduced a quite complex handling of
 38   the required readjusting of absolute CLOCK_R     38   the required readjusting of absolute CLOCK_REALTIME timers at
 39   settimeofday or NTP time - further underlyin     39   settimeofday or NTP time - further underlying our experience by
 40   example: that the timer wheel data structure     40   example: that the timer wheel data structure is too rigid for high-res
 41   timers.                                          41   timers.
 42                                                    42 
 43 - the timer wheel code is most optimal for use     43 - the timer wheel code is most optimal for use cases which can be
 44   identified as "timeouts". Such timeouts are      44   identified as "timeouts". Such timeouts are usually set up to cover
 45   error conditions in various I/O paths, such      45   error conditions in various I/O paths, such as networking and block
 46   I/O. The vast majority of those timers never     46   I/O. The vast majority of those timers never expire and are rarely
 47   recascaded because the expected correct even     47   recascaded because the expected correct event arrives in time so they
 48   can be removed from the timer wheel before a     48   can be removed from the timer wheel before any further processing of
 49   them becomes necessary. Thus the users of th     49   them becomes necessary. Thus the users of these timeouts can accept
 50   the granularity and precision tradeoffs of t     50   the granularity and precision tradeoffs of the timer wheel, and
 51   largely expect the timer subsystem to have n     51   largely expect the timer subsystem to have near-zero overhead.
 52   Accurate timing for them is not a core purpo     52   Accurate timing for them is not a core purpose - in fact most of the
 53   timeout values used are ad-hoc. For them it      53   timeout values used are ad-hoc. For them it is at most a necessary
 54   evil to guarantee the processing of actual t     54   evil to guarantee the processing of actual timeout completions
 55   (because most of the timeouts are deleted be     55   (because most of the timeouts are deleted before completion), which
 56   should thus be as cheap and unintrusive as p     56   should thus be as cheap and unintrusive as possible.
 57                                                    57 
 58 The primary users of precision timers are user     58 The primary users of precision timers are user-space applications that
 59 utilize nanosleep, posix-timers and itimer int     59 utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel
 60 users like drivers and subsystems which requir     60 users like drivers and subsystems which require precise timed events
 61 (e.g. multimedia) can benefit from the availab     61 (e.g. multimedia) can benefit from the availability of a separate
 62 high-resolution timer subsystem as well.           62 high-resolution timer subsystem as well.
 63                                                    63 
 64 While this subsystem does not offer high-resol     64 While this subsystem does not offer high-resolution clock sources just
 65 yet, the hrtimer subsystem can be easily exten     65 yet, the hrtimer subsystem can be easily extended with high-resolution
 66 clock capabilities, and patches for that exist     66 clock capabilities, and patches for that exist and are maturing quickly.
 67 The increasing demand for realtime and multime     67 The increasing demand for realtime and multimedia applications along
 68 with other potential users for precise timers      68 with other potential users for precise timers gives another reason to
 69 separate the "timeout" and "precise timer" sub     69 separate the "timeout" and "precise timer" subsystems.
 70                                                    70 
 71 Another potential benefit is that such a separ     71 Another potential benefit is that such a separation allows even more
 72 special-purpose optimization of the existing t     72 special-purpose optimization of the existing timer wheel for the low
 73 resolution and low precision use cases - once      73 resolution and low precision use cases - once the precision-sensitive
 74 APIs are separated from the timer wheel and ar     74 APIs are separated from the timer wheel and are migrated over to
 75 hrtimers. E.g. we could decrease the frequency     75 hrtimers. E.g. we could decrease the frequency of the timeout subsystem
 76 from 250 Hz to 100 HZ (or even smaller).           76 from 250 Hz to 100 HZ (or even smaller).
 77                                                    77 
 78 hrtimer subsystem implementation details           78 hrtimer subsystem implementation details
 79 ----------------------------------------           79 ----------------------------------------
 80                                                    80 
 81 the basic design considerations were:              81 the basic design considerations were:
 82                                                    82 
 83 - simplicity                                       83 - simplicity
 84                                                    84 
 85 - data structure not bound to jiffies or any o     85 - data structure not bound to jiffies or any other granularity. All the
 86   kernel logic works at 64-bit nanoseconds res     86   kernel logic works at 64-bit nanoseconds resolution - no compromises.
 87                                                    87 
 88 - simplification of existing, timing related k     88 - simplification of existing, timing related kernel code
 89                                                    89 
 90 another basic requirement was the immediate en     90 another basic requirement was the immediate enqueueing and ordering of
 91 timers at activation time. After looking at se     91 timers at activation time. After looking at several possible solutions
 92 such as radix trees and hashes, we chose the r     92 such as radix trees and hashes, we chose the red black tree as the basic
 93 data structure. Rbtrees are available as a lib     93 data structure. Rbtrees are available as a library in the kernel and are
 94 used in various performance-critical areas of      94 used in various performance-critical areas of e.g. memory management and
 95 file systems. The rbtree is solely used for ti     95 file systems. The rbtree is solely used for time sorted ordering, while
 96 a separate list is used to give the expiry cod     96 a separate list is used to give the expiry code fast access to the
 97 queued timers, without having to walk the rbtr     97 queued timers, without having to walk the rbtree.
 98                                                    98 
 99 (This separate list is also useful for later w     99 (This separate list is also useful for later when we'll introduce
100 high-resolution clocks, where we need separate    100 high-resolution clocks, where we need separate pending and expired
101 queues while keeping the time-order intact.)      101 queues while keeping the time-order intact.)
102                                                   102 
103 Time-ordered enqueueing is not purely for the     103 Time-ordered enqueueing is not purely for the purposes of
104 high-resolution clocks though, it also simplif    104 high-resolution clocks though, it also simplifies the handling of
105 absolute timers based on a low-resolution CLOC    105 absolute timers based on a low-resolution CLOCK_REALTIME. The existing
106 implementation needed to keep an extra list of    106 implementation needed to keep an extra list of all armed absolute
107 CLOCK_REALTIME timers along with complex locki    107 CLOCK_REALTIME timers along with complex locking. In case of
108 settimeofday and NTP, all the timers (!) had t    108 settimeofday and NTP, all the timers (!) had to be dequeued, the
109 time-changing code had to fix them up one by o    109 time-changing code had to fix them up one by one, and all of them had to
110 be enqueued again. The time-ordered enqueueing    110 be enqueued again. The time-ordered enqueueing and the storage of the
111 expiry time in absolute time units removes all    111 expiry time in absolute time units removes all this complex and poorly
112 scaling code from the posix-timer implementati    112 scaling code from the posix-timer implementation - the clock can simply
113 be set without having to touch the rbtree. Thi    113 be set without having to touch the rbtree. This also makes the handling
114 of posix-timers simpler in general.               114 of posix-timers simpler in general.
115                                                   115 
116 The locking and per-CPU behavior of hrtimers w    116 The locking and per-CPU behavior of hrtimers was mostly taken from the
117 existing timer wheel code, as it is mature and    117 existing timer wheel code, as it is mature and well suited. Sharing code
118 was not really a win, due to the different dat    118 was not really a win, due to the different data structures. Also, the
119 hrtimer functions now have clearer behavior an    119 hrtimer functions now have clearer behavior and clearer names - such as
120 hrtimer_try_to_cancel() and hrtimer_cancel() [    120 hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly
121 equivalent to timer_delete() and timer_delete_ !! 121 equivalent to del_timer() and del_timer_sync()] - so there's no direct
122 1:1 mapping between them on the algorithmic le    122 1:1 mapping between them on the algorithmic level, and thus no real
123 potential for code sharing either.                123 potential for code sharing either.
124                                                   124 
125 Basic data types: every time value, absolute o    125 Basic data types: every time value, absolute or relative, is in a
126 special nanosecond-resolution 64bit type: ktim !! 126 special nanosecond-resolution type: ktime_t. The kernel-internal
127 (Originally, the kernel-internal representatio !! 127 representation of ktime_t values and operations is implemented via
128 operations was implemented via macros and inli !! 128 macros and inline functions, and can be switched between a "hybrid
129 switched between a "hybrid union" type and a p !! 129 union" type and a plain "scalar" 64bit nanoseconds representation (at
130 nanoseconds representation (at compile time).  !! 130 compile time). The hybrid union type optimizes time conversions on 32bit
131 context of the Y2038 work.)                    !! 131 CPUs. This build-time-selectable ktime_t storage format was implemented
                                                   >> 132 to avoid the performance impact of 64-bit multiplications and divisions
                                                   >> 133 on 32bit CPUs. Such operations are frequently necessary to convert
                                                   >> 134 between the storage formats provided by kernel and userspace interfaces
                                                   >> 135 and the internal time format. (See include/linux/ktime.h for further
                                                   >> 136 details.)
132                                                   137 
133 hrtimers - rounding of timer values               138 hrtimers - rounding of timer values
134 -----------------------------------               139 -----------------------------------
135                                                   140 
136 the hrtimer code will round timer events to lo    141 the hrtimer code will round timer events to lower-resolution clocks
137 because it has to. Otherwise it will do no art    142 because it has to. Otherwise it will do no artificial rounding at all.
138                                                   143 
139 one question is, what resolution value should     144 one question is, what resolution value should be returned to the user by
140 the clock_getres() interface. This will return    145 the clock_getres() interface. This will return whatever real resolution
141 a given clock has - be it low-res, high-res, o    146 a given clock has - be it low-res, high-res, or artificially-low-res.
142                                                   147 
143 hrtimers - testing and verification               148 hrtimers - testing and verification
144 -----------------------------------               149 -----------------------------------
145                                                   150 
146 We used the high-resolution clock subsystem on !! 151 We used the high-resolution clock subsystem ontop of hrtimers to verify
147 the hrtimer implementation details in praxis,     152 the hrtimer implementation details in praxis, and we also ran the posix
148 timer tests in order to ensure specification c    153 timer tests in order to ensure specification compliance. We also ran
149 tests on low-resolution clocks.                   154 tests on low-resolution clocks.
150                                                   155 
151 The hrtimer patch converts the following kerne    156 The hrtimer patch converts the following kernel functionality to use
152 hrtimers:                                         157 hrtimers:
153                                                   158 
154  - nanosleep                                      159  - nanosleep
155  - itimers                                        160  - itimers
156  - posix-timers                                   161  - posix-timers
157                                                   162 
158 The conversion of nanosleep and posix-timers e    163 The conversion of nanosleep and posix-timers enabled the unification of
159 nanosleep and clock_nanosleep.                    164 nanosleep and clock_nanosleep.
160                                                   165 
161 The code was successfully compiled for the fol    166 The code was successfully compiled for the following platforms:
162                                                   167 
163  i386, x86_64, ARM, PPC, PPC64, IA64              168  i386, x86_64, ARM, PPC, PPC64, IA64
164                                                   169 
165 The code was run-tested on the following platf    170 The code was run-tested on the following platforms:
166                                                   171 
167  i386(UP/SMP), x86_64(UP/SMP), ARM, PPC           172  i386(UP/SMP), x86_64(UP/SMP), ARM, PPC
168                                                   173 
169 hrtimers were also integrated into the -rt tre    174 hrtimers were also integrated into the -rt tree, along with a
170 hrtimers-based high-resolution clock implement    175 hrtimers-based high-resolution clock implementation, so the hrtimers
171 code got a healthy amount of testing and use i    176 code got a healthy amount of testing and use in practice.
172                                                   177 
173         Thomas Gleixner, Ingo Molnar              178         Thomas Gleixner, Ingo Molnar
                                                      

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php