==============================================
hrtimers - subsystem for high-resolution kerne
==============================================

This patch introduces a new subsystem for high

One might ask the question: we already have a
(kernel/timers.c), why do we need two timer su
back and forth trying to integrate high-resolu
features into the existing timer framework, an
such high-resolution timer implementations in
conclusion that the timer wheel code is fundam
such an approach. We initially didn't believe
to solve this'), and spent a considerable effo
things into the timer wheel, but we failed. In
several reasons why such integration is hard/i

- the forced handling of low-resolution and hi
  the same way leads to a lot of compromises,
  mess. The timers.c code is very "tightly cod
  32-bitness assumptions, and has been honed a
  relatively narrow use case (jiffies in a rel
  for many years - and thus even small extensi
  the wheel concept, leading to even worse com
  code is very good and tight code, there's ze
  current usage - but it is simply not suitabl
  high-res timers.

- the unpredictable [O(N)] overhead of cascadi
  necessitate a more complex handling of high
  in turn decreases robustness. Such a design
  timing inaccuracies. Cascading is a fundamen
  wheel concept, it cannot be 'designed out' w
  degrading other portions of the timers.c cod

- the implementation of the current posix-time
  the timer wheel has already introduced a qui
  the required readjusting of absolute CLOCK_R
  settimeofday or NTP time - further underlyin
  example: that the timer wheel data structure
  timers.

- the timer wheel code is most optimal for use
  identified as "timeouts". Such timeouts are
  error conditions in various I/O paths, such
  I/O. The vast majority of those timers never
  recascaded because the expected correct even
  can be removed from the timer wheel before a
  them becomes necessary. Thus the users of th
  the granularity and precision tradeoffs of t
  largely expect the timer subsystem to have n
  Accurate timing for them is not a core purpo
  timeout values used are ad-hoc. For them it
  evil to guarantee the processing of actual t
  (because most of the timeouts are deleted be
  should thus be as cheap and unintrusive as p

The primary users of precision timers are user
utilize nanosleep, posix-timers and itimer int
users like drivers and subsystems which requir
(e.g. multimedia) can benefit from the availab
high-resolution timer subsystem as well.

While this subsystem does not offer high-resol
yet, the hrtimer subsystem can be easily exten
clock capabilities, and patches for that exist
The increasing demand for realtime and multime
with other potential users for precise timers
separate the "timeout" and "precise timer" sub

Another potential benefit is that such a separ
special-purpose optimization of the existing t
resolution and low precision use cases - once
APIs are separated from the timer wheel and ar
hrtimers. E.g. we could decrease the frequency
from 250 Hz to 100 HZ (or even smaller).

hrtimer subsystem implementation details
----------------------------------------

the basic design considerations were:

- simplicity

- data structure not bound to jiffies or any o
  kernel logic works at 64-bit nanoseconds res

- simplification of existing, timing related k

another basic requirement was the immediate en
timers at activation time. After looking at se
such as radix trees and hashes, we chose the r
data structure. Rbtrees are available as a lib
used in various performance-critical areas of
file systems. The rbtree is solely used for ti
a separate list is used to give the expiry cod
queued timers, without having to walk the rbtr

(This separate list is also useful for later w
high-resolution clocks, where we need separate
queues while keeping the time-order intact.)

Time-ordered enqueueing is not purely for the
high-resolution clocks though, it also simplif
absolute timers based on a low-resolution CLOC
implementation needed to keep an extra list of
CLOCK_REALTIME timers along with complex locki
settimeofday and NTP, all the timers (!) had t
time-changing code had to fix them up one by o
be enqueued again. The time-ordered enqueueing
expiry time in absolute time units removes all
scaling code from the posix-timer implementati
be set without having to touch the rbtree. Thi
of posix-timers simpler in general.

The locking and per-CPU behavior of hrtimers w
existing timer wheel code, as it is mature and
was not really a win, due to the different dat
hrtimer functions now have clearer behavior an
hrtimer_try_to_cancel() and hrtimer_cancel() [
equivalent to timer_delete() and timer_delete_
1:1 mapping between them on the algorithmic le
potential for code sharing either.

Basic data types: every time value, absolute o
special nanosecond-resolution 64bit type: ktim
(Originally, the kernel-internal representatio
operations was implemented via macros and inli
switched between a "hybrid union" type and a p
nanoseconds representation (at compile time).
context of the Y2038 work.)

hrtimers - rounding of timer values
-----------------------------------

the hrtimer code will round timer events to lo
because it has to. Otherwise it will do no art

one question is, what resolution value should
the clock_getres() interface. This will return
a given clock has - be it low-res, high-res, o

hrtimers - testing and verification
-----------------------------------

We used the high-resolution clock subsystem on
the hrtimer implementation details in praxis,
timer tests in order to ensure specification c
tests on low-resolution clocks.

The hrtimer patch converts the following kerne
hrtimers:

 - nanosleep
 - itimers
 - posix-timers

The conversion of nanosleep and posix-timers e
nanosleep and clock_nanosleep.

The code was successfully compiled for the fol

 i386, x86_64, ARM, PPC, PPC64, IA64

The code was run-tested on the following platf

 i386(UP/SMP), x86_64(UP/SMP), ARM, PPC

hrtimers were also integrated into the -rt tre
hrtimers-based high-resolution clock implement
code got a healthy amount of testing and use i

Thomas Gleixner, Ingo Molnar
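
The user-visible side of the points above (the resolution reported by
clock_getres(), and absolute-time sleeps via clock_nanosleep) can be
observed from user space. The following is a minimal sketch, not part of
the hrtimer patch: it uses only standard POSIX calls, and the helper names
(timespec_to_ns, ns_to_timespec, clock_resolution_ns,
sleep_abs_overshoot_ns) are hypothetical, chosen for illustration. Time
values are kept as 64-bit nanosecond counts, loosely echoing the scalar
ktime_t idea; this is an analogy, not the kernel's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Flatten a timespec into a single 64-bit nanosecond count
 * (hypothetical helper, not a kernel or libc API). */
int64_t timespec_to_ns(struct timespec ts)
{
	return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Inverse of timespec_to_ns() for non-negative values. */
struct timespec ns_to_timespec(int64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}

/* Resolution that clock_getres() reports for a clock, in nanoseconds,
 * or -1 if the call fails. */
int64_t clock_resolution_ns(clockid_t clk)
{
	struct timespec res;

	if (clock_getres(clk, &res) != 0)
		return -1;
	return timespec_to_ns(res);
}

/* Sleep until an absolute CLOCK_MONOTONIC deadline delay_ns from now.
 * Returns how far past the deadline we woke up, in nanoseconds,
 * or -1 on error. */
int64_t sleep_abs_overshoot_ns(int64_t delay_ns)
{
	struct timespec now, deadline;
	int64_t deadline_ns;
	int err;

	assert(delay_ns >= 0);

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	deadline_ns = timespec_to_ns(now) + delay_ns;
	deadline = ns_to_timespec(deadline_ns);

	/* clock_nanosleep() returns an error number directly; with an
	 * absolute deadline it can simply be retried after a signal. */
	do {
		err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				      &deadline, NULL);
	} while (err == EINTR);
	if (err != 0)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	return timespec_to_ns(now) - deadline_ns;
}
```

Calling clock_resolution_ns(CLOCK_MONOTONIC) and
sleep_abs_overshoot_ns(1000000) from a small main() shows the reported
resolution next to the wake-up latency actually observed; the values
depend on the kernel configuration and hardware. With glibc older than
2.17 the program has to be linked with -lrt.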