###############
Timerlat tracer
###############

The timerlat tracer aims to help preemptive kernel developers
find sources of wakeup latencies of real-time threads. Like cyclictest,
the tracer sets a periodic timer that wakes up a thread. The thread then
computes a *wakeup latency* value as the difference between the *current
time* and the *absolute time* that the timer was set to expire. The main
goal of timerlat is to produce traces that help kernel developers find
where such latencies come from.

Usage
-----

Write the ASCII text "timerlat" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

  [root@f32 ~]# cd /sys/kernel/tracing/
  [root@f32 tracing]# echo timerlat > current_tracer

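The same write can also be done from a program. The following is a minimal C sketch that assumes tracefs is mounted at /sys/kernel/tracing, as above. write_tracefs_value() is a hypothetical helper, not a library API. It opens an existing control file and writes the value followed by a newline, as echo does.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring "echo VALUE > FILE": write @value and a
 * trailing newline to an existing tracefs control file, for example
 * /sys/kernel/tracing/current_tracer. Returns 0 on success, -1 on error.
 */
static int write_tracefs_value(const char *path, const char *value)
{
	char buf[256];
	size_t len = strlen(value);
	ssize_t written;
	int fd;

	if (len + 2 > sizeof(buf))	/* room for '\n' and the NUL */
		return -1;
	snprintf(buf, sizeof(buf), "%s\n", value);

	fd = open(path, O_WRONLY | O_TRUNC);	/* control files already exist */
	if (fd < 0)
		return -1;
	written = write(fd, buf, len + 1);
	close(fd);
	return written == (ssize_t)(len + 1) ? 0 : -1;
}
```

For example, when run as root, write_tracefs_value("/sys/kernel/tracing/current_tracer", "timerlat") should have the same effect as the echo command above.
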
It is possible to follow the trace by reading the trace file::

  [root@f32 tracing]# cat trace
  # tracer: timerlat
  #
  #                                _-----=> irqs-off
  #                               / _----=> need-resched
  #                              | / _---=> hardirq/softirq
  #                              || / _--=> preempt-depth
  #                              || /
  #                              ||||           ACTIVATION
  #            TASK-PID    CPU#  ||||     TIMESTAMP ID            CONTEXT                 LATENCY
  #               | |        |   ||||         |     |                |                       |
             <idle>-0      [000] d.h1    54.029328: #1     context    irq timer_latency       932 ns
              <...>-867    [000] ....    54.029339: #1     context thread timer_latency     11700 ns
             <idle>-0      [001] dNh1    54.029346: #1     context    irq timer_latency      2833 ns
              <...>-868    [001] ....    54.029353: #1     context thread timer_latency      9820 ns
             <idle>-0      [000] d.h1    54.030328: #2     context    irq timer_latency       769 ns
              <...>-867    [000] ....    54.030330: #2     context thread timer_latency      3070 ns
             <idle>-0      [001] d.h1    54.030344: #2     context    irq timer_latency       935 ns
              <...>-868    [001] ....    54.030347: #2     context thread timer_latency      4351 ns

The tracer creates a per-CPU kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed at the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread. The ACTIVATION
ID field relates the *irq* execution to its respective *thread*
execution.

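The pairing of *irq* and *thread* lines can be done mechanically. The sketch below is a minimal C example with hypothetical names: struct timerlat_sample, parse_timerlat_line() and delay_after_irq_ns() are illustrative and not part of any kernel or tracing-library API. It parses the timerlat lines of the output above. For an irq/thread pair from the same activation, it then computes how much latency accrued after the IRQ handler observed the timer. Both values are measured from the same expected expiration time, so thread minus irq gives that post-IRQ delay. ACTIVATION IDs are per CPU (#1 appears on both CPU 000 and CPU 001 above), so the sketch matches on CPU as well as ID. It also assumes the task name contains no '['.

```c
#include <stdio.h>
#include <string.h>

/* One timerlat sample from the trace output (hypothetical struct). */
struct timerlat_sample {
	int cpu;			/* from the "[000]" CPU# field */
	unsigned long id;		/* ACTIVATION ID, the N in "#N" */
	char context[8];		/* "irq" or "thread" */
	unsigned long long latency_ns;	/* timer_latency value */
};

/*
 * Parse a line such as
 *   "<idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns".
 * Returns 0 on success, -1 if the line is not a timerlat sample.
 */
static int parse_timerlat_line(const char *line, struct timerlat_sample *s)
{
	const char *cpu = strchr(line, '[');	/* assumes no '[' in the task name */
	const char *ev = strstr(line, ": #");	/* sample follows "TIMESTAMP: " */

	if (!cpu || !ev || sscanf(cpu, "[%d]", &s->cpu) != 1)
		return -1;
	if (sscanf(ev + 2, "#%lu context %7s timer_latency %llu ns",
		   &s->id, s->context, &s->latency_ns) != 3)
		return -1;
	return 0;
}

/*
 * For an irq/thread pair of the same activation on the same CPU, store in
 * @delta the latency that accrued after the IRQ handler ran (thread - irq).
 * Returns 0 on success, -1 if the two samples do not form such a pair.
 */
static int delay_after_irq_ns(const struct timerlat_sample *irq,
			      const struct timerlat_sample *thread,
			      long long *delta)
{
	if (irq->cpu != thread->cpu || irq->id != thread->id ||
	    strcmp(irq->context, "irq") != 0 ||
	    strcmp(thread->context, "thread") != 0)
		return -1;
	*delta = (long long)thread->latency_ns - (long long)irq->latency_ns;
	return 0;
}
```

With the first two lines of the output above, the delta is 11700 - 932 = 10768 ns. So for that activation, most of the thread latency accrued after the IRQ.
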
The *irq*/*thread* split is important for telling which context an
unexpectedly high value comes from. The *irq* context can be
delayed by hardware-related actions, such as SMIs, NMIs, and IRQs,
or by a thread masking interrupts. Once the timer fires, the delay
can also be influenced by blocking caused by threads. For example, by
postponing the scheduler execution via preempt_disable(), by the
scheduler execution itself, or by masking interrupts. Threads can
also be delayed by interference from other threads and IRQs.

Tracer options
---------------------

The timerlat tracer is built on top of the osnoise tracer,
so its configuration is also done in the osnoise/ config
directory. The timerlat configs are:

 - cpus: CPUs on which a timerlat thread will execute.
 - timerlat_period_us: the period of the timerlat thread.
 - stop_tracing_us: stop the system tracing if a timer latency higher
   than the configured value happens at the *irq* context. Writing 0
   disables this option.
 - stop_tracing_total_us: stop the system tracing if a timer latency
   higher than the configured value happens at the *thread* context.
   Writing 0 disables this option.
 - print_stack: save the stack of the IRQ occurrence. The stack is
   printed after the *thread context* event, or at the IRQ handler if
   *stop_tracing_us* is hit.

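The two stop options can be read as a simple rule applied to each sample. Below is a hypothetical model of the rules as documented in the list above. The function name is illustrative, and the kernel's own comparison and rounding may differ. In this model, stop_tracing_us applies to *irq* samples and stop_tracing_total_us applies to *thread* samples. Both limits are in microseconds, while timer_latency is printed in nanoseconds, and a limit of 0 disables the check.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical model of the documented stop rules (not the kernel's code):
 * stop_tracing_us applies to the irq-context latency and
 * stop_tracing_total_us to the thread-context latency. Both limits are in
 * microseconds, latencies are in nanoseconds, and a limit of 0 disables
 * the corresponding check.
 */
static bool would_stop_tracing(const char *context,
			       unsigned long long latency_ns,
			       unsigned long long stop_tracing_us,
			       unsigned long long stop_tracing_total_us)
{
	unsigned long long limit_us;

	if (strcmp(context, "irq") == 0)
		limit_us = stop_tracing_us;
	else if (strcmp(context, "thread") == 0)
		limit_us = stop_tracing_total_us;
	else
		return false;		/* not a timerlat context */

	if (limit_us == 0)		/* writing 0 disables the option */
		return false;
	return latency_ns > limit_us * 1000ULL;	/* "higher than" the limit */
}
```

For instance, with stop_tracing_total_us set to 25, a thread sample of 39960 ns (about 40 us) exceeds the limit, while an irq sample is unaffected as long as stop_tracing_us is 0.
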
timerlat and osnoise
----------------------------

The timerlat tracer can also take advantage of the osnoise: trace events.
For example::

  [root@f32 ~]# cd /sys/kernel/tracing/
  [root@f32 tracing]# echo timerlat > current_tracer
  [root@f32 tracing]# echo 1 > events/osnoise/enable
  [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
  [root@f32 tracing]# tail -10 trace
        cc1-87882 [005] d..h... 548.771078: #402268 context irq timer_latency 13585 ns
        cc1-87882 [005] dNLh1.. 548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
        cc1-87882 [005] dNLh2.. 548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
        cc1-87882 [005] d...3.. 548.771102: thread_noise: cc1:87882 start 548.771078243 duration 9909 ns
  timerlat/5-1035 [005] ....... 548.771104: #402268 context thread timer_latency 39960 ns

In this case, the timer latency does not have a single root cause, but
several. First, the timer IRQ was delayed for 13 us, which may point to
a long IRQ-disabled section (see the IRQ stacktrace section). Then the
timer interrupt that wakes up the timerlat thread took 7597 ns, and the
qxl:21 device IRQ took 7139 ns. Finally, the cc1 thread noise took
9909 ns before the context switch. Such evidence helps the developer
choose other tracing methods to debug and optimize the system.

It is worth mentioning that the *duration* values reported
by the osnoise: events are *net* values. For example, the
thread_noise does not include the duration of the overhead caused
by the IRQ execution (which indeed accounted for 12736 ns). But
the values reported by the timerlat tracer (timer_latency)
are *gross* values.

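Because the osnoise: durations are net, they can be added up to estimate where the gross thread latency went. The hypothetical helpers below, whose names are illustrative, extract the duration field from osnoise event lines and sum the values. In the sample above, 7597 + 7139 + 9909 = 24645 ns of noise plus 13585 ns of IRQ latency gives 38230 ns. The thread latency was 39960 ns, which leaves roughly 1.7 us not covered by the events shown. Treat this only as an estimate, because the intervals are not guaranteed to be disjoint. For example, the timer IRQ latency and the local_timer irq_noise can overlap slightly.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract N from the "duration N ns" field of an osnoise event line, e.g.
 *   "... irq_noise: qxl:21 start 548.771085017 duration 7139 ns".
 * Returns 0 on success, -1 if the line has no duration field.
 */
static int noise_duration_ns(const char *line, unsigned long long *ns)
{
	const char *p = strstr(line, " duration ");

	if (!p || sscanf(p, " duration %llu ns", ns) != 1)
		return -1;
	return 0;
}

/* Sum the (net) durations of @n event lines; -1 if any line lacks one. */
static int sum_noise_ns(const char *const lines[], size_t n,
			unsigned long long *total)
{
	unsigned long long d;
	size_t i;

	*total = 0;
	for (i = 0; i < n; i++) {
		if (noise_duration_ns(lines[i], &d) != 0)
			return -1;
		*total += d;
	}
	return 0;
}
```
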
Each "-" 116 in the timelines means circa 1 us, and the tim 115 in the timelines means circa 1 us, and the time moves ==>:: 117 116 118 External timer irq 117 External timer irq thread 119 clock latency 118 clock latency latency 120 event 13585 ns 119 event 13585 ns 39960 ns 121 | ^ 120 | ^ ^ 122 v | 121 v | | 123 |-------------| 122 |-------------| | 124 |-------------+---------------------- 123 |-------------+-------------------------| 125 ^ 124 ^ ^ 126 ============================================ 125 ======================================================================== 127 [tmr irq] [dev irq] 126 [tmr irq] [dev irq] 128 [another thread...^ v..^ v...... 127 [another thread...^ v..^ v.......][timerlat/ thread] <-- CPU timeline 129 ============================================ 128 ========================================================================= 130 |-------| |-------| 129 |-------| |-------| 131 |--^ v------ 130 |--^ v-------| 132 | | 131 | | | 133 | | 132 | | + thread_noise: 9909 ns 134 | +-> irq 133 | +-> irq_noise: 6139 ns 135 +-> irq_noise: 759 134 +-> irq_noise: 7597 ns 136 135 137 IRQ stacktrace 136 IRQ stacktrace 138 --------------------------- 137 --------------------------- 139 138 140 The osnoise/print_stack option is helpful for 139 The osnoise/print_stack option is helpful for the cases in which a thread 141 noise causes the major factor for the timer la 140 noise causes the major factor for the timer latency, because of preempt or 142 irq disabled. For example:: 141 irq disabled. For example:: 143 142 144 [root@f32 tracing]# echo 500 > osnoise 143 [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us 145 [root@f32 tracing]# echo 500 > osnoise 144 [root@f32 tracing]# echo 500 > osnoise/print_stack 146 [root@f32 tracing]# echo timerlat > cu 145 [root@f32 tracing]# echo timerlat > current_tracer 147 [root@f32 tracing]# tail -21 per_cpu/c 146 [root@f32 tracing]# tail -21 per_cpu/cpu7/trace 148 insmod-1026 [007] dN.h1.. 
200.2 147 insmod-1026 [007] dN.h1.. 200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns 149 insmod-1026 [007] d..h1.. 200.2 148 insmod-1026 [007] d..h1.. 200.202587: #29800 context irq timer_latency 1616 ns 150 insmod-1026 [007] dN.h2.. 200.2 149 insmod-1026 [007] dN.h2.. 200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns 151 insmod-1026 [007] dN.h3.. 200.2 150 insmod-1026 [007] dN.h3.. 200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns 152 insmod-1026 [007] d...3.. 200.2 151 insmod-1026 [007] d...3.. 200.203444: thread_noise: insmod:1026 start 200.202586933 duration 838681 ns 153 timerlat/7-1001 [007] ....... 200.2 152 timerlat/7-1001 [007] ....... 200.203445: #29800 context thread timer_latency 859978 ns 154 timerlat/7-1001 [007] ....1.. 200.2 153 timerlat/7-1001 [007] ....1.. 200.203446: <stack trace> 155 => timerlat_irq 154 => timerlat_irq 156 => __hrtimer_run_queues 155 => __hrtimer_run_queues 157 => hrtimer_interrupt 156 => hrtimer_interrupt 158 => __sysvec_apic_timer_interrupt 157 => __sysvec_apic_timer_interrupt 159 => asm_call_irq_on_stack 158 => asm_call_irq_on_stack 160 => sysvec_apic_timer_interrupt 159 => sysvec_apic_timer_interrupt 161 => asm_sysvec_apic_timer_interrupt 160 => asm_sysvec_apic_timer_interrupt 162 => delay_tsc 161 => delay_tsc 163 => dummy_load_1ms_pd_init 162 => dummy_load_1ms_pd_init 164 => do_one_initcall 163 => do_one_initcall 165 => do_init_module 164 => do_init_module 166 => __do_sys_finit_module 165 => __do_sys_finit_module 167 => do_syscall_64 166 => do_syscall_64 168 => entry_SYSCALL_64_after_hwframe 167 => entry_SYSCALL_64_after_hwframe 169 168 170 In this case, it is possible to see that the t 169 In this case, it is possible to see that the thread added the highest 171 contribution to the *timer latency* and the st 170 contribution to the *timer latency* and the stack trace, saved during 172 the timerlat IRQ handler, points to a function 171 the 
timerlat IRQ handler, points to a function named
dummy_load_1ms_pd_init, which had the following code (on purpose)::

  #include <linux/delay.h>
  #include <linux/init.h>
  #include <linux/module.h>
  #include <linux/preempt.h>

  static int __init dummy_load_1ms_pd_init(void)
  {
      preempt_disable();
      mdelay(1);          /* busy-wait 1 ms with preemption disabled */
      preempt_enable();
      return 0;
  }
  module_init(dummy_load_1ms_pd_init);
  MODULE_LICENSE("GPL");

User-space interface
---------------------------

Timerlat allows user-space threads to use the timerlat infrastructure to
measure scheduling latency. This interface is accessible via a per-CPU
file descriptor inside $tracing_dir/osnoise/pe…

This interface is accessible under the following conditions:

 - The timerlat tracer is enabled
 - The osnoise workload option is set to NO_OSNOIS…
 - The user-space thread is affined to a single processor
 - The thread opens the file associated with its single processor
 - Only one thread can access the file at a time

The open() syscall will fail if any of these conditions is not met.
After opening the file descriptor, user space can read from it.

The read() system call runs timerlat code that arms the timer in the
future and waits for it, as the regular in-kernel timerlat thread does.

When the timer IRQ fires, the timerlat IRQ handler executes, reports the
IRQ latency, and wakes up the thread waiting in read(). The thread is
then scheduled and reports the thread latency via the tracer, as the
kernel thread does.

The difference from the in-kernel timerlat is that, instead of
re-arming the timer, timerlat returns from the read() system call. At
this point, the user can run any code.

If the application reads the timerlat file descriptor again, the tracer
will report the return-from-user-space latency, which is the total
latency. If this is the end of the work, it can be interpreted as the
response time for the request.

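The cycle described above can be sketched as a loop. Each read() returns at an activation, the application does its work, and the next read() lets timerlat account for that work before sleeping until the following activation. This is a minimal sketch with hypothetical names (run_timerlat_cycles() and count_activation() are illustrative). It takes the path of the per-CPU timerlat file as a parameter and assumes the caller is already affined to that file's CPU. To test it without timerlat, it can be pointed at any readable file, such as /dev/zero, where read() returns immediately.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical sketch: open the per-CPU timerlat file at @path and run
 * @cycles activations. Every read() after the first reports the total
 * latency of the previous cycle, then blocks until the next activation;
 * @work runs in user space between reads. The caller must already be
 * affined to the file's CPU.
 * Returns the number of completed cycles, or -1 if open() fails.
 */
static long run_timerlat_cycles(const char *path, long cycles,
				void (*work)(void *), void *arg)
{
	char buf[1024];
	long done;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	for (done = 0; done < cycles; done++) {
		if (read(fd, buf, sizeof(buf)) < 0)
			break;		/* stop on read errors */
		work(arg);		/* the per-activation "request" */
	}
	close(fd);
	return done;
}

/* Example work callback: count how many activations were handled. */
static void count_activation(void *arg)
{
	++*(long *)arg;
}
```

Per the description above, the total latency of the last cycle is reported only if the file is read once more.
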
After reporting the total latency, timerlat re-arms the timer and goes
to sleep until the following activation.

If at any time one of the conditions is broken, for example if the
thread migrates while in user space or the timerlat tracer is disabled,
a signal will be sent to the user-space thread.

Here is a basic example of user-space code for timerlat::

  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
      char buffer[1024];
      int timerlat_fd;
      int retval;
      long cpu = 0;   /* place in CPU 0 */
      cpu_set_t set;

      CPU_ZERO(&set);
      CPU_SET(cpu, &set);

      /* timerlat requires the thread to be affined to a single CPU */
      if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
          return 1;

      snprintf(buffer, sizeof(buffer),
               "/sys/kernel/tracing/osnoise/p…",
               cpu);

      timerlat_fd = open(buffer, O_RDONLY);
      if (timerlat_fd < 0) {
          printf("error opening %s: %s\n", buffer, strerror(errno));
          exit(1);
      }

      for (;;) {
          /* blocks until the next timerlat activation */
          retval = read(timerlat_fd, buffer, sizeof(buffer));
          if (retval < 0)
              break;
      }

      close(timerlat_fd);
      exit(0);
  }

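Since open() fails when any of the conditions is not met, a program may want to check the conditions it can before opening the file, so that it can print a clearer error. The following is a minimal sketch for the first condition, using a hypothetical helper name. It reads a current_tracer file (normally /sys/kernel/tracing/current_tracer) and compares the name it contains with the expected tracer, ignoring the trailing newline.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the current_tracer file at @path
 * (normally /sys/kernel/tracing/current_tracer) names @tracer.
 */
static bool current_tracer_is(const char *path, const char *tracer)
{
	char name[64];
	FILE *f = fopen(path, "r");
	bool match = false;

	if (!f)
		return false;
	if (fgets(name, sizeof(name), f)) {
		name[strcspn(name, "\n")] = '\0';	/* drop the trailing newline */
		match = strcmp(name, tracer) == 0;
	}
	fclose(f);
	return match;
}
```

A caller would check current_tracer_is("/sys/kernel/tracing/current_tracer", "timerlat") before calling open().
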