=============
Event Tracing
=============

:Author: Theodore Ts'o
:Updated: Li Zefan and Tom Zanussi

1. Introduction
===============

Tracepoints (see Documentation/trace/tracepoints.rst) can be used
without creating custom kernel modules to register probe functions
using the event tracing infrastructure.

Not all tracepoints can be traced using the event tracing system;
the kernel developer must provide code snippets which define how the
tracing information is saved into the tracing buffer, and how the
tracing information should be printed.

2. Using Event Tracing
======================

2.1 Via the 'set_event' interface
---------------------------------

The events which are available for tracing can be found in the file
/sys/kernel/tracing/available_events.

To enable a particular event, such as 'sched_wakeup', simply echo it
to /sys/kernel/tracing/set_event. For example::

        # echo sched_wakeup >> /sys/kernel/tracing/set_event

.. Note:: '>>' is necessary, otherwise all events are disabled first.

To disable an event, echo the event name to the set_event file prefixed
with an exclamation point::

        # echo '!sched_wakeup' >> /sys/kernel/tracing/set_event

To disable all events, echo an empty line to the set_event file::

        # echo > /sys/kernel/tracing/set_event

To enable all events, echo ``*:*`` or ``*:`` to the set_event file::

        # echo '*:*' > /sys/kernel/tracing/set_event

The events are organized into subsystems, such as ext4, irq, sched,
etc., and a full event name looks like this: <subsystem>:<event>.  The
subsystem name is optional, but it is displayed in the available_events
file.  All of the events in a subsystem can be specified via the syntax
``<subsystem>:*``; for example, to enable all irq events, you can use the
command::

        # echo 'irq:*' > /sys/kernel/tracing/set_event

2.2 Via the 'enable' toggle
---------------------------

The events available are also listed in the /sys/kernel/tracing/events/
hierarchy of directories.

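
Because the events/ hierarchy mirrors the <subsystem>:<event> names shown in
available_events, an event name can be mapped mechanically to its directory.
The following is a minimal sketch of that mapping; the function name
``event_enable_path`` and its arguments are hypothetical and not part of any
kernel tool, and the tracefs mount point is passed in explicitly rather than
assumed.

```shell
#!/bin/sh
# Hypothetical helper (not a kernel interface): map "<subsystem>:<event>"
# to the path of that event's 'enable' file below a given tracefs mount.
# Plain POSIX sh has no 'local', so the variables used here are global.
event_enable_path() {
    tracefs=$1      # e.g. /sys/kernel/tracing
    spec=$2         # e.g. sched:sched_wakeup

    # The subsystem is optional in set_event, but this sketch needs it
    # to locate the directory, so reject names without a colon.
    case $spec in
    *:*) ;;
    *)
        echo "expected <subsystem>:<event>, got '$spec'" >&2
        return 1
        ;;
    esac

    subsystem=${spec%%:*}   # text before the first ':'
    event=${spec#*:}        # text after the first ':'
    printf '%s/events/%s/%s/enable\n' "$tracefs" "$subsystem" "$event"
}
```

As root, one possible use is
``echo 1 > "$(event_enable_path /sys/kernel/tracing sched:sched_wakeup)"``.
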

To enable event 'sched_wakeup'::

        # echo 1 > /sys/kernel/tracing/events/sched/sched_wakeup/enable

To disable it::

        # echo 0 > /sys/kernel/tracing/events/sched/sched_wakeup/enable

To enable all events in the sched subsystem::

        # echo 1 > /sys/kernel/tracing/events/sched/enable

To enable all events::

        # echo 1 > /sys/kernel/tracing/events/enable

When reading one of these enable files, there are four results:

 - 0 - all events this file affects are disabled
 - 1 - all events this file affects are enabled
 - X - there is a mixture of events enabled and disabled
 - ? - this file does not affect any event

2.3 Boot option
---------------

In order to facilitate early boot debugging, use the boot option::

        trace_event=[event-list]

event-list is a comma separated list of events. See section 2.1 for event
format.

3. Defining an event-enabled tracepoint
=======================================

See the example provided in samples/trace_events.


4. Event formats
================

Each trace event has a 'format' file associated with it that contains
a description of each field in a logged event.  This information can
be used to parse the binary trace stream, and is also the place to
find the field names that can be used in event filters (see section 5).

It also displays the format string that will be used to print the
event in text mode, along with the event name and ID used for
profiling.

Every event has a set of ``common`` fields associated with it; these are
the fields prefixed with ``common_``.  The other fields vary between
events and correspond to the fields defined in the TRACE_EVENT
definition for that event.

Each field in the format has the form::

     field:field-type field-name; offset:N; size:N;

where offset is the offset of the field in the trace record and size
is the size of the data item, in bytes.

For example, here's the information displayed for the 'sched_wakeup'
event::

        # cat /sys/kernel/tracing/events/sched/sched_wakeup/format

        name: sched_wakeup
        ID: 60
        format:
                field:unsigned short common_type; offset:0; size:2;
                field:unsigned char common_flags; offset:2; size:1;
                field:unsigned char common_preempt_count; offset:3; size:1;
                field:int common_pid; offset:4; size:4;
                field:int common_tgid; offset:8; size:4;

                field:char comm[TASK_COMM_LEN]; offset:12; size:16;
                field:pid_t pid; offset:28; size:4;
                field:int prio; offset:32; size:4;
                field:int success; offset:36; size:4;
                field:int cpu; offset:40; size:4;

        print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
                   REC->prio, REC->success, REC->cpu

This event contains 10 fields, the first 5 common and the remaining 5
event-specific.  All the fields for this event are numeric, except for
'comm' which is a string, a distinction important for event filtering.

5. Event filtering
==================

Trace events can be filtered in the kernel by associating boolean
'filter expressions' with them.
As soon as an event is logged into
the trace buffer, its fields are checked against the filter expression
associated with that event type.  An event with field values that
'match' the filter will appear in the trace output, and an event whose
values don't match will be discarded.  An event with no filter
associated with it matches everything, and is the default when no
filter has been set for an event.

5.1 Expression syntax
---------------------

A filter expression consists of one or more 'predicates' that can be
combined using the logical operators '&&' and '||'.  A predicate is
simply a clause that compares the value of a field contained within a
logged event with a constant value and returns either 0 or 1 depending
on whether the field value matched (1) or didn't match (0)::

          field-name relational-operator value

Parentheses can be used to provide arbitrary logical groupings and
double-quotes can be used to prevent the shell from interpreting
operators as shell metacharacters.

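
The grouping and quoting rules above can be illustrated with a small sketch
that composes predicates into one expression.  The helper names
``filter_and`` and ``filter_or`` are hypothetical conveniences for this
example only; the field name ``sig`` is used purely as sample input.

```shell
#!/bin/sh
# Hypothetical helpers: combine two predicates (or sub-expressions) and
# wrap the result in parentheses so the grouping is always explicit.
filter_and() { printf '(%s && %s)\n' "$1" "$2"; }
filter_or()  { printf '(%s || %s)\n' "$1" "$2"; }

# Single quotes keep the shell from reading '>' and '<' as redirections
# while the predicates are being built.
range=$(filter_and 'sig >= 10' 'sig < 15')
expr=$(filter_or "$range" 'sig == 17')

printf '%s\n' "$expr"
# prints: ((sig >= 10 && sig < 15) || sig == 17)

# When the expression is finally written to an event's 'filter' file,
# the expansion must stay inside double quotes for the same reason:
#   echo "$expr" > filter
```

Keeping every sub-expression parenthesized is a design choice of this sketch:
it avoids relying on operator precedence when expressions are nested.
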

The field-names available for use in filters can be found in the
'format' files for trace events (see section 4).

The relational-operators depend on the type of the field being tested:

The operators available for numeric fields are:

==, !=, <, <=, >, >=, &

And for string fields they are:

==, !=, ~

The glob (~) accepts a wild card character (\*,?) and character classes
([). For example::

  prev_comm ~ "*sh"
  prev_comm ~ "sh*"
  prev_comm ~ "*sh*"
  prev_comm ~ "ba*sh"

If the field is a pointer that points into use [...]
"filename" from sys_enter_openat), then you ha [...]
field name::

  filename.ustring ~ "password"

As the kernel will have to know how to retriev [...]
is at from user space.

You can convert any long type to a function ad [...]

  call_site.function == security_prepare_creds

The above will filter when the field "call_sit [...]
"security_prepare_creds". That is, it will com [...]
the filter will return true if it is greater t [...]
the function "security_prepare_creds" and less [...]

The ".function" postfix can only be attached t [...]
be compared with "==" or "!=".

Cpumask fields or scalar fields that encode a [...]
a user-provided cpumask in cpulist format. The [...]

  CPUS{$cpulist}

Operators available to cpumask filtering are:

& (intersection), ==, !=

For example, this will filter events that have [...]
in the given cpumask::

  target_cpu & CPUS{17-42}

5.2 Setting filters
-------------------

A filter for an individual event is set by writing a filter expression
to the 'filter' file for the given event.

For example::

        # cd /sys/kernel/tracing/events/sched/sched_wakeup
        # echo "common_preempt_count > 4" > filter

A slightly more involved example::

        # cd /sys/kernel/tracing/events/signal/signal_generate
        # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter

If there is an error in the expression, you'll get an 'Invalid
argument' error when setting it, and the erroneous string along with
an error message can be seen by looking at the filter, e.g.::

        # cd /sys/kernel/tracing/events/signal/signal_generate
        # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
        -bash: echo: write error: Invalid argument
        # cat filter
        ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
        ^
        parse_error: Field not found

Currently the caret ('^') for an error always appears at the beginning of
the filter string; the error message should still be useful though
even without more accurate position info.

5.2.1 Filter limitations
------------------------

If a filter is placed on a string pointer (c [...]
to a string on the ring buffer, but instead po [...]
memory, then, for safety reasons, at most 1024 [...]
copied onto a temporary buffer to do the compa [...]
faults (the pointer points to memory that shou [...]
string compare will be treated as not matching [...]

5.3 Clearing filters
--------------------

To clear the filter for an event, write a '0' to the event's filter
file.

To clear the filters for all events in a subsystem, write a '0' to the
subsystem's filter file.

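
The set, inspect and clear steps from sections 5.2 and 5.3 can be wrapped in
two small functions.  This is only a sketch: ``set_filter`` and
``clear_filter`` are hypothetical names, and the directory argument is
assumed to be an event or subsystem directory below events/ that contains a
'filter' file.

```shell
#!/bin/sh
# Hypothetical helper: write a filter expression to "$1/filter".
# If the write is rejected, show the filter file, which then holds the
# erroneous string and the parse error as described in section 5.2.
# Plain POSIX sh has no 'local', so the variables used here are global.
set_filter() {
    dir=$1
    expr=$2
    if ! printf '%s\n' "$expr" > "$dir/filter" 2>/dev/null; then
        echo "filter rejected for $dir:" >&2
        cat "$dir/filter" >&2
        return 1
    fi
}

# Hypothetical helper: clear the filter by writing '0' (section 5.3).
clear_filter() {
    printf '0\n' > "$1/filter"
}
```

For instance, ``set_filter /sys/kernel/tracing/events/sched/sched_wakeup
"common_preempt_count > 4"`` followed by ``clear_filter`` on the same
directory sets and then removes that filter.
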

5.4 Subsystem filters
---------------------

For convenience, filters for every event in a subsystem can be set or
cleared as a group by writing a filter expression into the filter file
at the root of the subsystem.  Note however, that if any event within
the subsystem lacks a field specified in the subsystem filter, or if
the filter can't be applied for any other reason, the filter for that
event will retain its previous setting.  This can result in an
unintended mixture of filters, and therefore in trace output that
confuses a user who assumes a single filter is in effect.  Only filters
that reference just the common fields can be guaranteed to propagate
successfully to all events.

Here are a few subsystem filter examples that also illustrate the
above points:

Clear the filters on all events in the sched subsystem::

        # cd /sys/kernel/tracing/events/sched
        # echo 0 > filter
        # cat sched_switch/filter
        none
        # cat sched_wakeup/filter
        none

Set a filter using only common fields for all events in the sched
subsystem (all events end up with the same filter)::

        # cd /sys/kernel/tracing/events/sched
        # echo common_pid == 0 > filter
        # cat sched_switch/filter
        common_pid == 0
        # cat sched_wakeup/filter
        common_pid == 0

Attempt to set a filter using a non-common field for all events in the
sched subsystem (all events but those that have a prev_pid field retain
their old filters)::

        # cd /sys/kernel/tracing/events/sched
        # echo prev_pid == 0 > filter
        # cat sched_switch/filter
        prev_pid == 0
        # cat sched_wakeup/filter
        common_pid == 0

5.5 PID filtering
-----------------

The set_event_pid file, located in the same directory as the top events
directory, filters out events from any task whose PID is not listed in
the set_event_pid file.
::

        # cd /sys/kernel/tracing
        # echo $$ > set_event_pid
        # echo 1 > events/enable

This traces events only for the current task.

To add more PIDs without losing the PIDs already included, use '>>'.
::

        # echo 123 244 1 >> set_event_pid


6. Event triggers
=================

Trace events can be made to conditionally invoke trigger 'commands'
which can take various forms and are described in detail below;
examples would be enabling or disabling other trace events or invoking
a stack trace whenever the trace event is hit.  Whenever a trace event
with attached triggers is invoked, the set of trigger commands
associated with that event is invoked.  Any given trigger can
additionally have an event filter of the same form as described in
section 5 (Event filtering) associated with it - the command will only
be invoked if the event being invoked passes the associated filter.
If no filter is associated with the trigger, it always passes.


Triggers are added to and removed from a particular event by writing
trigger expressions to the 'trigger' file for the given event.

A given event can have any number of triggers associated with it,
subject to any restrictions that individual commands may have in that
regard.

Event triggers are implemented on top of "soft" mode, which means that
whenever a trace event has one or more triggers associated with it,
the event is activated even if it isn't actually enabled, but is
disabled in a "soft" mode.  That is, the tracepoint will be called,
but just will not be traced, unless of course it's actually enabled.
This scheme allows triggers to be invoked even for events that aren't
enabled, and also allows the current event filter implementation to be
used for conditionally invoking triggers.


The syntax for event triggers is roughly based on the syntax for
set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
section of Documentation/trace/ftrace.rst), but there are major
differences and the implementation isn't currently tied to it in any
way, so beware about making generalizations between the two.

.. Note::
     Writing into trace_marker (See Documentation/trace/ftrace.rst)
     can also enable triggers that are written into
     /sys/kernel/tracing/events/ftrace/print/trigger

6.1 Expression syntax
---------------------

Triggers are added by echoing the command to the 'trigger' file::

  # echo 'command[:count] [if filter]' > trigger

Triggers are removed by echoing the same command but starting with '!'
to the 'trigger' file::

  # echo '!command[:count] [if filter]' > trigger

The [if filter] part isn't used in matching commands when removing, so
leaving that off in a '!' command will accomplish the same thing as
having it in.

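
The ``command[:count] [if filter]`` form, and the matching '!' form used for
removal, can be sketched as two string-building functions.  The names
``trigger_spec`` and ``trigger_remove_spec`` are hypothetical; the commands,
counts and filters passed to them are sample values for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: build 'command[:count] [if filter]'.
# Pass an empty string for a count or filter that should be left out.
# Plain POSIX sh has no 'local', so the variables used here are global.
trigger_spec() {
    cmd=$1
    count=$2
    filter=$3
    spec=$cmd
    if [ -n "$count" ]; then
        spec="$spec:$count"
    fi
    if [ -n "$filter" ]; then
        spec="$spec if $filter"
    fi
    printf '%s\n' "$spec"
}

# Hypothetical helper: build the removal form.  The filter is omitted on
# purpose, since it is not used when matching the trigger to remove.
trigger_remove_spec() {
    printf '!%s\n' "$(trigger_spec "$1" "$2" "")"
}
```

With these, ``echo "$(trigger_spec stacktrace 5 'bytes_req >= 65536')" >
trigger`` would add a trigger and ``echo "$(trigger_remove_spec stacktrace
5)" > trigger`` would remove it again, assuming the current directory is an
event directory.
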
410 364 411 The filter syntax is the same as that describe 365 The filter syntax is the same as that described in the 'Event 412 filtering' section above. 366 filtering' section above. 413 367 414 For ease of use, writing to the trigger file u 368 For ease of use, writing to the trigger file using '>' currently just 415 adds or removes a single trigger and there's n 369 adds or removes a single trigger and there's no explicit '>>' support 416 ('>' actually behaves like '>>') or truncation 370 ('>' actually behaves like '>>') or truncation support to remove all 417 triggers (you have to use '!' for each one add 371 triggers (you have to use '!' for each one added.) 418 372 419 6.2 Supported trigger commands 373 6.2 Supported trigger commands 420 ------------------------------ 374 ------------------------------ 421 375 422 The following commands are supported: 376 The following commands are supported: 423 377 424 - enable_event/disable_event 378 - enable_event/disable_event 425 379 426 These commands can enable or disable another 380 These commands can enable or disable another trace event whenever 427 the triggering event is hit. When these com 381 the triggering event is hit. When these commands are registered, 428 the other trace event is activated, but disa 382 the other trace event is activated, but disabled in a "soft" mode. 429 That is, the tracepoint will be called, but 383 That is, the tracepoint will be called, but just will not be traced. 430 The event tracepoint stays in this mode as l 384 The event tracepoint stays in this mode as long as there's a trigger 431 in effect that can trigger it. 385 in effect that can trigger it. 
432 386 433 For example, the following trigger causes km 387 For example, the following trigger causes kmalloc events to be 434 traced when a read system call is entered, a 388 traced when a read system call is entered, and the :1 at the end 435 specifies that this enablement happens only 389 specifies that this enablement happens only once:: 436 390 437 # echo 'enable_event:kmem:kmalloc:1' 391 # echo 'enable_event:kmem:kmalloc:1' > \ 438 /sys/kernel/tracing/events/sysca !! 392 /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger 439 393 440 The following trigger causes kmalloc events 394 The following trigger causes kmalloc events to stop being traced 441 when a read system call exits. This disable 395 when a read system call exits. This disablement happens on every 442 read system call exit:: 396 read system call exit:: 443 397 444 # echo 'disable_event:kmem:kmalloc' 398 # echo 'disable_event:kmem:kmalloc' > \ 445 /sys/kernel/tracing/events/sysca !! 399 /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger 446 400 447 The format is:: 401 The format is:: 448 402 449 enable_event:<system>:<event>[:count] 403 enable_event:<system>:<event>[:count] 450 disable_event:<system>:<event>[:count] 404 disable_event:<system>:<event>[:count] 451 405 452 To remove the above commands:: 406 To remove the above commands:: 453 407 454 # echo '!enable_event:kmem:kmalloc:1 408 # echo '!enable_event:kmem:kmalloc:1' > \ 455 /sys/kernel/tracing/events/sysca !! 409 /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger 456 410 457 # echo '!disable_event:kmem:kmalloc' 411 # echo '!disable_event:kmem:kmalloc' > \ 458 /sys/kernel/tracing/events/sysca !! 
412 /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger 459 413 460 Note that there can be any number of enable/ 414 Note that there can be any number of enable/disable_event triggers 461 per triggering event, but there can only be 415 per triggering event, but there can only be one trigger per 462 triggered event. e.g. sys_enter_read can hav 416 triggered event. e.g. sys_enter_read can have triggers enabling both 463 kmem:kmalloc and sched:sched_switch, but can 417 kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc 464 versions such as kmem:kmalloc and kmem:kmall 418 versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if 465 bytes_req == 256' and 'kmem:kmalloc if bytes 419 bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they 466 could be combined into a single filter on km 420 could be combined into a single filter on kmem:kmalloc though). 467 421 468 - stacktrace 422 - stacktrace 469 423 470 This command dumps a stacktrace in the trace 424 This command dumps a stacktrace in the trace buffer whenever the 471 triggering event occurs. 425 triggering event occurs. 472 426 473 For example, the following trigger dumps a s 427 For example, the following trigger dumps a stacktrace every time the 474 kmalloc tracepoint is hit:: 428 kmalloc tracepoint is hit:: 475 429 476 # echo 'stacktrace' > \ 430 # echo 'stacktrace' > \ 477 /sys/kernel/tracing/events/kme !! 431 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 478 432 479 The following trigger dumps a stacktrace the 433 The following trigger dumps a stacktrace the first 5 times a kmalloc 480 request happens with a size >= 64K:: 434 request happens with a size >= 64K:: 481 435 482 # echo 'stacktrace:5 if bytes_req >= 436 # echo 'stacktrace:5 if bytes_req >= 65536' > \ 483 /sys/kernel/tracing/events/kme !! 
437 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 484 438 485 The format is:: 439 The format is:: 486 440 487 stacktrace[:count] 441 stacktrace[:count] 488 442 489 To remove the above commands:: 443 To remove the above commands:: 490 444 491 # echo '!stacktrace' > \ 445 # echo '!stacktrace' > \ 492 /sys/kernel/tracing/events/kme !! 446 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 493 447 494 # echo '!stacktrace:5 if bytes_req > 448 # echo '!stacktrace:5 if bytes_req >= 65536' > \ 495 /sys/kernel/tracing/events/kme !! 449 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 496 450 497 The latter can also be removed more simply b 451 The latter can also be removed more simply by the following (without 498 the filter):: 452 the filter):: 499 453 500 # echo '!stacktrace:5' > \ 454 # echo '!stacktrace:5' > \ 501 /sys/kernel/tracing/events/kme !! 455 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 502 456 503 Note that there can be only one stacktrace t 457 Note that there can be only one stacktrace trigger per triggering 504 event. 458 event. 505 459 506 - snapshot 460 - snapshot 507 461 508 This command causes a snapshot to be trigger 462 This command causes a snapshot to be triggered whenever the 509 triggering event occurs. 463 triggering event occurs. 510 464 511 The following command creates a snapshot eve 465 The following command creates a snapshot every time a block request 512 queue is unplugged with a depth > 1. If you 466 queue is unplugged with a depth > 1. If you were tracing a set of 513 events or functions at the time, the snapsho 467 events or functions at the time, the snapshot trace buffer would 514 capture those events when the trigger event 468 capture those events when the trigger event occurred:: 515 469 516 # echo 'snapshot if nr_rq > 1' > \ 470 # echo 'snapshot if nr_rq > 1' > \ 517 /sys/kernel/tracing/events/blo !! 
                /sys/kernel/tracing/events/block/block_unplug/trigger

  To only snapshot once::

          # echo 'snapshot:1 if nr_rq > 1' > \
                /sys/kernel/tracing/events/block/block_unplug/trigger

  To remove the above commands::

          # echo '!snapshot if nr_rq > 1' > \
                /sys/kernel/tracing/events/block/block_unplug/trigger

          # echo '!snapshot:1 if nr_rq > 1' > \
                /sys/kernel/tracing/events/block/block_unplug/trigger

  Note that there can be only one snapshot trigger per triggering
  event.

- traceon/traceoff

  These commands turn tracing on and off when the specified events are
  hit. The parameter determines how many times the tracing system is
  turned on and off. If unspecified, there is no limit.

  The following command turns tracing off the first time a block
  request queue is unplugged with a depth > 1. If you were tracing a
  set of events or functions at the time, you could then examine the
  trace buffer to see the sequence of events that led up to the
  trigger event::

          # echo 'traceoff:1 if nr_rq > 1' > \
                /sys/kernel/tracing/events/block/block_unplug/trigger

  To always disable tracing when nr_rq > 1::

          # echo 'traceoff if nr_rq > 1' > \
                /sys/kernel/tracing/events/block/block_unplug/trigger

  To remove the above commands::

          # echo '!traceoff:1 if nr_rq > 1' > \
                /sys/kernel/tracing/events/block/block_unplug/trigger

          # echo '!traceoff if nr_rq > 1' > \
                /sys/kernel/tracing/events/block/block_unplug/trigger

  Note that there can be only one traceon or traceoff trigger per
  triggering event.

- hist

  This command aggregates event hits into a hash table keyed on one or
  more trace event format fields (or stacktrace) and a set of running
  totals derived from one or more trace event format fields and/or
  event counts (hitcount).

  See Documentation/trace/histogram.rst for details and examples.

7. In-kernel trace event API
============================

In most cases, the command-line interface to t
sufficient. Sometimes, however, applications
more complex relationships than can be express
series of linked command-line expressions, or
commands may be simply too cumbersome.
An exa
application that needs to 'listen' to the trac
maintain an in-kernel state machine detecting,
illegal kernel state occurs in the scheduler.

The trace event subsystem provides an in-kerne
or other kernel code to generate user-defined
will, which can be used to either augment the
and/or signal that a particular important stat

A similar in-kernel API is also available for
kretprobe events.

Both the synthetic event and k/ret/probe event
of a lower-level "dynevent_cmd" event command
available for more specialized applications, o
higher-level trace event APIs.

The API provided for these purposes is describ
following:

  - dynamically creating synthetic event defin
  - dynamically creating kprobe and kretprobe
  - tracing synthetic events from in-kernel co
  - the low-level "dynevent_cmd" API

7.1 Dynamically creating synthetic event defini
-----------------------------------------------

There are a couple of ways to create a new synthe
module or other kernel code.

The first creates the event in one step, using
In this method, the name of the event to creat
the fields is supplied to synth_event_create()
synthetic event with that name and fields will
call.
For example, to create a new "schedtest

  ret = synth_event_create("schedtest", sched_
                           ARRAY_SIZE(sched_fi

The sched_fields param in this example points
synth_field_desc, each of which describes an e
name::

  static struct synth_field_desc sched_fields[
        { .type = "pid_t",              .name
        { .type = "char[16]",           .name
        { .type = "u64",                .name
        { .type = "u64",                .name
        { .type = "unsigned int",       .name
        { .type = "char[64]",           .name
        { .type = "int",                .name
  };

See synth_field_size() for available types.

If field_name contains [n], the field is consi

If field_name contains [] (no subscript), the
be a dynamic array, which will only take as mu
is required to hold the array.

Because space for an event is reserved before
to the event, using dynamic arrays implies tha
in-kernel API described below can't be used wi
other non-piecewise in-kernel APIs can, howeve
arrays.

If the event is created from within a module,
must be passed to synth_event_create(). This
trace buffer won't contain unreadable events w
removed.

At this point, the event object is ready to be
events.

In the second method, the event is created in
allows events to be created dynamically and wi
and populate an array of fields beforehand.

To use this method, an empty or partially empt
first be created using synth_event_gen_cmd_sta
synth_event_gen_cmd_array_start(). For synth_
the name of the event along with one or more p
representing a 'type field_name;' field specif
supplied. For synth_event_gen_cmd_array_start
event along with an array of struct synth_fiel
supplied.
Before calling synth_event_gen_cmd_s
synth_event_gen_cmd_array_start(), the user sh
initialize a dynevent_cmd object using synth_e

For example, to create a new "schedtest" synth
fields::

  struct dynevent_cmd cmd;
  char *buf;

  /* Create a buffer to hold the generated com
  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERN

  /* Before generating the command, initialize
  synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT

  ret = synth_event_gen_cmd_start(&cmd, "sched
                                  "pid_t", "ne
                                  "u64", "ts_n

Alternatively, using an array of struct synth_
containing the same information::

  ret = synth_event_gen_cmd_array_start(&cmd,
                                        fields

Once the synthetic event object has been creat
populated with more fields. Fields are added
synth_event_add_field(), supplying the dyneven
type, and a field name. For example, to add a
"intfield", the following call should be made::

  ret = synth_event_add_field(&cmd, "int", "in

See synth_field_size() for available types. If
the field is considered to be an array.

A group of fields can also be added all at onc
synth_field_desc with synth_event_add_fields(). For
just the first four sched_fields::

  ret = synth_event_add_fields(&cmd, sched_fie

If you already have a string of the form 'type
synth_event_add_field_str() can be used to add
also automatically append a ';' to the string.

Once all the fields have been added, the event
registered by calling the synth_event_gen_cmd_

  ret = synth_event_gen_cmd_end(&cmd);

At this point, the event object is ready to be
events.
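
The ``[n]`` and ``[]`` naming rule described earlier in this section can be
modelled in userspace. The following is only a sketch, not kernel code: the
enum and the function ``classify_field()`` are hypothetical names invented for
this example, and the kernel's own parsing may differ in detail. It classifies
a field specification as a scalar, a fixed-size array or a dynamic array.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical classification, for illustration only (not a kernel type). */
enum field_kind {
	FIELD_SCALAR,
	FIELD_FIXED_ARRAY,	/* "name[n]" */
	FIELD_DYNAMIC_ARRAY,	/* "name[]"  */
	FIELD_INVALID,
};

static enum field_kind classify_field(const char *spec)
{
	const char *open, *close, *p;

	assert(spec != NULL);

	open = strchr(spec, '[');
	if (!open)
		return FIELD_SCALAR;		/* no brackets at all */

	close = strchr(open, ']');
	if (!close)
		return FIELD_INVALID;		/* unbalanced bracket */

	if (close == open + 1)
		return FIELD_DYNAMIC_ARRAY;	/* "[]": no subscript */

	/* This sketch only accepts a purely numeric subscript. */
	for (p = open + 1; p < close; p++)
		if (!isdigit((unsigned char)*p))
			return FIELD_INVALID;

	return FIELD_FIXED_ARRAY;		/* "[n]" */
}
```

Under these assumptions, ``classify_field("comm[16]")`` yields
``FIELD_FIXED_ARRAY`` and ``classify_field("stack[]")`` yields
``FIELD_DYNAMIC_ARRAY``.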

7.2 Tracing synthetic events from in-kernel co
----------------------------------------------

To trace a synthetic event, there are several
option is to trace the event in one call, usin
with a variable number of values, or synth_eve
array of values to be set. A second option ca
need for a pre-formed array of values or list
synth_event_trace_start() and synth_event_trac
synth_event_add_next_val() or synth_event_add_
piecewise.

7.2.1 Tracing a synthetic event all at once
-------------------------------------------

To trace a synthetic event all at once, the sy
synth_event_trace_array() functions can be use

The synth_event_trace() function is passed the
representing the synthetic event (which can be
trace_get_event_file() using the synthetic eve
the system name, and the trace instance name (
trace array)), along with a variable number o
synthetic event field, and the number of value

So, to trace an event corresponding to the syn
above, code like the following could be used::

  ret = synth_event_trace(create_synth_test, 7
                          444,               /*
                          (u64)"clackers",   /*
                          1000000,           /*
                          1000,              /*
                          smp_processor_id(),/
                          (u64)"Thneed",     /*
                          999);              /*

All vals should be cast to u64, and string val
strings, cast to u64. Strings will be copied
the event for the string, using these pointers

Alternatively, the synth_event_trace_array() f
accomplish the same thing. It is passed the t
representing the synthetic event (which can be
trace_get_event_file() using the synthetic eve
the system name, and the trace instance name (
trace array)), along with an array of u64, one
event field.

To trace an event corresponding to the synthet
above, code like the following could be used::

  u64 vals[7];

  vals[0] = 777;                  /* next_pid_
  vals[1] = (u64)"tiddlywinks";   /* next_comm
  vals[2] = 1000000;              /* ts_ns */
  vals[3] = 1000;                 /* ts_ms */
  vals[4] = smp_processor_id();   /* cpu */
  vals[5] = (u64)"thneed";        /* my_string
  vals[6] = 398;                  /* my_int_fi

The 'vals' array is just an array of u64, the
match the number of fields in the synthetic eve
the same order as the synthetic event fields.

All vals should be cast to u64, and string val
strings, cast to u64. Strings will be copied
the event for the string, using these pointers

In order to trace a synthetic event, a pointer
is needed. The trace_get_event_file() functio
it - it will find the file in the given trace
NULL since the top trace array is being used)
preventing the instance containing it from goi

       schedtest_event_file = trace_get_event_

Before tracing the event, it should be enabled
the synthetic event won't actually show up in

To enable a synthetic event from the kernel, t
can be used (which is not specific to syntheti
the "synthetic" system name to be specified ex

To enable the event, pass 'true' to it::

       trace_array_set_clr_event(schedtest_eve
                                 "synthetic",

To disable it pass false::

       trace_array_set_clr_event(schedtest_eve
                                 "synthetic",

Finally, synth_event_trace_array() can be used
event, which should be visible in the trace bu

       ret = synth_event_trace_array(schedtest
                                     ARRAY_SIZ

To remove the synthetic event, the event shoul
trace instance should be 'put' back using trac

       trace_array_set_clr_event(schedtest_eve
                                 "synthetic",
       trace_put_event_file(schedtest_event_fi

If those have been successful, synth_event_del
remove the event::

       ret = synth_event_delete("schedtest");

7.2.2 Tracing a synthetic event piecewise
-----------------------------------------

To trace a synthetic event using the piecewise metho
synth_event_trace_start() function is used to
event trace::

       struct synth_event_trace_state trace_st

       ret = synth_event_trace_start(schedtest

It's passed the trace_event_file representing
using the same methods as described above, alo
struct synth_event_trace_state object, which w
used to maintain state between this and follow

Once the event has been opened, which means sp
reserved in the trace buffer, the individual f
are two ways to do that, either one after anot
the event, which requires no lookups, or by na
tradeoff is flexibility in doing the assignmen
lookup per field.

To assign the values one after the other witho
synth_event_add_next_val() should be used. Ea
same synth_event_trace_state object used in th
along with the value to set the next field in
field is set, the 'cursor' points to the next
by the subsequent call, continuing until all t
in order.
The same sequence of calls as in th
this method would be (without error-handling c

       /* next_pid_field */
       ret = synth_event_add_next_val(777, &tr

       /* next_comm_field */
       ret = synth_event_add_next_val((u64)"sl

       /* ts_ns */
       ret = synth_event_add_next_val(1000000,

       /* ts_ms */
       ret = synth_event_add_next_val(1000, &t

       /* cpu */
       ret = synth_event_add_next_val(smp_proc

       /* my_string_field */
       ret = synth_event_add_next_val((u64)"th

       /* my_int_field */
       ret = synth_event_add_next_val(395, &tr

To assign the values in any order, synth_event
used. Each call is passed the same synth_even
the synth_event_trace_start(), along with the
to set and the value to set it to. The same s
the above examples using this method would be
code)::

       ret = synth_event_add_val("next_pid_fie
       ret = synth_event_add_val("next_comm_fi
                                 &trace_state)
       ret = synth_event_add_val("ts_ns", 1000
       ret = synth_event_add_val("ts_ms", 1000
       ret = synth_event_add_val("cpu", smp_pr
       ret = synth_event_add_val("my_string_fi
                                 &trace_state)
       ret = synth_event_add_val("my_int_field

Note that synth_event_add_next_val() and synth
incompatible if used within the same trace of
can be used but not both at the same time.

Finally, the event won't be actually traced un
which is done using synth_event_trace_end(), w
struct synth_event_trace_state object used in

       ret = synth_event_trace_end(&trace_stat

Note that synth_event_trace_end() must be call
of whether any of the add calls failed (say du
being passed in).
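
The difference between the two piecewise styles (a cursor that advances with
no lookup, versus one name lookup per field) can be illustrated with a small
userspace model. This is a sketch only: ``struct demo_trace_state`` and the
``demo_*()`` functions are hypothetical stand-ins written for this example, not
the kernel implementation. How the sketch rejects mixing the two styles within
one trace is an assumption about one reasonable way to enforce the rule stated
above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_FIELDS 8

/* Hypothetical userspace stand-in for the per-trace state object. */
struct demo_trace_state {
	const char **names;		/* field names, in event order */
	uint64_t vals[DEMO_MAX_FIELDS];	/* values assigned so far */
	int n_fields;
	int cursor;			/* next field for in-order assignment */
	int used_next;			/* in-order assignment has been used */
	int used_named;			/* by-name assignment has been used */
};

static void demo_trace_start(struct demo_trace_state *st,
			     const char **names, int n_fields)
{
	assert(n_fields <= DEMO_MAX_FIELDS);
	memset(st, 0, sizeof(*st));
	st->names = names;
	st->n_fields = n_fields;
}

/* In-order assignment: no lookup, just store and advance the cursor. */
static int demo_add_next_val(uint64_t val, struct demo_trace_state *st)
{
	if (st->used_named || st->cursor >= st->n_fields)
		return -1;
	st->used_next = 1;
	st->vals[st->cursor++] = val;
	return 0;
}

/* By-name assignment: any order, at the cost of one lookup per field. */
static int demo_add_val(const char *name, uint64_t val,
			struct demo_trace_state *st)
{
	int i;

	if (st->used_next)
		return -1;
	for (i = 0; i < st->n_fields; i++) {
		if (strcmp(st->names[i], name) == 0) {
			st->used_named = 1;
			st->vals[i] = val;
			return 0;
		}
	}
	return -1;	/* unknown field name */
}
```

In this model, once ``demo_add_next_val()`` has been used on a state object,
``demo_add_val()`` fails on the same object, and vice versa, until the state is
started again.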

7.3 Dynamically creating kprobe and kretprobe e
-----------------------------------------------

To create a kprobe or kretprobe trace event fr
kprobe_event_gen_cmd_start() or kretprobe_even
functions can be used.

To create a kprobe event, an empty or partiall
should first be created using kprobe_event_gen
of the event and the probe location should be
or args each representing a probe field should
function. Before calling kprobe_event_gen_cmd
should create and initialize a dynevent_cmd ob
kprobe_event_cmd_init().

For example, to create a new "schedtest" kprob

  struct dynevent_cmd cmd;
  char *buf;

  /* Create a buffer to hold the generated com
  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERN

  /* Before generating the command, initialize
  kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVEN

  /*
   * Define the gen_kprobe_test event with the
   * fields.
   */
  ret = kprobe_event_gen_cmd_start(&cmd, "gen_
                                   "dfd=%ax",

Once the kprobe event object has been created,
populated with more fields. Fields can be add
kprobe_event_add_fields(), supplying the dynev
with a variable arg list of probe fields. For
couple additional fields, the following call c

  ret = kprobe_event_add_fields(&cmd, "flags=%

Once all the fields have been added, the event
registered by calling the kprobe_event_gen_cmd
kretprobe_event_gen_cmd_end() functions, depen
or kretprobe command was started::

  ret = kprobe_event_gen_cmd_end(&cmd);

or::

  ret = kretprobe_event_gen_cmd_end(&cmd);

At this point, the event object is ready to be
events.

Similarly, a kretprobe event can be created us
kretprobe_event_gen_cmd_start() with a probe n
additional params such as $retval::

  ret = kretprobe_event_gen_cmd_start(&cmd, "g
                                      "do_sys_

Similar to the synthetic event case, code like
used to enable the newly created kprobe event::

  gen_kprobe_test = trace_get_event_file(NULL,

  ret = trace_array_set_clr_event(gen_kprobe_t
                                  "kprobes", "

Finally, also similar to synthetic events, the
used to give the kprobe event file back and de

  trace_put_event_file(gen_kprobe_test);

  ret = kprobe_event_delete("gen_kprobe_test");

7.4 The "dynevent_cmd" low-level API
------------------------------------

Both the in-kernel synthetic event and kprobe
top of a lower-level "dynevent_cmd" interface
meant to provide the basis for higher-level i
synthetic and kprobe interfaces, which can be

The basic idea is simple and amounts to provi
layer that can be used to generate trace even
generated command strings can then be passed
and event creation code that already exists i
subsystem for creating the corresponding trac

In a nutshell, the way it works is that the h
code creates a struct dynevent_cmd object, th
functions, dynevent_arg_add() and dynevent_ar
a command string, which finally causes the co
using the dynevent_create() function. The de
are described below.

The first step in building a new command stri
initialize an instance of a dynevent_cmd.
He
create a dynevent_cmd on the stack and initia

  struct dynevent_cmd cmd;
  char *buf;
  int ret;

  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KER

  dynevent_cmd_init(&cmd, buf, maxlen, DYNEVEN
                    foo_event_run_command);

The dynevent_cmd initialization needs to be g
buffer and the length of the buffer (MAX_DYNE
for this purpose - at 2k it's generally too b
on the stack, so is dynamically allocated), a
is meant to be used to check that further API
correct command type, and a pointer to an eve
callback that will be called to actually exec
command function.

Once that's done, the command string can be b
calls to argument-adding functions.

To add a single argument, define and initiali
or struct dynevent_arg_pair object. Here's a
possible arg addition, which is simply to app
a whitespace-separated argument to the comman

  struct dynevent_arg arg;

  dynevent_arg_init(&arg, NULL, 0);

  arg.str = name;

  ret = dynevent_arg_add(cmd, &arg);

The arg object is first initialized using dyn
this case the parameters are NULL or 0, which
optional sanity-checking function or separato
the arg.

Here's another more complicated example using
used to create an argument that consists of a
together as a unit, for example, a 'type fiel
expression arg e.g.
'flags=%cx'::

  struct dynevent_arg_pair arg_pair;

  dynevent_arg_pair_init(&arg_pair, dynevent_

  arg_pair.lhs = type;
  arg_pair.rhs = name;

  ret = dynevent_arg_pair_add(cmd, &arg_pair);

Again, the arg_pair is first initialized, in
function used to check the sanity of the args
neither part of the pair is NULL), along with
to add an operator between the pair (here non
appended onto the end of the arg pair (here '

There's also a dynevent_str_add() function th
add a string as-is, with no spaces, delimiter

Any number of dynevent_*_add() calls can be m
(until its length surpasses cmd->maxlen). Wh
been added and the command string is complete
do is run the command, which happens by simpl
dynevent_create()::

  ret = dynevent_create(&cmd);

At that point, if the return value is 0, the
created and is ready to use.

See the dynevent_cmd function definitions the
of the API.
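
The command-building idea described in this section (append whitespace-separated
arguments and operator/separator pairs to a bounded buffer, then hand the
finished string to the existing command parser) can be sketched in plain
userspace C. Everything below is hypothetical and exists only for illustration:
``struct demo_cmd`` and the ``demo_*()`` helpers are not the kernel's
``dynevent_cmd`` API, and the sketch stops at building the string rather than
executing it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace stand-in for a command under construction. */
struct demo_cmd {
	char *buf;
	size_t maxlen;	/* capacity of buf, including the trailing NUL */
	size_t len;	/* current string length */
};

static void demo_cmd_init(struct demo_cmd *cmd, char *buf, size_t maxlen)
{
	assert(cmd && buf && maxlen > 0);
	cmd->buf = buf;
	cmd->maxlen = maxlen;
	cmd->len = 0;
	buf[0] = '\0';
}

/* Append a string as-is; fail without appending if it would not fit. */
static int demo_str_add(struct demo_cmd *cmd, const char *str)
{
	size_t n = strlen(str);

	if (cmd->len + n >= cmd->maxlen)
		return -1;
	memcpy(cmd->buf + cmd->len, str, n + 1);
	cmd->len += n;
	return 0;
}

/* Append " <str>": a single whitespace-separated argument. */
static int demo_arg_add(struct demo_cmd *cmd, const char *str)
{
	size_t saved = cmd->len;

	if (demo_str_add(cmd, " ") || demo_str_add(cmd, str)) {
		cmd->len = saved;	/* roll back a partial append */
		cmd->buf[saved] = '\0';
		return -1;
	}
	return 0;
}

/* Append " <lhs><op><rhs><sep>", e.g. ("u64", " ", "ts_ns", ";"). */
static int demo_arg_pair_add(struct demo_cmd *cmd, const char *lhs,
			     const char *op, const char *rhs, const char *sep)
{
	size_t saved = cmd->len;

	if (demo_str_add(cmd, " ") || demo_str_add(cmd, lhs) ||
	    demo_str_add(cmd, op) || demo_str_add(cmd, rhs) ||
	    demo_str_add(cmd, sep)) {
		cmd->len = saved;	/* roll back a partial append */
		cmd->buf[saved] = '\0';
		return -1;
	}
	return 0;
}
```

Starting from an event name added with ``demo_str_add()``, two
``demo_arg_pair_add()`` calls with a space operator and a ``;`` separator build a
string of the ``type field_name;`` form discussed above. An append that would
exceed the buffer leaves the command unchanged.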