TOMOYO Linux Cross Reference
Linux/Documentation/trace/events.rst


Diff markup

Differences between /Documentation/trace/events.rst (Version linux-6.11.5) and /Documentation/trace/events.rst (Version linux-5.4.284)


  1 =============                                       1 =============
  2 Event Tracing                                       2 Event Tracing
  3 =============                                       3 =============
  4                                                     4 
  5 :Author: Theodore Ts'o                              5 :Author: Theodore Ts'o
  6 :Updated: Li Zefan and Tom Zanussi                  6 :Updated: Li Zefan and Tom Zanussi
  7                                                     7 
  8 1. Introduction                                     8 1. Introduction
  9 ===============                                     9 ===============
 10                                                    10 
 11 Tracepoints (see Documentation/trace/tracepoin     11 Tracepoints (see Documentation/trace/tracepoints.rst) can be used
 12 without creating custom kernel modules to regi     12 without creating custom kernel modules to register probe functions
 13 using the event tracing infrastructure.            13 using the event tracing infrastructure.
 14                                                    14 
 15 Not all tracepoints can be traced using the ev     15 Not all tracepoints can be traced using the event tracing system;
 16 the kernel developer must provide code snippet     16 the kernel developer must provide code snippets which define how the
 17 tracing information is saved into the tracing      17 tracing information is saved into the tracing buffer, and how the
 18 tracing information should be printed.             18 tracing information should be printed.
 19                                                    19 
 20 2. Using Event Tracing                             20 2. Using Event Tracing
 21 ======================                             21 ======================
 22                                                    22 
 23 2.1 Via the 'set_event' interface                  23 2.1 Via the 'set_event' interface
 24 ---------------------------------                  24 ---------------------------------
 25                                                    25 
 26 The events which are available for tracing can     26 The events which are available for tracing can be found in the file
 27 /sys/kernel/tracing/available_events.          !!  27 /sys/kernel/debug/tracing/available_events.
 28                                                    28 
 29 To enable a particular event, such as 'sched_w     29 To enable a particular event, such as 'sched_wakeup', simply echo it
 30 to /sys/kernel/tracing/set_event. For example: !!  30 to /sys/kernel/debug/tracing/set_event. For example::
 31                                                    31 
 32         # echo sched_wakeup >> /sys/kernel/tra !!  32         # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
 33                                                    33 
 34 .. Note:: '>>' is necessary, otherwise it will     34 .. Note:: '>>' is necessary, otherwise it will first disable all the events.
 35                                                    35 
 36 To disable an event, echo the event name to th     36 To disable an event, echo the event name to the set_event file prefixed
 37 with an exclamation point::                        37 with an exclamation point::
 38                                                    38 
 39         # echo '!sched_wakeup' >> /sys/kernel/ !!  39         # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
 40                                                    40 
 41 To disable all events, echo an empty line to t     41 To disable all events, echo an empty line to the set_event file::
 42                                                    42 
 43         # echo > /sys/kernel/tracing/set_event !!  43         # echo > /sys/kernel/debug/tracing/set_event
 44                                                    44 
 45 To enable all events, echo ``*:*`` or ``*:`` t     45 To enable all events, echo ``*:*`` or ``*:`` to the set_event file::
 46                                                    46 
 47         # echo *:* > /sys/kernel/tracing/set_e !!  47         # echo *:* > /sys/kernel/debug/tracing/set_event
 48                                                    48 
 49 The events are organized into subsystems, such     49 The events are organized into subsystems, such as ext4, irq, sched,
 50 etc., and a full event name looks like this: <     50 etc., and a full event name looks like this: <subsystem>:<event>.  The
 51 subsystem name is optional, but it is displaye     51 subsystem name is optional, but it is displayed in the available_events
 52 file.  All of the events in a subsystem can be     52 file.  All of the events in a subsystem can be specified via the syntax
 53 ``<subsystem>:*``; for example, to enable all      53 ``<subsystem>:*``; for example, to enable all irq events, you can use the
 54 command::                                          54 command::
 55                                                    55 
 56         # echo 'irq:*' > /sys/kernel/tracing/s !!  56         # echo 'irq:*' > /sys/kernel/debug/tracing/set_event
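The ``<subsystem>:<event>`` naming described above also makes it easy to list the events that belong to one subsystem before enabling them. The following is a minimal sketch, not taken from the kernel documentation: the function name `list_subsystem_events` is hypothetical, and it assumes each line of the available_events file has the form `<subsystem>:<event>`.

```shell
#!/bin/sh
# Hypothetical helper: print every event of one subsystem.
#   $1: path to an available_events file
#   $2: subsystem name, e.g. irq
list_subsystem_events() {
    # Keep only the lines that start with "<subsystem>:".
    grep "^$2:" "$1"
}
```

A call might look like `list_subsystem_events /sys/kernel/tracing/available_events irq`, assuming tracefs is mounted at that location.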
 57                                                    57 
 58 2.2 Via the 'enable' toggle                        58 2.2 Via the 'enable' toggle
 59 ---------------------------                        59 ---------------------------
 60                                                    60 
 61 The events available are also listed in the /s !!  61 The events available are also listed in the /sys/kernel/debug/tracing/events/ hierarchy
 62 of directories.                                    62 of directories.
 63                                                    63 
 64 To enable event 'sched_wakeup'::                   64 To enable event 'sched_wakeup'::
 65                                                    65 
 66         # echo 1 > /sys/kernel/tracing/events/ !!  66         # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
 67                                                    67 
 68 To disable it::                                    68 To disable it::
 69                                                    69 
 70         # echo 0 > /sys/kernel/tracing/events/ !!  70         # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
 71                                                    71 
 72 To enable all events in sched subsystem::          72 To enable all events in sched subsystem::
 73                                                    73 
 74         # echo 1 > /sys/kernel/tracing/events/ !!  74         # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
 75                                                    75 
 76 To enable all events::                             76 To enable all events::
 77                                                    77 
 78         # echo 1 > /sys/kernel/tracing/events/ !!  78         # echo 1 > /sys/kernel/debug/tracing/events/enable
 79                                                    79 
 80 When reading one of these enable files, there      80 When reading one of these enable files, there are four results:
 81                                                    81 
 82  - 0 - all events this file affects are disabl     82  - 0 - all events this file affects are disabled
 83  - 1 - all events this file affects are enable     83  - 1 - all events this file affects are enabled
 84  - X - there is a mixture of events enabled an     84  - X - there is a mixture of events enabled and disabled
 85  - ? - this file does not affect any event         85  - ? - this file does not affect any event
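The four values listed above can be decoded in a script. This is a minimal sketch under stated assumptions: the function name `describe_enable` and the wording of its messages are hypothetical, and the function takes the path to any 'enable' file as its only argument.

```shell
#!/bin/sh
# Hypothetical helper: translate the content of an 'enable' file
# into a short description.
#   $1: path to an 'enable' file
describe_enable() {
    # Command substitution strips the trailing newline.
    state=$(cat "$1" 2>/dev/null) || return 1
    case "$state" in
        0)   echo "all disabled" ;;
        1)   echo "all enabled" ;;
        X)   echo "mixed" ;;
        '?') echo "no events affected" ;;
        *)   echo "unknown state: $state" ;;
    esac
}
```

For instance, `describe_enable /sys/kernel/tracing/events/sched/enable` would report the state of the whole sched subsystem, assuming tracefs is mounted at that location.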
 86                                                    86 
 87 2.3 Boot option                                    87 2.3 Boot option
 88 ---------------                                    88 ---------------
 89                                                    89 
 90 In order to facilitate early boot debugging, u     90 In order to facilitate early boot debugging, use boot option::
 91                                                    91 
 92         trace_event=[event-list]                   92         trace_event=[event-list]
 93                                                    93 
 94 event-list is a comma-separated list of events     94 event-list is a comma-separated list of events. See section 2.1 for event
 95 format.                                            95 format.
 96                                                    96 
 97 3. Defining an event-enabled tracepoint            97 3. Defining an event-enabled tracepoint
 98 =======================================            98 =======================================
 99                                                    99 
100 See the example provided in samples/trace_even    100 See the example provided in samples/trace_events
101                                                   101 
102 4. Event formats                                  102 4. Event formats
103 ================                                  103 ================
104                                                   104 
105 Each trace event has a 'format' file associate    105 Each trace event has a 'format' file associated with it that contains
106 a description of each field in a logged event.    106 a description of each field in a logged event.  This information can
107 be used to parse the binary trace stream, and     107 be used to parse the binary trace stream, and is also the place to
108 find the field names that can be used in event    108 find the field names that can be used in event filters (see section 5).
109                                                   109 
110 It also displays the format string that will b    110 It also displays the format string that will be used to print the
111 event in text mode, along with the event name     111 event in text mode, along with the event name and ID used for
112 profiling.                                        112 profiling.
113                                                   113 
114 Every event has a set of ``common`` fields ass    114 Every event has a set of ``common`` fields associated with it; these are
115 the fields prefixed with ``common_``.  The oth    115 the fields prefixed with ``common_``.  The other fields vary between
116 events and correspond to the fields defined in    116 events and correspond to the fields defined in the TRACE_EVENT
117 definition for that event.                        117 definition for that event.
118                                                   118 
119 Each field in the format has the form::           119 Each field in the format has the form::
120                                                   120 
121      field:field-type field-name; offset:N; si    121      field:field-type field-name; offset:N; size:N;
122                                                   122 
123 where offset is the offset of the field in the    123 where offset is the offset of the field in the trace record and size
124 is the size of the data item, in bytes.           124 is the size of the data item, in bytes.
125                                                   125 
126 For example, here's the information displayed     126 For example, here's the information displayed for the 'sched_wakeup'
127 event::                                           127 event::
128                                                   128 
129         # cat /sys/kernel/tracing/events/sched !! 129         # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
130                                                   130 
131         name: sched_wakeup                        131         name: sched_wakeup
132         ID: 60                                    132         ID: 60
133         format:                                   133         format:
134                 field:unsigned short common_ty    134                 field:unsigned short common_type;       offset:0;       size:2;
135                 field:unsigned char common_fla    135                 field:unsigned char common_flags;       offset:2;       size:1;
136                 field:unsigned char common_pre    136                 field:unsigned char common_preempt_count;       offset:3;       size:1;
137                 field:int common_pid;   offset    137                 field:int common_pid;   offset:4;       size:4;
138                 field:int common_tgid;  offset    138                 field:int common_tgid;  offset:8;       size:4;
139                                                   139 
140                 field:char comm[TASK_COMM_LEN]    140                 field:char comm[TASK_COMM_LEN]; offset:12;      size:16;
141                 field:pid_t pid;        offset    141                 field:pid_t pid;        offset:28;      size:4;
142                 field:int prio; offset:32;        142                 field:int prio; offset:32;      size:4;
143                 field:int success;      offset    143                 field:int success;      offset:36;      size:4;
144                 field:int cpu;  offset:40;        144                 field:int cpu;  offset:40;      size:4;
145                                                   145 
146         print fmt: "task %s:%d [%d] success=%d    146         print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
147                    REC->prio, REC->success, RE    147                    REC->prio, REC->success, REC->cpu
148                                                   148 
149 This event contains 10 fields, the first 5 com    149 This event contains 10 fields, the first 5 common and the remaining 5
150 event-specific.  All the fields for this event    150 event-specific.  All the fields for this event are numeric, except for
151 'comm' which is a string, a distinction import    151 'comm' which is a string, a distinction important for event filtering.
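Because every field line follows the same ``field:...; offset:N; size:N;`` layout, a 'format' file can be summarized with a few lines of awk. The sketch below is illustrative only: the function name `list_fields` is hypothetical, and it assumes the field name is the last word of the declaration, optionally followed by an array suffix such as ``[TASK_COMM_LEN]``.

```shell
#!/bin/sh
# Hypothetical helper: print "name offset size" for every field line
# of a 'format' file.
#   $1: path to a 'format' file
list_fields() {
    awk -F';' '
        /^[ \t]*field:/ {
            decl = $1; off = $2; size = $3
            sub(/^[ \t]*field:/, "", decl)   # drop the "field:" tag
            n = split(decl, words, " ")      # last word is the field name
            name = words[n]
            sub(/\[.*$/, "", name)           # strip an array suffix
            sub(/^[ \t]*offset:/, "", off)
            sub(/^[ \t]*size:/, "", size)
            print name, off, size
        }
    ' "$1"
}
```

Lines that do not start with ``field:``, such as the name, ID, and print fmt lines, are ignored.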
152                                                   152 
153 5. Event filtering                                153 5. Event filtering
154 ==================                                154 ==================
155                                                   155 
156 Trace events can be filtered in the kernel by     156 Trace events can be filtered in the kernel by associating boolean
157 'filter expressions' with them.  As soon as an    157 'filter expressions' with them.  As soon as an event is logged into
158 the trace buffer, its fields are checked again    158 the trace buffer, its fields are checked against the filter expression
159 associated with that event type.  An event wit    159 associated with that event type.  An event with field values that
160 'match' the filter will appear in the trace ou    160 'match' the filter will appear in the trace output, and an event whose
161 values don't match will be discarded.  An even    161 values don't match will be discarded.  An event with no filter
162 associated with it matches everything, and is     162 associated with it matches everything, and is the default when no
163 filter has been set for an event.                 163 filter has been set for an event.
164                                                   164 
165 5.1 Expression syntax                             165 5.1 Expression syntax
166 ---------------------                             166 ---------------------
167                                                   167 
168 A filter expression consists of one or more 'p    168 A filter expression consists of one or more 'predicates' that can be
169 combined using the logical operators '&&' and     169 combined using the logical operators '&&' and '||'.  A predicate is
170 simply a clause that compares the value of a f    170 simply a clause that compares the value of a field contained within a
171 logged event with a constant value and returns    171 logged event with a constant value and returns either 0 or 1 depending
172 on whether the field value matched (1) or didn    172 on whether the field value matched (1) or didn't match (0)::
173                                                   173 
174           field-name relational-operator value    174           field-name relational-operator value
175                                                   175 
176 Parentheses can be used to provide arbitrary l    176 Parentheses can be used to provide arbitrary logical groupings and
177 double-quotes can be used to prevent the shell    177 double-quotes can be used to prevent the shell from interpreting
178 operators as shell metacharacters.                178 operators as shell metacharacters.
179                                                   179 
180 The field-names available for use in filters c    180 The field-names available for use in filters can be found in the
181 'format' files for trace events (see section 4    181 'format' files for trace events (see section 4).
182                                                   182 
183 The relational-operators depend on the type of    183 The relational-operators depend on the type of the field being tested:
184                                                   184 
185 The operators available for numeric fields are    185 The operators available for numeric fields are:
186                                                   186 
187 ==, !=, <, <=, >, >=, &                           187 ==, !=, <, <=, >, >=, &
188                                                   188 
189 And for string fields they are:                   189 And for string fields they are:
190                                                   190 
191 ==, !=, ~                                         191 ==, !=, ~
192                                                   192 
193 The glob (~) accepts a wild card character (\*    193 The glob (~) accepts a wild card character (\*,?) and character classes
194 ([). For example::                                194 ([). For example::
195                                                   195 
196   prev_comm ~ "*sh"                               196   prev_comm ~ "*sh"
197   prev_comm ~ "sh*"                               197   prev_comm ~ "sh*"
198   prev_comm ~ "*sh*"                              198   prev_comm ~ "*sh*"
199   prev_comm ~ "ba*sh"                             199   prev_comm ~ "ba*sh"
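As a rough analogy only, shell `case` patterns use a similar wildcard syntax. The kernel's glob matcher is a separate implementation, so the sketch below merely illustrates which task names the four patterns above would be expected to select. The function name `glob_match` and the sample names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if string $1 matches shell pattern $2.
glob_match() {
    case "$1" in
        $2) return 0 ;;   # unquoted, so $2 is treated as a pattern
        *)  return 1 ;;
    esac
}

# Try each pattern from the examples against a few sample names.
for comm in bash sshd zsh; do
    for pat in '*sh' 'sh*' '*sh*' 'ba*sh'; do
        if glob_match "$comm" "$pat"; then
            echo "$comm matches $pat"
        fi
    done
done
```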
200                                                   200 
201 If the field is a pointer that points into use << 
202 "filename" from sys_enter_openat), then you ha << 
203 field name::                                   << 
204                                                << 
205   filename.ustring ~ "password"                << 
206                                                << 
207 As the kernel will have to know how to retriev << 
208 is at from user space.                         << 
209                                                << 
210 You can convert any long type to a function ad << 
211                                                << 
212   call_site.function == security_prepare_creds << 
213                                                << 
214 The above will filter when the field "call_sit << 
215 "security_prepare_creds". That is, it will com << 
216 the filter will return true if it is greater t << 
217 the function "security_prepare_creds" and less << 
218                                                << 
219 The ".function" postfix can only be attached t << 
220 be compared with "==" or "!=".                 << 
221                                                << 
222 Cpumask fields or scalar fields that encode a  << 
223 a user-provided cpumask in cpulist format. The << 
224                                                << 
225   CPUS{$cpulist}                               << 
226                                                << 
227 Operators available to cpumask filtering are:  << 
228                                                << 
229 & (intersection), ==, !=                       << 
230                                                << 
231 For example, this will filter events that have << 
232 in the given cpumask::                         << 
233                                                << 
234   target_cpu & CPUS{17-42}                     << 
235                                                << 
236 5.2 Setting filters                               201 5.2 Setting filters
237 -------------------                               202 -------------------
238                                                   203 
239 A filter for an individual event is set by wri    204 A filter for an individual event is set by writing a filter expression
240 to the 'filter' file for the given event.         205 to the 'filter' file for the given event.
241                                                   206 
242 For example::                                     207 For example::
243                                                   208 
244         # cd /sys/kernel/tracing/events/sched/ !! 209         # cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
245         # echo "common_preempt_count > 4" > fi    210         # echo "common_preempt_count > 4" > filter
246                                                   211 
247 A slightly more involved example::                212 A slightly more involved example::
248                                                   213 
249         # cd /sys/kernel/tracing/events/signal !! 214         # cd /sys/kernel/debug/tracing/events/signal/signal_generate
250         # echo "((sig >= 10 && sig < 15) || si    215         # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
251                                                   216 
252 If there is an error in the expression, you'll    217 If there is an error in the expression, you'll get an 'Invalid
253 argument' error when setting it, and the erron    218 argument' error when setting it, and the erroneous string along with
254 an error message can be seen by looking at the    219 an error message can be seen by looking at the filter e.g.::
255                                                   220 
256         # cd /sys/kernel/tracing/events/signal !! 221         # cd /sys/kernel/debug/tracing/events/signal/signal_generate
257         # echo "((sig >= 10 && sig < 15) || ds    222         # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
258         -bash: echo: write error: Invalid argu    223         -bash: echo: write error: Invalid argument
259         # cat filter                              224         # cat filter
260         ((sig >= 10 && sig < 15) || dsig == 17    225         ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
261         ^                                         226         ^
262         parse_error: Field not found              227         parse_error: Field not found
263                                                   228 
264 Currently the caret ('^') for an error always     229 Currently the caret ('^') for an error always appears at the beginning of
265 the filter string; the error message should st    230 the filter string; the error message should still be useful though
266 even without more accurate position info.         231 even without more accurate position info.
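The set-then-inspect sequence shown above can be wrapped in a small function. This is a sketch under stated assumptions rather than a recommended tool: the name `set_filter` is hypothetical, and it relies on the shell's built-in `printf` returning a non-zero status when the write is rejected.

```shell
#!/bin/sh
# Hypothetical helper: write a filter expression to an event's 'filter'
# file and, if the write fails, show what the filter file reports.
#   $1: event directory, e.g. .../events/signal/signal_generate
#   $2: filter expression
set_filter() {
    if printf '%s\n' "$2" 2>/dev/null > "$1/filter"; then
        echo "filter set: $2"
    else
        echo "filter rejected; the filter file reports:" >&2
        cat "$1/filter" >&2
        return 1
    fi
}
```

Passing the expression as a single quoted argument also keeps the shell from interpreting operators such as '&&' and '>'.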
267                                                   232 
268 5.2.1 Filter limitations                       << 
269 ------------------------                       << 
270                                                << 
271 If a filter is placed on a string pointer ``(c << 
272 to a string on the ring buffer, but instead po << 
273 memory, then, for safety reasons, at most 1024 << 
274 copied onto a temporary buffer to do the compa << 
275 faults (the pointer points to memory that shou << 
276 string compare will be treated as not matching << 
277                                                << 
278 5.3 Clearing filters                              233 5.3 Clearing filters
279 --------------------                              234 --------------------
280                                                   235 
281 To clear the filter for an event, write a '0'     236 To clear the filter for an event, write a '0' to the event's filter
282 file.                                             237 file.
283                                                   238 
284 To clear the filters for all events in a subsy    239 To clear the filters for all events in a subsystem, write a '0' to the
285 subsystem's filter file.                          240 subsystem's filter file.
286                                                   241 
287 5.4 Subsystem filters                          !! 242 5.3 Subsystem filters
288 ---------------------                             243 ---------------------
289                                                   244 
290 For convenience, filters for every event in a     245 For convenience, filters for every event in a subsystem can be set or
291 cleared as a group by writing a filter express    246 cleared as a group by writing a filter expression into the filter file
292 at the root of the subsystem.  Note however, t    247 at the root of the subsystem.  Note however, that if a filter for any
293 event within the subsystem lacks a field speci    248 event within the subsystem lacks a field specified in the subsystem
294 filter, or if the filter can't be applied for     249 filter, or if the filter can't be applied for any other reason, the
295 filter for that event will retain its previous    250 filter for that event will retain its previous setting.  This can
296 result in an unintended mixture of filters whi    251 result in an unintended mixture of filters which could lead to
297 confusing (to the user who might think differe    252 confusing (to the user who might think different filters are in
298 effect) trace output.  Only filters that refer    253 effect) trace output.  Only filters that reference just the common
299 fields can be guaranteed to propagate successf    254 fields can be guaranteed to propagate successfully to all events.
300                                                   255 
301 Here are a few subsystem filter examples that     256 Here are a few subsystem filter examples that also illustrate the
302 above points:                                     257 above points:
303                                                   258 
304 Clear the filters on all events in the sched s    259 Clear the filters on all events in the sched subsystem::
305                                                   260 
306         # cd /sys/kernel/tracing/events/sched  !! 261         # cd /sys/kernel/debug/tracing/events/sched
307         # echo 0 > filter                         262         # echo 0 > filter
308         # cat sched_switch/filter                 263         # cat sched_switch/filter
309         none                                      264         none
310         # cat sched_wakeup/filter                 265         # cat sched_wakeup/filter
311         none                                      266         none
312                                                   267 
313 Set a filter using only common fields for all     268 Set a filter using only common fields for all events in the sched
314 subsystem (all events end up with the same fil    269 subsystem (all events end up with the same filter)::
315                                                   270 
316         # cd /sys/kernel/tracing/events/sched  !! 271         # cd /sys/kernel/debug/tracing/events/sched
317         # echo common_pid == 0 > filter           272         # echo common_pid == 0 > filter
318         # cat sched_switch/filter                 273         # cat sched_switch/filter
319         common_pid == 0                           274         common_pid == 0
320         # cat sched_wakeup/filter                 275         # cat sched_wakeup/filter
321         common_pid == 0                           276         common_pid == 0
322                                                   277 
323 Attempt to set a filter using a non-common fie    278 Attempt to set a filter using a non-common field for all events in the
324 sched subsystem (all events but those that hav    279 sched subsystem (all events but those that have a prev_pid field retain
325 their old filters)::                              280 their old filters)::
326                                                   281 
327         # cd /sys/kernel/tracing/events/sched  !! 282         # cd /sys/kernel/debug/tracing/events/sched
328         # echo prev_pid == 0 > filter             283         # echo prev_pid == 0 > filter
329         # cat sched_switch/filter                 284         # cat sched_switch/filter
330         prev_pid == 0                             285         prev_pid == 0
331         # cat sched_wakeup/filter                 286         # cat sched_wakeup/filter
332         common_pid == 0                           287         common_pid == 0
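Since a subsystem filter can leave some events with their previous setting, it can help to print every event's filter after setting one. The loop below is a minimal sketch; the function name `show_filters` is hypothetical, and it assumes each event directory under the subsystem contains a 'filter' file.

```shell
#!/bin/sh
# Hypothetical helper: print "<event>: <filter>" for every event
# directory under a subsystem directory.
#   $1: subsystem directory, e.g. .../events/sched
show_filters() {
    for f in "$1"/*/filter; do
        [ -r "$f" ] || continue              # the glob matched nothing
        event=$(basename "$(dirname "$f")")  # directory name is the event
        echo "$event: $(cat "$f")"
    done
}
```

Any event whose line differs from the expression just written kept its old filter.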
333                                                   288 
334 5.5 PID filtering                              !! 289 5.4 PID filtering
335 -----------------                                 290 -----------------
336                                                   291 
337 The set_event_pid file, in the same directory      292 The set_event_pid file, in the same directory as the top events directory,
338 filters out events from any task whose PID is      293 filters out events from any task whose PID is not listed in the
339 set_event_pid file.                                294 set_event_pid file.
340 ::                                                295 ::
341                                                   296 
342         # cd /sys/kernel/tracing               !! 297         # cd /sys/kernel/debug/tracing
343         # echo $$ > set_event_pid                 298         # echo $$ > set_event_pid
344         # echo 1 > events/enable                  299         # echo 1 > events/enable
345                                                   300 
346 This will only trace events for the current ta    301 This will only trace events for the current task.
347                                                   302 
348 To add more PIDs without losing the PIDs alrea    303 To add more PIDs without losing the PIDs already included, use '>>'.
349 ::                                                304 ::
350                                                   305 
351         # echo 123 244 1 >> set_event_pid         306         # echo 123 244 1 >> set_event_pid
352                                                   307 
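The difference between '>' and '>>' described above can be wrapped in two small helpers. This is a minimal sketch, not part of the kernel documentation: the TRACEFS variable and the function names set_event_pids and add_event_pids are assumptions made for illustration, and the tracefs mount point may differ on your system.

```shell
#!/bin/sh
# Hypothetical helpers; TRACEFS defaults to the usual tracefs mount point.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# Replace the current PID list with the given PIDs (plain '>' redirect).
set_event_pids() {
    echo "$@" > "$TRACEFS/set_event_pid"
}

# Add PIDs while keeping the ones already listed ('>>' redirect).
add_event_pids() {
    echo "$@" >> "$TRACEFS/set_event_pid"
}
```

For example, `set_event_pids $$` followed by `add_event_pids 123 244 1` mirrors the two commands shown above. Both need to run as root against a mounted tracefs.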
353                                                   308 
354 6. Event triggers                                 309 6. Event triggers
355 =================                                 310 =================
356                                                   311 
357 Trace events can be made to conditionally invo    312 Trace events can be made to conditionally invoke trigger 'commands'
358 which can take various forms and are described    313 which can take various forms and are described in detail below;
359 examples would be enabling or disabling other     314 examples would be enabling or disabling other trace events or invoking
360 a stack trace whenever the trace event is hit.    315 a stack trace whenever the trace event is hit.  Whenever a trace event
361 with attached triggers is invoked, the set of     316 with attached triggers is invoked, the set of trigger commands
362 associated with that event is invoked.  Any gi    317 associated with that event is invoked.  Any given trigger can
363 additionally have an event filter of the same     318 additionally have an event filter of the same form as described in
364 section 5 (Event filtering) associated with it    319 section 5 (Event filtering) associated with it - the command will only
365 be invoked if the event being invoked passes t    320 be invoked if the event being invoked passes the associated filter.
366 If no filter is associated with the trigger, i    321 If no filter is associated with the trigger, it always passes.
367                                                   322 
368 Triggers are added to and removed from a parti    323 Triggers are added to and removed from a particular event by writing
369 trigger expressions to the 'trigger' file for     324 trigger expressions to the 'trigger' file for the given event.
370                                                   325 
371 A given event can have any number of triggers     326 A given event can have any number of triggers associated with it,
372 subject to any restrictions that individual co    327 subject to any restrictions that individual commands may have in that
373 regard.                                           328 regard.
374                                                   329 
375 Event triggers are implemented on top of "soft    330 Event triggers are implemented on top of "soft" mode, which means that
376 whenever a trace event has one or more trigger    331 whenever a trace event has one or more triggers associated with it,
377 the event is activated even if it isn't actual    332 the event is activated even if it isn't actually enabled, but is
378 disabled in a "soft" mode.  That is, the trace    333 disabled in a "soft" mode.  That is, the tracepoint will be called,
379 but just will not be traced, unless of course     334 but just will not be traced, unless of course it's actually enabled.
380 This scheme allows triggers to be invoked even    335 This scheme allows triggers to be invoked even for events that aren't
381 enabled, and also allows the current event fil    336 enabled, and also allows the current event filter implementation to be
382 used for conditionally invoking triggers.         337 used for conditionally invoking triggers.
383                                                   338 
384 The syntax for event triggers is roughly based    339 The syntax for event triggers is roughly based on the syntax for
385 set_ftrace_filter 'ftrace filter commands' (se    340 set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
386 section of Documentation/trace/ftrace.rst), bu    341 section of Documentation/trace/ftrace.rst), but there are major
387 differences and the implementation isn't curre    342 differences and the implementation isn't currently tied to it in any
388 way, so beware of making generalizations betwe    343 way, so beware of making generalizations between the two.
389                                                   344 
390 .. Note::                                      !! 345 Note: Writing into trace_marker (See Documentation/trace/ftrace.rst)
391      Writing into trace_marker (See Documentat << 
392      can also enable triggers that are written    346      can also enable triggers that are written into
393      /sys/kernel/tracing/events/ftrace/print/t    347      /sys/kernel/tracing/events/ftrace/print/trigger
394                                                   348 
395 6.1 Expression syntax                             349 6.1 Expression syntax
396 ---------------------                             350 ---------------------
397                                                   351 
398 Triggers are added by echoing the command to t    352 Triggers are added by echoing the command to the 'trigger' file::
399                                                   353 
400   # echo 'command[:count] [if filter]' > trigg    354   # echo 'command[:count] [if filter]' > trigger
401                                                   355 
402 Triggers are removed by echoing the same comma    356 Triggers are removed by echoing the same command but starting with '!'
403 to the 'trigger' file::                           357 to the 'trigger' file::
404                                                   358 
405   # echo '!command[:count] [if filter]' > trig    359   # echo '!command[:count] [if filter]' > trigger
406                                                   360 
407 The [if filter] part isn't used in matching co    361 The [if filter] part isn't used in matching commands when removing, so
408 leaving that off in a '!' command will accompl    362 leaving that off in a '!' command will accomplish the same thing as
409 having it in.                                     363 having it in.
410                                                   364 
411 The filter syntax is the same as that describe    365 The filter syntax is the same as that described in the 'Event
412 filtering' section above.                         366 filtering' section above.
413                                                   367 
414 For ease of use, writing to the trigger file u    368 For ease of use, writing to the trigger file using '>' currently just
415 adds or removes a single trigger and there's n    369 adds or removes a single trigger and there's no explicit '>>' support
416 ('>' actually behaves like '>>') or truncation    370 ('>' actually behaves like '>>') or truncation support to remove all
417 triggers (you have to use '!' for each one add    371 triggers (you have to use '!' for each one added.)
418                                                   372 
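The 'command[:count] [if filter]' form can be assembled mechanically. The following is a sketch under stated assumptions: the helper names trigger_expr and trigger_remove_expr are invented for illustration and are not part of any tracing tool. The removal helper leaves the filter out, since the filter is not used when matching a trigger for removal.

```shell
#!/bin/sh
# Hypothetical helper: print 'command[:count] [if filter]'.
# $1 = command, $2 = optional count, $3 = optional filter expression.
trigger_expr() {
    te_out=$1
    if [ -n "${2:-}" ]; then
        te_out="$te_out:$2"
    fi
    if [ -n "${3:-}" ]; then
        te_out="$te_out if $3"
    fi
    printf '%s\n' "$te_out"
}

# Hypothetical helper: print the matching removal expression.
# $1 = command, $2 = optional count; the filter is deliberately omitted.
trigger_remove_expr() {
    printf '!%s\n' "$(trigger_expr "$1" "${2:-}" "")"
}
```

A call such as `trigger_expr stacktrace 5 'bytes_req >= 65536'` produces the string that would then be echoed into the event's 'trigger' file.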
419 6.2 Supported trigger commands                    373 6.2 Supported trigger commands
420 ------------------------------                    374 ------------------------------
421                                                   375 
422 The following commands are supported:             376 The following commands are supported:
423                                                   377 
424 - enable_event/disable_event                      378 - enable_event/disable_event
425                                                   379 
426   These commands can enable or disable another    380   These commands can enable or disable another trace event whenever
427   the triggering event is hit.  When these com    381   the triggering event is hit.  When these commands are registered,
428   the other trace event is activated, but disa    382   the other trace event is activated, but disabled in a "soft" mode.
429   That is, the tracepoint will be called, but     383   That is, the tracepoint will be called, but just will not be traced.
430   The event tracepoint stays in this mode as l    384   The event tracepoint stays in this mode as long as there's a trigger
431   in effect that can trigger it.                  385   in effect that can trigger it.
432                                                   386 
433   For example, the following trigger causes km    387   For example, the following trigger causes kmalloc events to be
434   traced when a read system call is entered, a    388   traced when a read system call is entered, and the :1 at the end
435   specifies that this enablement happens only     389   specifies that this enablement happens only once::
436                                                   390 
437           # echo 'enable_event:kmem:kmalloc:1'    391           # echo 'enable_event:kmem:kmalloc:1' > \
438               /sys/kernel/tracing/events/sysca !! 392               /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
439                                                   393 
440   The following trigger causes kmalloc events     394   The following trigger causes kmalloc events to stop being traced
441   when a read system call exits.  This disable    395   when a read system call exits.  This disablement happens on every
442   read system call exit::                         396   read system call exit::
443                                                   397 
444           # echo 'disable_event:kmem:kmalloc'     398           # echo 'disable_event:kmem:kmalloc' > \
445               /sys/kernel/tracing/events/sysca !! 399               /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
446                                                   400 
447   The format is::                                 401   The format is::
448                                                   402 
449       enable_event:<system>:<event>[:count]       403       enable_event:<system>:<event>[:count]
450       disable_event:<system>:<event>[:count]      404       disable_event:<system>:<event>[:count]
451                                                   405 
452   To remove the above commands::                  406   To remove the above commands::
453                                                   407 
454           # echo '!enable_event:kmem:kmalloc:1    408           # echo '!enable_event:kmem:kmalloc:1' > \
455               /sys/kernel/tracing/events/sysca !! 409               /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
456                                                   410 
457           # echo '!disable_event:kmem:kmalloc'    411           # echo '!disable_event:kmem:kmalloc' > \
458               /sys/kernel/tracing/events/sysca !! 412               /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
459                                                   413 
460   Note that there can be any number of enable/    414   Note that there can be any number of enable/disable_event triggers
461   per triggering event, but there can only be     415   per triggering event, but there can only be one trigger per
462   triggered event. e.g. sys_enter_read can hav    416   triggered event. e.g. sys_enter_read can have triggers enabling both
463   kmem:kmalloc and sched:sched_switch, but can    417   kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
464   versions such as kmem:kmalloc and kmem:kmall    418   versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
465   bytes_req == 256' and 'kmem:kmalloc if bytes    419   bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
466   could be combined into a single filter on km    420   could be combined into a single filter on kmem:kmalloc though).
467                                                   421 
468 - stacktrace                                      422 - stacktrace
469                                                   423 
470   This command dumps a stacktrace in the trace    424   This command dumps a stacktrace in the trace buffer whenever the
471   triggering event occurs.                        425   triggering event occurs.
472                                                   426 
473   For example, the following trigger dumps a s    427   For example, the following trigger dumps a stacktrace every time the
474   kmalloc tracepoint is hit::                     428   kmalloc tracepoint is hit::
475                                                   429 
476           # echo 'stacktrace' > \                 430           # echo 'stacktrace' > \
477                 /sys/kernel/tracing/events/kme !! 431                 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
478                                                   432 
479   The following trigger dumps a stacktrace the    433   The following trigger dumps a stacktrace the first 5 times a kmalloc
480   request happens with a size >= 64K::            434   request happens with a size >= 64K::
481                                                   435 
482           # echo 'stacktrace:5 if bytes_req >=    436           # echo 'stacktrace:5 if bytes_req >= 65536' > \
483                 /sys/kernel/tracing/events/kme !! 437                 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
484                                                   438 
485   The format is::                                 439   The format is::
486                                                   440 
487       stacktrace[:count]                          441       stacktrace[:count]
488                                                   442 
489   To remove the above commands::                  443   To remove the above commands::
490                                                   444 
491           # echo '!stacktrace' > \                445           # echo '!stacktrace' > \
492                 /sys/kernel/tracing/events/kme !! 446                 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
493                                                   447 
494           # echo '!stacktrace:5 if bytes_req >    448           # echo '!stacktrace:5 if bytes_req >= 65536' > \
495                 /sys/kernel/tracing/events/kme !! 449                 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
496                                                   450 
497   The latter can also be removed more simply b    451   The latter can also be removed more simply by the following (without
498   the filter)::                                   452   the filter)::
499                                                   453 
500           # echo '!stacktrace:5' > \              454           # echo '!stacktrace:5' > \
501                 /sys/kernel/tracing/events/kme !! 455                 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
502                                                   456 
503   Note that there can be only one stacktrace t    457   Note that there can be only one stacktrace trigger per triggering
504   event.                                          458   event.
505                                                   459 
506 - snapshot                                        460 - snapshot
507                                                   461 
508   This command causes a snapshot to be trigger    462   This command causes a snapshot to be triggered whenever the
509   triggering event occurs.                        463   triggering event occurs.
510                                                   464 
511   The following command creates a snapshot eve    465   The following command creates a snapshot every time a block request
512   queue is unplugged with a depth > 1.  If you    466   queue is unplugged with a depth > 1.  If you were tracing a set of
513   events or functions at the time, the snapsho    467   events or functions at the time, the snapshot trace buffer would
514   capture those events when the trigger event     468   capture those events when the trigger event occurred::
515                                                   469 
516           # echo 'snapshot if nr_rq > 1' > \      470           # echo 'snapshot if nr_rq > 1' > \
517                 /sys/kernel/tracing/events/blo !! 471                 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
518                                                   472 
519   To only snapshot once::                         473   To only snapshot once::
520                                                   474 
521           # echo 'snapshot:1 if nr_rq > 1' > \    475           # echo 'snapshot:1 if nr_rq > 1' > \
522                 /sys/kernel/tracing/events/blo !! 476                 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
523                                                   477 
524   To remove the above commands::                  478   To remove the above commands::
525                                                   479 
526           # echo '!snapshot if nr_rq > 1' > \     480           # echo '!snapshot if nr_rq > 1' > \
527                 /sys/kernel/tracing/events/blo !! 481                 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
528                                                   482 
529           # echo '!snapshot:1 if nr_rq > 1' >     483           # echo '!snapshot:1 if nr_rq > 1' > \
530                 /sys/kernel/tracing/events/blo !! 484                 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
531                                                   485 
532   Note that there can be only one snapshot tri    486   Note that there can be only one snapshot trigger per triggering
533   event.                                          487   event.
534                                                   488 
535 - traceon/traceoff                                489 - traceon/traceoff
536                                                   490 
537   These commands turn tracing on and off when     491   These commands turn tracing on and off when the specified events are
538   hit. The parameter determines how many times    492   hit. The parameter determines how many times the tracing system is
539   turned on and off. If unspecified, there is     493   turned on and off. If unspecified, there is no limit.
540                                                   494 
541   The following command turns tracing off the     495   The following command turns tracing off the first time a block
542   request queue is unplugged with a depth > 1.    496   request queue is unplugged with a depth > 1.  If you were tracing a
543   set of events or functions at the time, you     497   set of events or functions at the time, you could then examine the
544   trace buffer to see the sequence of events t    498   trace buffer to see the sequence of events that led up to the
545   trigger event::                                 499   trigger event::
546                                                   500 
547           # echo 'traceoff:1 if nr_rq > 1' > \    501           # echo 'traceoff:1 if nr_rq > 1' > \
548                 /sys/kernel/tracing/events/blo !! 502                 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
549                                                   503 
550   To always disable tracing when nr_rq > 1::      504   To always disable tracing when nr_rq > 1::
551                                                   505 
552           # echo 'traceoff if nr_rq > 1' > \      506           # echo 'traceoff if nr_rq > 1' > \
553                 /sys/kernel/tracing/events/blo !! 507                 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
554                                                   508 
555   To remove the above commands::                  509   To remove the above commands::
556                                                   510 
557           # echo '!traceoff:1 if nr_rq > 1' >     511           # echo '!traceoff:1 if nr_rq > 1' > \
558                 /sys/kernel/tracing/events/blo !! 512                 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
559                                                   513 
560           # echo '!traceoff if nr_rq > 1' > \     514           # echo '!traceoff if nr_rq > 1' > \
561                 /sys/kernel/tracing/events/blo !! 515                 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
562                                                   516 
563   Note that there can be only one traceon or t    517   Note that there can be only one traceon or traceoff trigger per
564   triggering event.                               518   triggering event.
565                                                   519 
566 - hist                                            520 - hist
567                                                   521 
568   This command aggregates event hits into a ha    522   This command aggregates event hits into a hash table keyed on one or
569   more trace event format fields (or stacktrac    523   more trace event format fields (or stacktrace) and a set of running
570   totals derived from one or more trace event     524   totals derived from one or more trace event format fields and/or
571   event counts (hitcount).                        525   event counts (hitcount).
572                                                   526 
573   See Documentation/trace/histogram.rst for de    527   See Documentation/trace/histogram.rst for details and examples.
574                                                << 
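The enable_event/disable_event pair from the list above can be armed and removed together. This is a rough sketch only: TRACEFS, write_trigger, arm_kmalloc_in_read and disarm_kmalloc_in_read are hypothetical names chosen for illustration, and the trigger strings are the ones already shown in the examples.

```shell
#!/bin/sh
# Hypothetical helpers; TRACEFS defaults to the usual tracefs mount point.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# Write one trigger expression ($2) to the 'trigger' file of <system>/<event> ($1).
write_trigger() {
    printf '%s\n' "$2" > "$TRACEFS/events/$1/trigger"
}

# Start tracing kmalloc on the first read() entry (count of 1) and
# stop tracing it on every read() exit.
arm_kmalloc_in_read() {
    write_trigger syscalls/sys_enter_read 'enable_event:kmem:kmalloc:1'
    write_trigger syscalls/sys_exit_read 'disable_event:kmem:kmalloc'
}

# Remove both triggers by writing the same commands prefixed with '!'.
disarm_kmalloc_in_read() {
    write_trigger syscalls/sys_enter_read '!enable_event:kmem:kmalloc:1'
    write_trigger syscalls/sys_exit_read '!disable_event:kmem:kmalloc'
}
```

Keeping the add and remove strings side by side makes it harder to forget a trigger, since each one has to be removed individually with '!'.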
575 7. In-kernel trace event API                   << 
576 ============================                   << 
577                                                << 
578 In most cases, the command-line interface to t << 
579 sufficient.  Sometimes, however, applications  << 
580 more complex relationships than can be express << 
581 series of linked command-line expressions, or  << 
582 commands may be simply too cumbersome.  An exa << 
583 application that needs to 'listen' to the trac << 
584 maintain an in-kernel state machine detecting, << 
585 illegal kernel state occurs in the scheduler.  << 
586                                                << 
587 The trace event subsystem provides an in-kerne << 
588 or other kernel code to generate user-defined  << 
589 will, which can be used to either augment the  << 
590 and/or signal that a particular important stat << 
591                                                << 
592 A similar in-kernel API is also available for  << 
593 kretprobe events.                              << 
594                                                << 
595 Both the synthetic event and k/ret/probe event << 
596 of a lower-level "dynevent_cmd" event command  << 
597 available for more specialized applications, o << 
598 higher-level trace event APIs.                 << 
599                                                << 
600 The API provided for these purposes is describ << 
601 following:                                     << 
602                                                << 
603   - dynamically creating synthetic event defin << 
604   - dynamically creating kprobe and kretprobe  << 
605   - tracing synthetic events from in-kernel co << 
606   - the low-level "dynevent_cmd" API           << 
607                                                << 
608 7.1 Dynamically creating synthetic event defini << 
609 ---------------------------------------------- << 
610                                                << 
611 There are a couple ways to create a new synthe << 
612 module or other kernel code.                   << 
613                                                << 
614 The first creates the event in one step, using << 
615 In this method, the name of the event to creat << 
616 the fields is supplied to synth_event_create() << 
617 synthetic event with that name and fields will << 
618 call.  For example, to create a new "schedtest << 
619                                                << 
620   ret = synth_event_create("schedtest", sched_ << 
621                            ARRAY_SIZE(sched_fi << 
622                                                << 
623 The sched_fields param in this example points  << 
624 synth_field_desc, each of which describes an e << 
625 name::                                         << 
626                                                << 
627   static struct synth_field_desc sched_fields[ << 
628         { .type = "pid_t",              .name  << 
629         { .type = "char[16]",           .name  << 
630         { .type = "u64",                .name  << 
631         { .type = "u64",                .name  << 
632         { .type = "unsigned int",       .name  << 
633         { .type = "char[64]",           .name  << 
634         { .type = "int",                .name  << 
635   };                                           << 
636                                                << 
637 See synth_field_size() for available types.    << 
638                                                << 
639 If field_name contains [n], the field is consi << 
640                                                << 
641 If field_name contains [] (no subscript), the  << 
642 be a dynamic array, which will only take as mu << 
643 is required to hold the array.                 << 
644                                                << 
645 Because space for an event is reserved before  << 
646 to the event, using dynamic arrays implies tha << 
647 in-kernel API described below can't be used wi << 
648 other non-piecewise in-kernel APIs can, howeve << 
649 arrays.                                        << 
650                                                << 
651 If the event is created from within a module,  << 
652 must be passed to synth_event_create().  This  << 
653 trace buffer won't contain unreadable events w << 
654 removed.                                       << 
655                                                << 
656 At this point, the event object is ready to be << 
657 events.                                        << 
658                                                << 
659 In the second method, the event is created in  << 
660 allows events to be created dynamically and wi << 
661 and populate an array of fields beforehand.    << 
662                                                << 
663 To use this method, an empty or partially empt << 
664 first be created using synth_event_gen_cmd_sta << 
665 synth_event_gen_cmd_array_start().  For synth_ << 
666 the name of the event along with one or more p << 
667 representing a 'type field_name;' field specif << 
668 supplied.  For synth_event_gen_cmd_array_start << 
669 event along with an array of struct synth_fiel << 
670 supplied. Before calling synth_event_gen_cmd_s << 
671 synth_event_gen_cmd_array_start(), the user sh << 
672 initialize a dynevent_cmd object using synth_e << 
673                                                << 
674 For example, to create a new "schedtest" synth << 
675 fields::                                       << 
676                                                << 
677   struct dynevent_cmd cmd;                     << 
678   char *buf;                                   << 
679                                                << 
680   /* Create a buffer to hold the generated com << 
681   buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERN << 
682                                                << 
683   /* Before generating the command, initialize << 
684   synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT << 
685                                                << 
686   ret = synth_event_gen_cmd_start(&cmd, "sched << 
687                                   "pid_t", "ne << 
688                                   "u64", "ts_n << 
689                                                << 
690 Alternatively, using an array of struct synth_ << 
691 containing the same information::              << 
692                                                << 
693   ret = synth_event_gen_cmd_array_start(&cmd,  << 
694                                         fields << 
695                                                << 
696 Once the synthetic event object has been creat << 
697 populated with more fields.  Fields are added  << 
698 synth_event_add_field(), supplying the dyneven << 
699 type, and a field name.  For example, to add a << 
700 "intfield", the following call should be made: << 
701                                                << 
702   ret = synth_event_add_field(&cmd, "int", "in << 
703                                                << 
704 See synth_field_size() for available types. If << 
705 the field is considered to be an array.        << 
706                                                << 
707 A group of fields can also be added all at onc << 
708 synth_field_desc with synth_event_add_fields().  For << 
709 just the first four sched_fields::             << 
710                                                << 
711   ret = synth_event_add_fields(&cmd, sched_fie << 
712                                                << 
713 If you already have a string of the form 'type << 
714 synth_event_add_field_str() can be used to add << 
715 also automatically append a ';' to the string. << 
716                                                << 
717 Once all the fields have been added, the event << 
718 registered by calling the synth_event_gen_cmd_ << 
719                                                << 
720   ret = synth_event_gen_cmd_end(&cmd);         << 
721                                                << 
722 At this point, the event object is ready to be << 
723 events.                                        << 
724                                                << 
725 7.2 Tracing synthetic events from in-kernel co << 
726 ---------------------------------------------- << 
727                                                << 
728 To trace a synthetic event, there are several  << 
729 option is to trace the event in one call, usin << 
730 with a variable number of values, or synth_eve << 
731 array of values to be set.  A second option ca << 
732 need for a pre-formed array of values or list  << 
733 synth_event_trace_start() and synth_event_trac << 
734 synth_event_add_next_val() or synth_event_add_ << 
735 piecewise.                                     << 
736                                                << 
737 7.2.1 Tracing a synthetic event all at once    << 
738 -------------------------------------------    << 
739                                                << 
740 To trace a synthetic event all at once, the sy << 
741 synth_event_trace_array() functions can be use << 
742                                                << 
743 The synth_event_trace() function is passed the << 
744 representing the synthetic event (which can be << 
745 trace_get_event_file() using the synthetic eve << 
746 the system name, and the trace instance name ( << 
747 trace array)), along with a variable number o  << 
748 synthetic event field, and the number of value << 
749                                                << 
750 So, to trace an event corresponding to the syn << 
751 above, code like the following could be used:: << 
752                                                << 
753   ret = synth_event_trace(create_synth_test, 7 << 
754                           444,             /*  << 
755                           (u64)"clackers", /*  << 
756                           1000000,         /*  << 
757                           1000,            /*  << 
758                           smp_processor_id(),/ << 
759                           (u64)"Thneed",   /*  << 
760                           999);            /*  << 
761                                                << 
762 All vals should be cast to u64, and string val << 
763 strings, cast to u64.  Strings will be copied  << 
764 the event for the string, using these pointers << 
765                                                << 
766 Alternatively, the synth_event_trace_array() f << 
767 accomplish the same thing.  It is passed the t << 
768 representing the synthetic event (which can be << 
769 trace_get_event_file() using the synthetic eve << 
770 the system name, and the trace instance name ( << 
771 trace array)), along with an array of u64, one << 
772 event field.                                   << 
773                                                << 
774 To trace an event corresponding to the synthet << 
775 above, code like the following could be used:: << 
776                                                << 
777   u64 vals[7];                                 << 
778                                                << 
779   vals[0] = 777;                  /* next_pid_ << 
780   vals[1] = (u64)"tiddlywinks";   /* next_comm << 
781   vals[2] = 1000000;              /* ts_ns */  << 
782   vals[3] = 1000;                 /* ts_ms */  << 
783   vals[4] = smp_processor_id();   /* cpu */    << 
784   vals[5] = (u64)"thneed";        /* my_string << 
785   vals[6] = 398;                  /* my_int_fi << 
786                                                << 
787 The 'vals' array is just an array of u64, the  << 
788 match the number of fields in the synthetic ev << 
789 the same order as the synthetic event fields.  << 
790                                                << 
791 All vals should be cast to u64, and string val << 
792 strings, cast to u64.  Strings will be copied  << 
793 the event for the string, using these pointers << 
794                                                << 
795 In order to trace a synthetic event, a pointer << 
796 is needed.  The trace_get_event_file() functio << 
797 it - it will find the file in the given trace  << 
798 NULL since the top trace array is being used)  << 
799 preventing the instance containing it from goi << 
800                                                << 
801        schedtest_event_file = trace_get_event_ << 
802                                                << 
803                                                << 
804 Before tracing the event, it should be enabled << 
805 the synthetic event won't actually show up in  << 
806                                                << 
807 To enable a synthetic event from the kernel, t << 
808 can be used (which is not specific to syntheti << 
809 the "synthetic" system name to be specified ex << 
810                                                << 
811 To enable the event, pass 'true' to it::       << 
812                                                << 
813        trace_array_set_clr_event(schedtest_eve << 
814                                  "synthetic",  << 
815                                                << 
816 To disable it pass false::                     << 
817                                                << 
818        trace_array_set_clr_event(schedtest_eve << 
819                                  "synthetic",  << 
820                                                << 
821 Finally, synth_event_trace_array() can be used << 
822 event, which should be visible in the trace bu << 
823                                                << 
824        ret = synth_event_trace_array(schedtest << 
825                                      ARRAY_SIZ << 
826                                                << 
827 To remove the synthetic event, the event shoul << 
828 trace instance should be 'put' back using trac << 
829                                                << 
830        trace_array_set_clr_event(schedtest_eve << 
831                                  "synthetic",  << 
832        trace_put_event_file(schedtest_event_fi << 
833                                                << 
834 If those have been successful, synth_event_del << 
835 remove the event::                             << 
836                                                << 
837        ret = synth_event_delete("schedtest");  << 
838                                                << 
839 7.2.2 Tracing a synthetic event piecewise      << 
840 -----------------------------------------      << 
841                                                << 
842 To trace a synthetic event using the piecewise << 
843 synth_event_trace_start() function is used to  << 
844 event trace::                                  << 
845                                                << 
846        struct synth_event_trace_state trace_st << 
847                                                << 
848        ret = synth_event_trace_start(schedtest << 
849                                                << 
850 It's passed the trace_event_file representing  << 
851 using the same methods as described above, alo << 
852 struct synth_event_trace_state object, which w << 
853 used to maintain state between this and follow << 
854                                                << 
855 Once the event has been opened, which means sp << 
856 reserved in the trace buffer, the individual f << 
857 are two ways to do that, either one after anot << 
858 the event, which requires no lookups, or by na << 
859 tradeoff is flexibility in doing the assignmen << 
860 lookup per field.                              << 
861                                                << 
862 To assign the values one after the other witho << 
863 synth_event_add_next_val() should be used.  Ea << 
864 same synth_event_trace_state object used in th << 
865 along with the value to set the next field in  << 
866 field is set, the 'cursor' points to the next  << 
867 by the subsequent call, continuing until all t << 
868 in order.  The same sequence of calls as in th << 
869 this method would be (without error-handling c << 
870                                                << 
871        /* next_pid_field */                    << 
872        ret = synth_event_add_next_val(777, &tr << 
873                                                << 
874        /* next_comm_field */                   << 
875        ret = synth_event_add_next_val((u64)"sl << 
876                                                << 
877        /* ts_ns */                             << 
878        ret = synth_event_add_next_val(1000000, << 
879                                                << 
880        /* ts_ms */                             << 
881        ret = synth_event_add_next_val(1000, &t << 
882                                                << 
883        /* cpu */                               << 
884        ret = synth_event_add_next_val(smp_proc << 
885                                                << 
886        /* my_string_field */                   << 
887        ret = synth_event_add_next_val((u64)"th << 
888                                                << 
889        /* my_int_field */                      << 
890        ret = synth_event_add_next_val(395, &tr << 
891                                                << 
892 To assign the values in any order, synth_event << 
893 used.  Each call is passed the same synth_even << 
894 the synth_event_trace_start(), along with the  << 
895 to set and the value to set it to.  The same s << 
896 the above examples using this method would be  << 
897 code)::                                        << 
898                                                << 
899        ret = synth_event_add_val("next_pid_fie << 
900        ret = synth_event_add_val("next_comm_fi << 
901                                  &trace_state) << 
902        ret = synth_event_add_val("ts_ns", 1000 << 
903        ret = synth_event_add_val("ts_ms", 1000 << 
904        ret = synth_event_add_val("cpu", smp_pr << 
905        ret = synth_event_add_val("my_string_fi << 
906                                  &trace_state) << 
907        ret = synth_event_add_val("my_int_field << 
908                                                << 
909 Note that synth_event_add_next_val() and synth << 
910 incompatible if used within the same trace of  << 
911 can be used but not both at the same time.     << 
912                                                << 
913 Finally, the event won't be actually traced un << 
914 which is done using synth_event_trace_end(), w << 
915 struct synth_event_trace_state object used in  << 
916                                                << 
917        ret = synth_event_trace_end(&trace_stat << 
918                                                << 
919 Note that synth_event_trace_end() must be call << 
920 of whether any of the add calls failed (say du << 
921 being passed in).                              << 
922                                                << 
923 7.3 Dynamically creating kprobe and kretprobe  << 
924 ---------------------------------------------- << 
925                                                << 
926 To create a kprobe or kretprobe trace event fr << 
927 kprobe_event_gen_cmd_start() or kretprobe_even << 
928 functions can be used.                         << 
929                                                << 
930 To create a kprobe event, an empty or partiall << 
931 should first be created using kprobe_event_gen << 
932 of the event and the probe location should be  << 
933 or args each representing a probe field should << 
934 function.  Before calling kprobe_event_gen_cmd << 
935 should create and initialize a dynevent_cmd ob << 
936 kprobe_event_cmd_init().                       << 
937                                                << 
938 For example, to create a new "schedtest" kprob << 
939                                                << 
940   struct dynevent_cmd cmd;                     << 
941   char *buf;                                   << 
942                                                << 
943   /* Create a buffer to hold the generated com << 
944   buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERN << 
945                                                << 
946   /* Before generating the command, initialize << 
947   kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVEN << 
948                                                << 
949   /*                                           << 
950    * Define the gen_kprobe_test event with the << 
951    * fields.                                   << 
952    */                                          << 
953   ret = kprobe_event_gen_cmd_start(&cmd, "gen_ << 
954                                    "dfd=%ax",  << 
955                                                << 
956 Once the kprobe event object has been created, << 
957 populated with more fields.  Fields can be add << 
958 kprobe_event_add_fields(), supplying the dynev << 
959 with a variable arg list of probe fields.  For << 
960 couple additional fields, the following call c << 
961                                                << 
962   ret = kprobe_event_add_fields(&cmd, "flags=% << 
963                                                << 
964 Once all the fields have been added, the event << 
965 registered by calling the kprobe_event_gen_cmd << 
966 kretprobe_event_gen_cmd_end() functions, depen << 
967 or kretprobe command was started::             << 
968                                                << 
969   ret = kprobe_event_gen_cmd_end(&cmd);        << 
970                                                << 
971 or::                                           << 
972                                                << 
973   ret = kretprobe_event_gen_cmd_end(&cmd);     << 
974                                                << 
975 At this point, the event object is ready to be << 
976 events.                                        << 
977                                                << 
978 Similarly, a kretprobe event can be created us << 
979 kretprobe_event_gen_cmd_start() with a probe n << 
980 additional params such as $retval::            << 
981                                                << 
982   ret = kretprobe_event_gen_cmd_start(&cmd, "g << 
983                                       "do_sys_ << 
984                                                << 
985 Similar to the synthetic event case, code like << 
986 used to enable the newly created kprobe event: << 
987                                                << 
988   gen_kprobe_test = trace_get_event_file(NULL, << 
989                                                << 
990   ret = trace_array_set_clr_event(gen_kprobe_t << 
991                                   "kprobes", " << 
992                                                << 
993 Finally, also similar to synthetic events, the << 
994 used to give the kprobe event file back and de << 
995                                                << 
996   trace_put_event_file(gen_kprobe_test);       << 
997                                                << 
998   ret = kprobe_event_delete("gen_kprobe_test") << 
999                                                << 
1000 7.4 The "dynevent_cmd" low-level API          << 
1001 ------------------------------------          << 
1002                                               << 
1003 Both the in-kernel synthetic event and kprobe << 
1004 top of a lower-level "dynevent_cmd" interface << 
1005 meant to provide the basis for higher-level i << 
1006 synthetic and kprobe interfaces, which can be << 
1007                                               << 
1008 The basic idea is simple and amounts to provi << 
1009 layer that can be used to generate trace even << 
1010 generated command strings can then be passed  << 
1011 and event creation code that already exists i << 
1012 subsystem for creating the corresponding trac << 
1013                                               << 
1014 In a nutshell, the way it works is that the h << 
1015 code creates a struct dynevent_cmd object, th << 
1016 functions, dynevent_arg_add() and dynevent_ar << 
1017 a command string, which finally causes the co << 
1018 using the dynevent_create() function.  The de << 
1019 are described below.                          << 
1020                                               << 
1021 The first step in building a new command stri << 
1022 initialize an instance of a dynevent_cmd.  He << 
1023 create a dynevent_cmd on the stack and initia << 
1024                                               << 
1025   struct dynevent_cmd cmd;                    << 
1026   char *buf;                                  << 
1027   int ret;                                    << 
1028                                               << 
1029   buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KER << 
1030                                               << 
1031   dynevent_cmd_init(&cmd, buf, maxlen, DYNEVE << 
1032                     foo_event_run_command);   << 
1033                                               << 
1034 The dynevent_cmd initialization needs to be g << 
1035 buffer and the length of the buffer (MAX_DYNE << 
1036 for this purpose - at 2k it's generally too b << 
1037 on the stack, so is dynamically allocated), a << 
1038 is meant to be used to check that further API << 
1039 correct command type, and a pointer to an eve << 
1040 callback that will be called to actually exec << 
1041 command function.                             << 
1042                                               << 
1043 Once that's done, the command string can be b << 
1044 calls to argument-adding functions.           << 
1045                                               << 
1046 To add a single argument, define and initiali << 
1047 or struct dynevent_arg_pair object.  Here's a << 
1048 possible arg addition, which is simply to app << 
1049 a whitespace-separated argument to the comman << 
1050                                               << 
1051   struct dynevent_arg arg;                    << 
1052                                               << 
1053   dynevent_arg_init(&arg, NULL, 0);           << 
1054                                               << 
1055   arg.str = name;                             << 
1056                                               << 
1057   ret = dynevent_arg_add(&cmd, &arg);         << 
1058                                               << 
1059 The arg object is first initialized using dyn << 
1060 this case the parameters are NULL or 0, which << 
1061 optional sanity-checking function or separato << 
1062 the arg.                                      << 
1063                                               << 
1064 Here's another more complicated example using << 
1065 used to create an argument that consists of a << 
1066 together as a unit, for example, a 'type fiel << 
1067 expression arg e.g. 'flags=%cx'::             << 
1068                                               << 
1069   struct dynevent_arg_pair arg_pair;          << 
1070                                               << 
1071   dynevent_arg_pair_init(&arg_pair, dynevent_ << 
1072                                               << 
1073   arg_pair.lhs = type;                        << 
1074   arg_pair.rhs = name;                        << 
1075                                               << 
1076   ret = dynevent_arg_pair_add(&cmd, &arg_pair << 
1077                                               << 
1078 Again, the arg_pair is first initialized, in  << 
1079 function used to check the sanity of the args << 
1080 neither part of the pair is NULL), along with << 
1081 to add an operator between the pair (here non << 
1082 appended onto the end of the arg pair (here ' << 
1083                                               << 
1084 There's also a dynevent_str_add() function th << 
1085 add a string as-is, with no spaces, delimiter << 
1086                                               << 
1087 Any number of dynevent_*_add() calls can be m << 
1088 (until its length surpasses cmd->maxlen).  Wh << 
1089 been added and the command string is complete << 
1090 do is run the command, which happens by simpl << 
1091 dynevent_create()::                           << 
1092                                               << 
1093   ret = dynevent_create(&cmd);                << 
1094                                               << 
1095 At that point, if the return value is 0, the  << 
1096 created and is ready to use.                  << 
1097                                               << 
1098 See the dynevent_cmd function definitions the << 
1099 of the API.                                   << 
                                                      
