
TOMOYO Linux Cross Reference
Linux/Documentation/trace/user_events.rst


Diff markup

Differences between /Documentation/trace/user_events.rst (Version linux-6.11.5) and /Documentation/trace/user_events.rst (Version linux-5.19.17)


  1 =========================================           1 =========================================
  2 user_events: User-based Event Tracing               2 user_events: User-based Event Tracing
  3 =========================================           3 =========================================
  4                                                     4 
  5 :Author: Beau Belgrave                              5 :Author: Beau Belgrave
  6                                                     6 
  7 Overview                                            7 Overview
  8 --------                                            8 --------
  9 User based trace events allow user processes t      9 User based trace events allow user processes to create events and trace data
 10 that can be viewed via existing tools, such as     10 that can be viewed via existing tools, such as ftrace and perf.
 11 To enable this feature, build your kernel with     11 To enable this feature, build your kernel with CONFIG_USER_EVENTS=y.
 12                                                    12 
 13 Programs can view status of the events via         13 Programs can view status of the events via
 14 /sys/kernel/tracing/user_events_status and can !!  14 /sys/kernel/debug/tracing/user_events_status and can both register and write
 15 data out via /sys/kernel/tracing/user_events_d !!  15 data out via /sys/kernel/debug/tracing/user_events_data.
 16                                                    16 
 17 Programs can also use /sys/kernel/tracing/dyna !!  17 Programs can also use /sys/kernel/debug/tracing/dynamic_events to register and
 18 delete user based events via the u: prefix. Th     18 delete user based events via the u: prefix. The format of the command to
 19 dynamic_events is the same as the ioctl with t !!  19 dynamic_events is the same as the ioctl with the u: prefix applied.
 20 requires CAP_PERFMON due to the event persisti << 
 21                                                    20 
 22 Typically programs will register a set of even     21 Typically programs will register a set of events that they wish to expose to
 23 tools that can read trace_events (such as ftra     22 tools that can read trace_events (such as ftrace and perf). The registration
 24 process tells the kernel which address and bit !!  23 process gives back two ints to the program for each event. The first int is the
 25 enabled the event and data should be written.  !!  24 status index. This index describes which byte in the
 26 a write index which describes the data when a  !!  25 /sys/kernel/debug/tracing/user_events_status file represents this event. The
 27 on the /sys/kernel/tracing/user_events_data fi !!  26 second int is the write index. This index describes the data when a write() or
                                                   >>  27 writev() is called on the /sys/kernel/debug/tracing/user_events_data file.
 28                                                    28 
 29 The structures referenced in this document are !!  29 The structures referenced in this document are contained with the
 30 /include/uapi/linux/user_events.h file in the  !!  30 /include/uap/linux/user_events.h file in the source tree.
 31                                                    31 
 32 **NOTE:** *Both user_events_status and user_ev     32 **NOTE:** *Both user_events_status and user_events_data are under the tracefs
 33 filesystem and may be mounted at different pat     33 filesystem and may be mounted at different paths than above.*
 34                                                    34 
 35 Registering                                        35 Registering
 36 -----------                                        36 -----------
 37 Registering within a user process is done via      37 Registering within a user process is done via ioctl() out to the
 38 /sys/kernel/tracing/user_events_data file. The !!  38 /sys/kernel/debug/tracing/user_events_data file. The command to issue is
 39 DIAG_IOCSREG.                                      39 DIAG_IOCSREG.
 40                                                    40 
 41 This command takes a packed struct user_reg as !!  41 This command takes a struct user_reg as an argument::
 42                                                    42 
 43   struct user_reg {                                43   struct user_reg {
 44         /* Input: Size of the user_reg structu !!  44         u32 size;
 45         __u32 size;                            !!  45         u64 name_args;
 46                                                !!  46         u32 status_index;
 47         /* Input: Bit in enable address to use !!  47         u32 write_index;
 48         __u8 enable_bit;                       !!  48   };
 49                                                << 
 50         /* Input: Enable size in bytes at addr << 
 51         __u8 enable_size;                      << 
 52                                                    49 
 53         /* Input: Flags to use, if any */      !!  50 The struct user_reg requires two inputs, the first is the size of the structure
 54         __u16 flags;                           !!  51 to ensure forward and backward compatibility. The second is the command string
 55                                                !!  52 to issue for registering. Upon success two outputs are set, the status index
 56         /* Input: Address to update when enabl !!  53 and the write index.
 57         __u64 enable_addr;                     << 
 58                                                << 
 59         /* Input: Pointer to string with event << 
 60         __u64 name_args;                       << 
 61                                                << 
 62         /* Output: Index of the event to use w << 
 63         __u32 write_index;                     << 
 64   } __attribute__((__packed__));               << 
 65                                                << 
 66 The struct user_reg requires all the above inp << 
 67                                                << 
 68 + size: This must be set to sizeof(struct user << 
 69                                                << 
 70 + enable_bit: The bit to reflect the event sta << 
 71   enable_addr.                                 << 
 72                                                << 
 73 + enable_size: The size of the value specified << 
 74   This must be 4 (32-bit) or 8 (64-bit). 64-bi << 
 75   used on 64-bit kernels, however, 32-bit can  << 
 76                                                << 
 77 + flags: The flags to use, if any.             << 
 78   Callers should first attempt to use flags an << 
 79   support for lower versions of the kernel. If << 
 80   is returned.                                 << 
 81                                                << 
 82 + enable_addr: The address of the value to use << 
 83   must be naturally aligned and write accessib << 
 84                                                << 
 85 + name_args: The name and arguments to describ << 
 86   for details.                                 << 
 87                                                << 
 88 The following flags are currently supported.   << 
 89                                                << 
 90 + USER_EVENT_REG_PERSIST: The event will not d << 
 91   closing. Callers may use this if an event sh << 
 92   process closes or unregisters the event. Req << 
 93   -EPERM is returned.                          << 
 94                                                << 
 95 + USER_EVENT_REG_MULTI_FORMAT: The event can c << 
 96   allows programs to prevent themselves from b << 
 97   format changes and they wish to use the same << 
 98   tracepoint name will be in the new format of << 
 99   format of "name". A tracepoint will be creat << 
100   and format. This means if several processes  << 
101   they will use the same tracepoint. If yet an << 
102   but a different format than the other proces << 
103   tracepoint with a new unique id. Recording p << 
104   the various different formats of the event n << 
105   recording. The system name of the tracepoint << 
106   instead of "user_events". This prevents sing << 
107   with any multi-format event names within tra << 
108   a hex string. Recording programs should ensu << 
109   the event name they registered and has a suf << 
110   has hex characters. For example to find all  << 
111   can use the regex "^test\.[0-9a-fA-F]+$".    << 
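The name check described for USER_EVENT_REG_MULTI_FORMAT (left column) can also be done without a regex engine. The sketch below is a plain-C equivalent of "^test\.[0-9a-fA-F]+$", generalized to any event name. The function name is_multi_format_of is hypothetical. Comparing by hand means event names that contain regex metacharacters never need escaping.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if tracepoint is "<event>.<hex digits>", the
 * same check as the regex "^<event>\.[0-9a-fA-F]+$" but without regcomp(),
 * so event names containing regex metacharacters need no escaping.
 */
bool is_multi_format_of(const char *tracepoint, const char *event)
{
	size_t len;
	const char *suffix;

	assert(tracepoint != NULL && event != NULL);
	len = strlen(event);

	/* Must start with the exact registered event name ... */
	if (strncmp(tracepoint, event, len) != 0)
		return false;

	/* ... followed by the '.' separator ... */
	suffix = tracepoint + len;
	if (*suffix != '.')
		return false;
	suffix++;

	/* ... and one or more hex digits with nothing after them. */
	if (*suffix == '\0')
		return false;
	for (; *suffix != '\0'; suffix++) {
		if (!isxdigit((unsigned char)*suffix))
			return false;
	}
	return true;
}
```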
112                                                << 
113 Upon successful registration the following is  << 
114                                                << 
115 + write_index: The index to use for this file  << 
116   event when writing out data. The index is un << 
117   descriptor that was used for the registratio << 
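Below is a minimal registration sketch for the linux-6.11.5 ABI shown in the left column. It mirrors the packed struct user_reg field layout listed above. It also defines DIAG_IOCSREG locally as _IOWR('*', 0, struct user_reg *). Both definitions are assumptions intended to match include/uapi/linux/user_events.h, so prefer the real header when your kernel headers ship it. The helper name register_user_event is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Assumed mirror of struct user_reg from include/uapi/linux/user_events.h
 * (linux-6.11.5 layout above); prefer the real header if available. */
struct user_reg {
	uint32_t size;        /* Input: sizeof(struct user_reg) */
	uint8_t enable_bit;   /* Input: bit the kernel sets/clears */
	uint8_t enable_size;  /* Input: 4 or 8 bytes at enable_addr */
	uint16_t flags;       /* Input: USER_EVENT_REG_* flags, if any */
	uint64_t enable_addr; /* Input: address of the enable value */
	uint64_t name_args;   /* Input: pointer to the command string */
	uint32_t write_index; /* Output: index to prefix writes with */
} __attribute__((__packed__));

/* Assumed to match the uapi header's definitions. */
#define DIAG_IOC_MAGIC '*'
#define DIAG_IOCSREG _IOWR(DIAG_IOC_MAGIC, 0, struct user_reg *)

/*
 * Hypothetical helper: registers name_args (see Command Format) on an open
 * user_events_data fd. The kernel then sets or clears enable_bit in
 * *enabled as tools attach and detach. Returns 0 and stores the write
 * index on success, or -1 with errno set by ioctl().
 */
int register_user_event(int data_fd, const char *name_args,
			uint32_t *enabled, uint8_t enable_bit,
			uint32_t *write_index)
{
	struct user_reg reg = {0};

	assert(enable_bit < 32); /* must fit in the 32-bit enable value */

	reg.size = sizeof(reg);
	reg.enable_bit = enable_bit;
	reg.enable_size = sizeof(*enabled); /* 4: usable on all kernels */
	reg.flags = 0;
	reg.enable_addr = (uint64_t)(uintptr_t)enabled;
	reg.name_args = (uint64_t)(uintptr_t)name_args;

	if (ioctl(data_fd, DIAG_IOCSREG, &reg) == -1)
		return -1;

	*write_index = reg.write_index;
	return 0;
}
```

The kernel keeps updating the enabled word after the ioctl returns, so that variable must stay valid for as long as the registration exists. A global or static variable is the simplest choice.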
118                                                    54 
119 User based events show up under tracefs like a     55 User based events show up under tracefs like any other event under the
120 subsystem named "user_events". This means tool     56 subsystem named "user_events". This means tools that wish to attach to the
121 events need to use /sys/kernel/tracing/events/ !!  57 events need to use /sys/kernel/debug/tracing/events/user_events/[name]/enable
122 or perf record -e user_events:[name] when atta     58 or perf record -e user_events:[name] when attaching/recording.
123                                                    59 
124 **NOTE:** The event subsystem name by default  !!  60 **NOTE:** *The write_index returned is only valid for the FD that was used*
125 not assume it will always be "user_events". Op << 
126 future to change the subsystem name per-proces << 
127 In addition if the USER_EVENT_REG_MULTI_FORMAT << 
128 will have a unique id appended to it and the s << 
129 "user_events_multi" as described above.        << 
130                                                    61 
131 Command Format                                     62 Command Format
132 ^^^^^^^^^^^^^^                                     63 ^^^^^^^^^^^^^^
133 The command string format is as follows::          64 The command string format is as follows::
134                                                    65 
135   name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]     66   name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]
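The command string can also be assembled programmatically. This sketch joins a name and a list of field descriptors using the separators from the format above: a space after the name, then ';' between fields. It emits no :FLAG suffix, because no flags are listed. build_user_event_cmd is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "name field1;field2;..." into buf.
 * Returns the string length, or -1 if buf is too small.
 */
int build_user_event_cmd(char *buf, size_t buflen, const char *name,
			 const char *const *fields, size_t nfields)
{
	size_t used;
	int n;

	/* A space ends the name, so the name itself must not contain one. */
	assert(buf != NULL && name != NULL && strchr(name, ' ') == NULL);

	n = snprintf(buf, buflen, "%s", name);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	used = (size_t)n;

	for (size_t i = 0; i < nfields; i++) {
		/* First field follows the name after a space, the rest after ';'. */
		n = snprintf(buf + used, buflen - used, "%c%s",
			     i == 0 ? ' ' : ';', fields[i]);
		if (n < 0 || (size_t)n >= buflen - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```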
136                                                    67 
137 Supported Flags                                    68 Supported Flags
138 ^^^^^^^^^^^^^^^                                    69 ^^^^^^^^^^^^^^^
139 None yet                                           70 None yet
140                                                    71 
141 Field Format                                       72 Field Format
142 ^^^^^^^^^^^^                                       73 ^^^^^^^^^^^^
143 ::                                                 74 ::
144                                                    75 
145   type name [size]                                 76   type name [size]
146                                                    77 
147 Basic types are supported (__data_loc, u32, u6     78 Basic types are supported (__data_loc, u32, u64, int, char, char[20], etc).
148 User programs are encouraged to use clearly si     79 User programs are encouraged to use clearly sized types like u32.
149                                                    80 
150 **NOTE:** *Long is not supported since size ca     81 **NOTE:** *Long is not supported since size can vary between user and kernel.*
151                                                    82 
152 The size is only valid for types that start wi     83 The size is only valid for types that start with a struct prefix.
153 This allows user programs to describe custom s     84 This allows user programs to describe custom structs out to tools, if required.
154                                                    85 
155 For example, a struct in C that looks like thi     86 For example, a struct in C that looks like this::
156                                                    87 
157   struct mytype {                                  88   struct mytype {
158     char data[20];                                 89     char data[20];
159   };                                               90   };
160                                                    91 
161 Would be represented by the following field::      92 Would be represented by the following field::
162                                                    93 
163   struct mytype myname 20                          94   struct mytype myname 20
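The size after a struct field must match the C definition. One option is to format it from sizeof instead of hard-coding 20. The sketch below does this. format_mytype_field is hypothetical, and struct mytype is the example type above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The example struct from above. */
struct mytype {
	char data[20];
};

/*
 * Hypothetical helper: formats "struct mytype <field_name> <size>" with the
 * size taken from sizeof, so it cannot drift from the C definition.
 * Returns the string length, or -1 if buf is too small.
 */
int format_mytype_field(char *buf, size_t buflen, const char *field_name)
{
	int n;

	/* Fields are "type name [size]", so the name must be one token. */
	assert(buf != NULL && field_name != NULL &&
	       strchr(field_name, ' ') == NULL);

	n = snprintf(buf, buflen, "struct mytype %s %zu", field_name,
		     sizeof(struct mytype));
	return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```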
164                                                    95 
165 Deleting                                           96 Deleting
166 --------                                       !!  97 -----------
167 Deleting an event from within a user process i     98 Deleting an event from within a user process is done via ioctl() out to the
168 /sys/kernel/tracing/user_events_data file. The !!  99 /sys/kernel/debug/tracing/user_events_data file. The command to issue is
169 DIAG_IOCSDEL.                                     100 DIAG_IOCSDEL.
170                                                   101 
171 This command only requires a single string spe    102 This command only requires a single string specifying the event to delete by
172 its name. Delete will only succeed if there ar    103 its name. Delete will only succeed if there are no references left to the
173 event (in both user and kernel space). User pr    104 event (in both user and kernel space). User programs should use a separate file
174 to request deletes than the one used for regis    105 to request deletes than the one used for registration due to this.
175                                                   106 
176 **NOTE:** By default events will auto-delete w << 
177 to the event. If programs do not want auto-del << 
178 USER_EVENT_REG_PERSIST flag when registering t << 
179 the event exists until DIAG_IOCSDEL is invoked << 
180 event that persists requires CAP_PERFMON, othe << 
181 there are multiple formats of the same event n << 
182 name will be attempted to be deleted. If only  << 
183 be deleted then the /sys/kernel/tracing/dynami << 
184 that specific format of the event.             << 
185                                                << 
186 Unregistering                                  << 
187 -------------                                  << 
188 If after registering an event it is no longer  << 
189 be disabled via ioctl() out to the /sys/kernel << 
190 The command to issue is DIAG_IOCSUNREG. This i << 
191 deleting actually removes the event from the s << 
192 the kernel your process is no longer intereste << 
193                                                << 
194 This command takes a packed struct user_unreg  << 
195                                                << 
196   struct user_unreg {                          << 
197         /* Input: Size of the user_unreg struc << 
198         __u32 size;                            << 
199                                                << 
200         /* Input: Bit to unregister */         << 
201         __u8 disable_bit;                      << 
202                                                << 
203         /* Input: Reserved, set to 0 */        << 
204         __u8 __reserved;                       << 
205                                                << 
206         /* Input: Reserved, set to 0 */        << 
207         __u16 __reserved2;                     << 
208                                                << 
209         /* Input: Address to unregister */     << 
210         __u64 disable_addr;                    << 
211   } __attribute__((__packed__));               << 
212                                                << 
213 The struct user_unreg requires all the above i << 
214                                                << 
215 + size: This must be set to sizeof(struct user << 
216                                                << 
217 + disable_bit: This must be set to the bit to  << 
218   previously registered via enable_bit).       << 
219                                                << 
220 + disable_addr: This must be set to the addres << 
221   previously registered via enable_addr).      << 
222                                                << 
223 **NOTE:** Events are automatically unregistere << 
224 fork() the registered events will be retained  << 
225 in each process if wanted.                     << 
226                                                << 
227 Status                                            107 Status
228 ------                                            108 ------
229 When tools attach/record user based events the    109 When tools attach/record user based events the status of the event is updated
230 in realtime. This allows user programs to only    110 in realtime. This allows user programs to only incur the cost of the write() or
231 writev() calls when something is actively atta    111 writev() calls when something is actively attached to the event.
232                                                   112 
233 The kernel will update the specified bit that  !! 113 User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
234 tools attach/detach from the event. User progr !! 114 check the status for each event that is registered. The byte to check in the
235 to see if something is attached or not.        !! 115 file is given back after the register ioctl() via user_reg.status_index.
                                                   >> 116 Currently the size of user_events_status is a single page, however, custom
                                                   >> 117 kernel configurations can change this size to allow more user based events. In
                                                   >> 118 all cases the size of the file is a multiple of a page size.
                                                   >> 119 
                                                   >> 120 For example, if the register ioctl() gives back a status_index of 3 you would
                                                   >> 121 check byte 3 of the returned mmap data to see if anything is attached to that
                                                   >> 122 event.
236                                                   123 
237 Administrators can easily check the status of     124 Administrators can easily check the status of all registered events by reading
238 the user_events_status file directly via a ter    125 the user_events_status file directly via a terminal. The output is as follows::
239                                                   126 
240   Name [# Comments]                            !! 127   Byte:Name [# Comments]
241   ...                                             128   ...
242                                                   129 
243   Active: ActiveCount                             130   Active: ActiveCount
244   Busy: BusyCount                                 131   Busy: BusyCount
                                                   >> 132   Max: MaxCount
245                                                   133 
246 For example, on a system that has a single eve    134 For example, on a system that has a single event the output looks like this::
247                                                   135 
248   test                                         !! 136   1:test
249                                                   137 
250   Active: 1                                       138   Active: 1
251   Busy: 0                                         139   Busy: 0
                                                   >> 140   Max: 4096
252                                                   141 
253 If a user enables the user event via ftrace, t    142 If a user enables the user event via ftrace, the output would change to this::
254                                                   143 
255   test # Used by ftrace                        !! 144   1:test # Used by ftrace
256                                                   145 
257   Active: 1                                       146   Active: 1
258   Busy: 1                                         147   Busy: 1
                                                   >> 148   Max: 4096
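Tooling that only needs the summary counts can pick them out of the status text line by line. This sketch relies only on the "Active:" and "Busy:" lines shown above. It therefore also accepts the linux-5.19.17 output with its Byte: prefixes and extra Max: line. parse_status_counts is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scans user_events_status text for the "Active:"
 * and "Busy:" summary lines. Returns 0 if both were found, -1 otherwise.
 */
int parse_status_counts(const char *text, long *active, long *busy)
{
	int found = 0;

	assert(text != NULL && active != NULL && busy != NULL);

	for (const char *line = text; line != NULL && *line != '\0';) {
		/* Event lines and blank lines match neither pattern. */
		if (sscanf(line, "Active: %ld", active) == 1)
			found |= 1;
		else if (sscanf(line, "Busy: %ld", busy) == 1)
			found |= 2;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return found == 3 ? 0 : -1;
}
```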
                                                   >> 149 
                                                   >> 150 **NOTE:** *A status index of 0 will never be returned. This allows user
                                                   >> 151 programs to have an index that can be used on error cases.*
                                                   >> 152 
                                                   >> 153 Status Bits
                                                   >> 154 ^^^^^^^^^^^
                                                   >> 155 The byte being checked will be non-zero if anything is attached. Programs can
                                                   >> 156 check specific bits in the byte to see what mechanism has been attached.
                                                   >> 157 
                                                   >> 158 The following values are defined to aid in checking what has been attached:
                                                   >> 159 
                                                   >> 160 **EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).
                                                   >> 161 
                                                   >> 162 **EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1).
259                                                   163 
260 Writing Data                                      164 Writing Data
261 ------------                                      165 ------------
262 After registering an event the same fd that wa    166 After registering an event the same fd that was used to register can be used
263 to write an entry for that event. The write_in    167 to write an entry for that event. The write_index returned must be at the start
264 of the data, then the remaining data is treate    168 of the data, then the remaining data is treated as the payload of the event.
265                                                   169 
266 For example, if write_index returned was 1 and    170 For example, if write_index returned was 1 and I wanted to write out an int
267 payload of the event. Then the data would have    171 payload of the event. Then the data would have to be 8 bytes (2 ints) in size,
268 with the first 4 bytes being equal to 1 and th    172 with the first 4 bytes being equal to 1 and the last 4 bytes being equal to the
269 value I want as the payload.                      173 value I want as the payload.
270                                                   174 
271 In memory this would look like this::             175 In memory this would look like this::
272                                                   176 
273   int index;                                      177   int index;
274   int payload;                                    178   int payload;
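The single-int case above can be sent with one write(). This sketch copies the index and the payload into an 8-byte buffer with memcpy(). It assumes a 32-bit write_index, as in the linux-6.11.5 struct user_reg. write_int_event is hypothetical. Any fd can be used to try it out, for example one end of a pipe(), which lets you check the byte layout without tracefs.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: emits one event with a single 32-bit int payload as
 * one 8-byte write(): write_index first, then the payload. memcpy() keeps
 * the byte layout explicit without casting buf to another type.
 */
ssize_t write_int_event(int data_fd, uint32_t write_index, int32_t payload)
{
	unsigned char buf[sizeof(write_index) + sizeof(payload)];

	memcpy(buf, &write_index, sizeof(write_index));
	memcpy(buf + sizeof(write_index), &payload, sizeof(payload));

	return write(data_fd, buf, sizeof(buf));
}
```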
275                                                   179 
276 User programs might have well known structs th    180 User programs might have well known structs that they wish to use to emit out
277 as payloads. In those cases writev() can be us    181 as payloads. In those cases writev() can be used, with the first vector being
278 the index and the following vector(s) being th    182 the index and the following vector(s) being the actual event payload.
279                                                   183 
280 For example, if I have a struct like this::       184 For example, if I have a struct like this::
281                                                   185 
282   struct payload {                                186   struct payload {
283         int src;                                  187         int src;
284         int dst;                                  188         int dst;
285         int flags;                                189         int flags;
286   } __attribute__((__packed__));               !! 190   };
287                                                   191 
288 It's advised for user programs to do the follo    192 It's advised for user programs to do the following::
289                                                   193 
290   struct iovec io[2];                             194   struct iovec io[2];
291   struct payload e;                               195   struct payload e;
292                                                   196 
293   io[0].iov_base = &write_index;                  197   io[0].iov_base = &write_index;
294   io[0].iov_len = sizeof(write_index);            198   io[0].iov_len = sizeof(write_index);
295   io[1].iov_base = &e;                            199   io[1].iov_base = &e;
296   io[1].iov_len = sizeof(e);                      200   io[1].iov_len = sizeof(e);
297                                                   201 
298   writev(fd, (const struct iovec*)io, 2);         202   writev(fd, (const struct iovec*)io, 2);
299                                                   203 
300 **NOTE:** *The write_index is not emitted out     204 **NOTE:** *The write_index is not emitted out into the trace being recorded.*
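The writev() fragment above can be wrapped into a complete function. This sketch keeps the packed struct payload from the example and assumes a 32-bit write_index. write_payload_event is hypothetical.

```c
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/* The example payload from above. */
struct payload {
	int src;
	int dst;
	int flags;
} __attribute__((__packed__));

/*
 * Hypothetical helper: io[0] carries write_index (consumed by the kernel,
 * not recorded), io[1] carries the struct as the event payload.
 */
ssize_t write_payload_event(int data_fd, uint32_t write_index,
			    const struct payload *e)
{
	struct iovec io[2];

	io[0].iov_base = &write_index;
	io[0].iov_len = sizeof(write_index);
	io[1].iov_base = (void *)e; /* writev() only reads the buffers */
	io[1].iov_len = sizeof(*e);

	return writev(data_fd, io, 2);
}
```

Because the index has its own iovec, the payload struct is passed as-is and never needs to be copied in behind the index.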
301                                                   205 
302 Example Code                                      206 Example Code
303 ------------                                      207 ------------
304 See sample code in samples/user_events.           208 See sample code in samples/user_events.
                                                      
