TOMOYO Linux Cross Reference
Linux/Documentation/trace/user_events.rst

Differences between /Documentation/trace/user_events.rst (Version linux-6.11.5) and /Documentation/trace/user_events.rst (Version linux-6.2.16)


  1 =========================================           1 =========================================
  2 user_events: User-based Event Tracing               2 user_events: User-based Event Tracing
  3 =========================================           3 =========================================
  4                                                     4 
  5 :Author: Beau Belgrave                              5 :Author: Beau Belgrave
  6                                                     6 
  7 Overview                                            7 Overview
  8 --------                                            8 --------
  9 User based trace events allow user processes t      9 User based trace events allow user processes to create events and trace data
 10 that can be viewed via existing tools, such as     10 that can be viewed via existing tools, such as ftrace and perf.
 11 To enable this feature, build your kernel with     11 To enable this feature, build your kernel with CONFIG_USER_EVENTS=y.
 12                                                    12 
 13 Programs can view status of the events via         13 Programs can view status of the events via
 14 /sys/kernel/tracing/user_events_status and can !!  14 /sys/kernel/debug/tracing/user_events_status and can both register and write
 15 data out via /sys/kernel/tracing/user_events_d !!  15 data out via /sys/kernel/debug/tracing/user_events_data.
 16                                                    16 
 17 Programs can also use /sys/kernel/tracing/dyna !!  17 Programs can also use /sys/kernel/debug/tracing/dynamic_events to register and
 18 delete user based events via the u: prefix. Th     18 delete user based events via the u: prefix. The format of the command to
 19 dynamic_events is the same as the ioctl with t !!  19 dynamic_events is the same as the ioctl with the u: prefix applied.
 20 requires CAP_PERFMON due to the event persisti << 
 21                                                    20 
 22 Typically programs will register a set of even     21 Typically programs will register a set of events that they wish to expose to
 23 tools that can read trace_events (such as ftra     22 tools that can read trace_events (such as ftrace and perf). The registration
 24 process tells the kernel which address and bit !!  23 process gives back two ints to the program for each event. The first int is
 25 enabled the event and data should be written.  !!  24 the status bit. This describes which bit in little-endian format in the
 26 a write index which describes the data when a  !!  25 /sys/kernel/debug/tracing/user_events_status file represents this event. The
 27 on the /sys/kernel/tracing/user_events_data fi !!  26 second int is the write index which describes the data when a write() or
                                                   >>  27 writev() is called on the /sys/kernel/debug/tracing/user_events_data file.
 28                                                    28 
 29 The structures referenced in this document are     29 The structures referenced in this document are contained within the
 30 /include/uapi/linux/user_events.h file in the      30 /include/uapi/linux/user_events.h file in the source tree.
 31                                                    31 
 32 **NOTE:** *Both user_events_status and user_ev     32 **NOTE:** *Both user_events_status and user_events_data are under the tracefs
 33 filesystem and may be mounted at different pat     33 filesystem and may be mounted at different paths than above.*
 34                                                    34 
 35 Registering                                        35 Registering
 36 -----------                                        36 -----------
 37 Registering within a user process is done via      37 Registering within a user process is done via ioctl() out to the
 38 /sys/kernel/tracing/user_events_data file. The !!  38 /sys/kernel/debug/tracing/user_events_data file. The command to issue is
 39 DIAG_IOCSREG.                                      39 DIAG_IOCSREG.
 40                                                    40 
 41 This command takes a packed struct user_reg as     41 This command takes a packed struct user_reg as an argument::
 42                                                    42 
 43   struct user_reg {                                43   struct user_reg {
 44         /* Input: Size of the user_reg structu !!  44         u32 size;
 45         __u32 size;                            !!  45         u64 name_args;
 46                                                !!  46         u32 status_bit;
 47         /* Input: Bit in enable address to use !!  47         u32 write_index;
 48         __u8 enable_bit;                       !!  48   };
 49                                                << 
 50         /* Input: Enable size in bytes at addr << 
 51         __u8 enable_size;                      << 
 52                                                    49 
 53         /* Input: Flags to use, if any */      !!  50 The struct user_reg requires two inputs, the first is the size of the structure
 54         __u16 flags;                           !!  51 to ensure forward and backward compatibility. The second is the command string
 55                                                !!  52 to issue for registering. Upon success two outputs are set, the status bit
 56         /* Input: Address to update when enabl !!  53 and the write index.
 57         __u64 enable_addr;                     << 
 58                                                << 
 59         /* Input: Pointer to string with event << 
 60         __u64 name_args;                       << 
 61                                                << 
 62         /* Output: Index of the event to use w << 
 63         __u32 write_index;                     << 
 64   } __attribute__((__packed__));               << 
 65                                                << 
 66 The struct user_reg requires all the above inp << 
 67                                                << 
 68 + size: This must be set to sizeof(struct user << 
 69                                                << 
 70 + enable_bit: The bit to reflect the event sta << 
 71   enable_addr.                                 << 
 72                                                << 
 73 + enable_size: The size of the value specified << 
 74   This must be 4 (32-bit) or 8 (64-bit). 64-bi << 
 75   used on 64-bit kernels, however, 32-bit can  << 
 76                                                << 
 77 + flags: The flags to use, if any.             << 
 78   Callers should first attempt to use flags an << 
 79   support for lower versions of the kernel. If << 
 80   is returned.                                 << 
 81                                                << 
 82 + enable_addr: The address of the value to use << 
 83   must be naturally aligned and write accessib << 
 84                                                << 
 85 + name_args: The name and arguments to describ << 
 86   for details.                                 << 
 87                                                << 
 88 The following flags are currently supported.   << 
 89                                                << 
 90 + USER_EVENT_REG_PERSIST: The event will not d << 
 91   closing. Callers may use this if an event sh << 
 92   process closes or unregisters the event. Req << 
 93   -EPERM is returned.                          << 
 94                                                << 
 95 + USER_EVENT_REG_MULTI_FORMAT: The event can c << 
 96   allows programs to prevent themselves from b << 
 97   format changes and they wish to use the same << 
 98   tracepoint name will be in the new format of << 
 99   format of "name". A tracepoint will be creat << 
100   and format. This means if several processes  << 
101   they will use the same tracepoint. If yet an << 
102   but a different format than the other proces << 
103   tracepoint with a new unique id. Recording p << 
104   the various different formats of the event n << 
105   recording. The system name of the tracepoint << 
106   instead of "user_events". This prevents sing << 
107   with any multi-format event names within tra << 
108   a hex string. Recording programs should ensu << 
109   the event name they registered and has a suf << 
110   has hex characters. For example to find all  << 
111   can use the regex "^test\.[0-9a-fA-F]+$".    << 
112                                                << 
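The suffix check described for multi-format names can be sketched with POSIX regular expressions. This is only an illustration: the helper name `is_test_multi_format_name` is hypothetical, and the pattern is the one quoted above for an event registered as "test".

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when name is "test." followed only by hex digits. */
static bool is_test_multi_format_name(const char *name)
{
    regex_t re;
    bool matched;

    assert(name != NULL);

    /* Pattern taken from the text; REG_EXTENDED is needed for the + operator. */
    if (regcomp(&re, "^test\\.[0-9a-fA-F]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    matched = regexec(&re, name, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

A recording tool could apply such a check to each tracepoint name it finds and keep only the names that match.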
113 Upon successful registration the following is  << 
114                                                << 
115 + write_index: The index to use for this file  << 
116   event when writing out data. The index is un << 
117   descriptor that was used for the registratio << 
118                                                    54 
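As a rough sketch of how the inputs of the newer struct user_reg might be filled in before the ioctl(), the following mirrors the layout shown above in a local type. The names `user_reg_sketch` and `fill_user_reg` are hypothetical; a real program would include the uapi header and pass the real struct to ioctl() with DIAG_IOCSREG, which this sketch deliberately does not do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical local mirror of the newer struct user_reg layout.
 * Real programs should use the definition from the uapi header instead.
 */
struct user_reg_sketch {
    uint32_t size;         /* input: size of this structure */
    uint8_t  enable_bit;   /* input: bit within the enable value */
    uint8_t  enable_size;  /* input: size of the enable value, 4 or 8 */
    uint16_t flags;        /* input: flags, 0 when none are wanted */
    uint64_t enable_addr;  /* input: address of the enable value */
    uint64_t name_args;    /* input: pointer to the command string */
    uint32_t write_index;  /* output: filled in by the kernel on success */
} __attribute__((__packed__));

/* Fill every input field for a 32-bit enable value; no flags are requested. */
static void fill_user_reg(struct user_reg_sketch *reg, const char *name_args,
                          uint32_t *enable_word, uint8_t enable_bit)
{
    assert(enable_bit < 32);

    memset(reg, 0, sizeof(*reg));
    reg->size = sizeof(*reg);
    reg->enable_bit = enable_bit;
    reg->enable_size = sizeof(*enable_word);
    reg->enable_addr = (uint64_t)(uintptr_t)enable_word;
    reg->name_args = (uint64_t)(uintptr_t)name_args;
}
```

The enable value is a `uint32_t` here so that it is naturally aligned and matches an enable_size of 4; the command string "test u32 count" would be one possible value for name_args.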
119 User based events show up under tracefs like a     55 User based events show up under tracefs like any other event under the
120 subsystem named "user_events". This means tool     56 subsystem named "user_events". This means tools that wish to attach to the
121 events need to use /sys/kernel/tracing/events/ !!  57 events need to use /sys/kernel/debug/tracing/events/user_events/[name]/enable
122 or perf record -e user_events:[name] when atta     58 or perf record -e user_events:[name] when attaching/recording.
123                                                    59 
124 **NOTE:** The event subsystem name by default  !!  60 **NOTE:** *The write_index returned is only valid for the FD that was used*
125 not assume it will always be "user_events". Op << 
126 future to change the subsystem name per-proces << 
127 In addition if the USER_EVENT_REG_MULTI_FORMAT << 
128 will have a unique id appended to it and the s << 
129 "user_events_multi" as described above.        << 
130                                                    61 
131 Command Format                                     62 Command Format
132 ^^^^^^^^^^^^^^                                     63 ^^^^^^^^^^^^^^
133 The command string format is as follows::          64 The command string format is as follows::
134                                                    65 
135   name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]     66   name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]
136                                                    67 
137 Supported Flags                                    68 Supported Flags
138 ^^^^^^^^^^^^^^^                                    69 ^^^^^^^^^^^^^^^
139 None yet                                           70 None yet
140                                                    71 
141 Field Format                                       72 Field Format
142 ^^^^^^^^^^^^                                       73 ^^^^^^^^^^^^
143 ::                                                 74 ::
144                                                    75 
145   type name [size]                                 76   type name [size]
146                                                    77 
147 Basic types are supported (__data_loc, u32, u6     78 Basic types are supported (__data_loc, u32, u64, int, char, char[20], etc).
148 User programs are encouraged to use clearly si     79 User programs are encouraged to use clearly sized types like u32.
149                                                    80 
150 **NOTE:** *Long is not supported since size ca     81 **NOTE:** *Long is not supported since size can vary between user and kernel.*
151                                                    82 
152 The size is only valid for types that start wi     83 The size is only valid for types that start with a struct prefix.
153 This allows user programs to describe custom s     84 This allows user programs to describe custom structs out to tools, if required.
154                                                    85 
155 For example, a struct in C that looks like thi     86 For example, a struct in C that looks like this::
156                                                    87 
157   struct mytype {                                  88   struct mytype {
158     char data[20];                                 89     char data[20];
159   };                                               90   };
160                                                    91 
161 Would be represented by the following field::      92 Would be represented by the following field::
162                                                    93 
163   struct mytype myname 20                          94   struct mytype myname 20
164                                                    95 
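To show how the struct field above fits into a full command string, here is a minimal sketch that builds one with snprintf(). The helper name `format_command` and the event name passed to it are hypothetical; only the field syntax comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct mytype {
    char data[20];
};

/*
 * Hypothetical helper: writes "<event_name> struct mytype myname <size>"
 * into out. Returns 0 on success, -1 if the buffer is too small.
 */
static int format_command(char *out, size_t out_len, const char *event_name)
{
    int n;

    assert(out != NULL && event_name != NULL);

    /* The trailing size is the size of the struct in bytes. */
    n = snprintf(out, out_len, "%s struct mytype myname %zu",
                 event_name, sizeof(struct mytype));

    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Deriving the size from sizeof() rather than hard-coding 20 keeps the described field in step with the C definition if the struct changes.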
165 Deleting                                           96 Deleting
166 --------                                       !!  97 -----------
167 Deleting an event from within a user process i     98 Deleting an event from within a user process is done via ioctl() out to the
168 /sys/kernel/tracing/user_events_data file. The !!  99 /sys/kernel/debug/tracing/user_events_data file. The command to issue is
169 DIAG_IOCSDEL.                                     100 DIAG_IOCSDEL.
170                                                   101 
171 This command only requires a single string spe    102 This command only requires a single string specifying the event to delete by
172 its name. Delete will only succeed if there ar    103 its name. Delete will only succeed if there are no references left to the
173 event (in both user and kernel space). User pr    104 event (in both user and kernel space). User programs should use a separate file
174 to request deletes than the one used for regis    105 to request deletes than the one used for registration due to this.
175                                                   106 
176 **NOTE:** By default events will auto-delete w << 
177 to the event. If programs do not want auto-del << 
178 USER_EVENT_REG_PERSIST flag when registering t << 
179 the event exists until DIAG_IOCSDEL is invoked << 
180 event that persists requires CAP_PERFMON, othe << 
181 there are multiple formats of the same event n << 
182 name will be attempted to be deleted. If only  << 
183 be deleted then the /sys/kernel/tracing/dynami << 
184 that specific format of the event.             << 
185                                                << 
186 Unregistering                                  << 
187 -------------                                  << 
188 If after registering an event it is no longer  << 
189 be disabled via ioctl() out to the /sys/kernel << 
190 The command to issue is DIAG_IOCSUNREG. This i << 
191 deleting actually removes the event from the s << 
192 the kernel your process is no longer intereste << 
193                                                << 
194 This command takes a packed struct user_unreg  << 
195                                                << 
196   struct user_unreg {                          << 
197         /* Input: Size of the user_unreg struc << 
198         __u32 size;                            << 
199                                                << 
200         /* Input: Bit to unregister */         << 
201         __u8 disable_bit;                      << 
202                                                << 
203         /* Input: Reserved, set to 0 */        << 
204         __u8 __reserved;                       << 
205                                                << 
206         /* Input: Reserved, set to 0 */        << 
207         __u16 __reserved2;                     << 
208                                                << 
209         /* Input: Address to unregister */     << 
210         __u64 disable_addr;                    << 
211   } __attribute__((__packed__));               << 
212                                                << 
213 The struct user_unreg requires all the above i << 
214                                                << 
215 + size: This must be set to sizeof(struct user << 
216                                                << 
217 + disable_bit: This must be set to the bit to  << 
218   previously registered via enable_bit).       << 
219                                                << 
220 + disable_addr: This must be set to the addres << 
221   previously registered via enable_addr).      << 
222                                                << 
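The unregister inputs can be sketched in the same way as registration. The type `user_unreg_sketch` and the helper `fill_user_unreg` are hypothetical stand-ins that mirror the layout shown above, with the reserved fields renamed to avoid reserved identifiers; a real program would use the uapi definition and issue DIAG_IOCSUNREG.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of struct user_unreg as shown above. */
struct user_unreg_sketch {
    uint32_t size;          /* input: size of this structure */
    uint8_t  disable_bit;   /* input: bit previously registered */
    uint8_t  reserved;      /* input: must be 0 */
    uint16_t reserved2;     /* input: must be 0 */
    uint64_t disable_addr;  /* input: address previously registered */
} __attribute__((__packed__));

/* Fill the inputs using the same bit and address that were registered. */
static void fill_user_unreg(struct user_unreg_sketch *unreg,
                            uint32_t *enable_word, uint8_t enable_bit)
{
    assert(enable_bit < 32);

    /* memset() also zeroes both reserved fields. */
    memset(unreg, 0, sizeof(*unreg));
    unreg->size = sizeof(*unreg);
    unreg->disable_bit = enable_bit;
    unreg->disable_addr = (uint64_t)(uintptr_t)enable_word;
}
```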
223 **NOTE:** Events are automatically unregistere << 
224 fork() the registered events will be retained  << 
225 in each process if wanted.                     << 
226                                                << 
227 Status                                            107 Status
228 ------                                            108 ------
229 When tools attach/record user based events the    109 When tools attach/record user based events the status of the event is updated
230 in realtime. This allows user programs to only    110 in realtime. This allows user programs to only incur the cost of the write() or
231 writev() calls when something is actively atta    111 writev() calls when something is actively attached to the event.
232                                                   112 
233 The kernel will update the specified bit that  !! 113 User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
234 tools attach/detach from the event. User progr !! 114 check the status for each event that is registered. The bit to check in the
235 to see if something is attached or not.        !! 115 file is given back after the register ioctl() via user_reg.status_bit. The bit
                                                   >> 116 is always in little-endian format. Programs can check if the bit is set either
                                                   >> 117 using a byte-wise index with a mask or a long-wise index with a little-endian
                                                   >> 118 mask.
                                                   >> 119 
                                                   >> 120 Currently the size of user_events_status is a single page, however, custom
                                                   >> 121 kernel configurations can change this size to allow more user based events. In
                                                   >> 122 all cases the size of the file is a multiple of a page size.
                                                   >> 123 
                                                   >> 124 For example, if the register ioctl() gives back a status_bit of 3 you would
                                                   >> 125 check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8
                                                   >> 126 (1 << (3 % 8)) to see if anything is attached to that event.
                                                   >> 127 
                                                   >> 128 A byte-wise index check is performed as follows::
                                                   >> 129 
                                                   >> 130   int index, mask;
                                                   >> 131   char *status_page;
                                                   >> 132 
                                                   >> 133   index = status_bit / 8;
                                                   >> 134   mask = 1 << (status_bit % 8);
                                                   >> 135 
                                                   >> 136   ...
                                                   >> 137 
                                                   >> 138   if (status_page[index] & mask) {
                                                   >> 139         /* Enabled */
                                                   >> 140   }
                                                   >> 141 
                                                   >> 142 A long-wise index check is performed as follows::
                                                   >> 143 
                                                   >> 144   #include <asm/bitsperlong.h>
                                                   >> 145   #include <endian.h>
                                                   >> 146 
                                                   >> 147   #if __BITS_PER_LONG == 64
                                                   >> 148   #define endian_swap(x) htole64(x)
                                                   >> 149   #else
                                                   >> 150   #define endian_swap(x) htole32(x)
                                                   >> 151   #endif
                                                   >> 152 
                                                   >> 153   long index, mask, *status_page;
                                                   >> 154 
                                                   >> 155   index = status_bit / __BITS_PER_LONG;
                                                   >> 156   mask = 1L << (status_bit % __BITS_PER_LONG);
                                                   >> 157   mask = endian_swap(mask);
                                                   >> 158 
                                                   >> 159   ...
                                                   >> 160 
                                                   >> 161   if (status_page[index] & mask) {
                                                   >> 162         /* Enabled */
                                                   >> 163   }
236                                                   164 
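For the newer interface, where the kernel updates a bit at an address the program registered, the check might look like the sketch below. It assumes a 32-bit enable value; the helper name `event_enabled` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: tests the registered bit in a 32-bit enable value.
 * The pointer is volatile because the value can change at any time,
 * outside the control of the program.
 */
static bool event_enabled(const volatile uint32_t *enable_word,
                          unsigned int enable_bit)
{
    assert(enable_bit < 32);
    return (*enable_word & (UINT32_C(1) << enable_bit)) != 0;
}
```

A program would typically call this just before building an event payload and skip the write() or writev() when it returns false.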
237 Administrators can easily check the status of     165 Administrators can easily check the status of all registered events by reading
238 the user_events_status file directly via a ter    166 the user_events_status file directly via a terminal. The output is as follows::
239                                                   167 
240   Name [# Comments]                            !! 168   Byte:Name [# Comments]
241   ...                                             169   ...
242                                                   170 
243   Active: ActiveCount                             171   Active: ActiveCount
244   Busy: BusyCount                                 172   Busy: BusyCount
                                                   >> 173   Max: MaxCount
245                                                   174 
246 For example, on a system that has a single eve    175 For example, on a system that has a single event the output looks like this::
247                                                   176 
248   test                                         !! 177   1:test
249                                                   178 
250   Active: 1                                       179   Active: 1
251   Busy: 0                                         180   Busy: 0
                                                   >> 181   Max: 32768
252                                                   182 
253 If a user enables the user event via ftrace, t    183 If a user enables the user event via ftrace, the output would change to this::
254                                                   184 
255   test # Used by ftrace                        !! 185   1:test # Used by ftrace
256                                                   186 
257   Active: 1                                       187   Active: 1
258   Busy: 1                                         188   Busy: 1
                                                   >> 189   Max: 32768
                                                   >> 190 
                                                   >> 191 **NOTE:** *A status bit of 0 will never be returned. This allows user programs
                                                   >> 192 to have a bit that can be used on error cases.*
259                                                   193 
260 Writing Data                                      194 Writing Data
261 ------------                                      195 ------------
262 After registering an event the same fd that wa    196 After registering an event the same fd that was used to register can be used
263 to write an entry for that event. The write_in    197 to write an entry for that event. The write_index returned must be at the start
264 of the data, then the remaining data is treate    198 of the data, then the remaining data is treated as the payload of the event.
265                                                   199 
266 For example, if write_index returned was 1 and    200 For example, if write_index returned was 1 and I wanted to write out an int
267 payload of the event. Then the data would have    201 payload of the event. Then the data would have to be 8 bytes (2 ints) in size,
268 with the first 4 bytes being equal to 1 and th    202 with the first 4 bytes being equal to 1 and the last 4 bytes being equal to the
269 value I want as the payload.                      203 value I want as the payload.
270                                                   204 
271 In memory this would look like this::             205 In memory this would look like this::
272                                                   206 
273   int index;                                      207   int index;
274   int payload;                                    208   int payload;
275                                                   209 
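The 8-byte layout described above can be sketched as a small helper that packs the index followed by the payload. The name `build_record` is hypothetical, and the sketch assumes a 4-byte int payload as in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RECORD_SIZE (sizeof(uint32_t) + sizeof(int32_t))

/*
 * Hypothetical helper: copies write_index into the first 4 bytes of out
 * and payload into the next 4 bytes. Returns the number of bytes used.
 */
static size_t build_record(unsigned char *out, size_t out_len,
                           uint32_t write_index, int32_t payload)
{
    assert(out != NULL && out_len >= RECORD_SIZE);

    memcpy(out, &write_index, sizeof(write_index));
    memcpy(out + sizeof(write_index), &payload, sizeof(payload));
    return RECORD_SIZE;
}
```

The resulting buffer would then be passed to write() on the same fd that was used for registration.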
276 User programs might have well known structs th    210 User programs might have well known structs that they wish to use to emit out
277 as payloads. In those cases writev() can be us    211 as payloads. In those cases writev() can be used, with the first vector being
278 the index and the following vector(s) being th    212 the index and the following vector(s) being the actual event payload.
279                                                   213 
280 For example, if I have a struct like this::       214 For example, if I have a struct like this::
281                                                   215 
282   struct payload {                                216   struct payload {
283         int src;                                  217         int src;
284         int dst;                                  218         int dst;
285         int flags;                                219         int flags;
286   } __attribute__((__packed__));               !! 220   };
287                                                   221 
288 It's advised for user programs to do the follo    222 It's advised for user programs to do the following::
289                                                   223 
290   struct iovec io[2];                             224   struct iovec io[2];
291   struct payload e;                               225   struct payload e;
292                                                   226 
293   io[0].iov_base = &write_index;                  227   io[0].iov_base = &write_index;
294   io[0].iov_len = sizeof(write_index);            228   io[0].iov_len = sizeof(write_index);
295   io[1].iov_base = &e;                            229   io[1].iov_base = &e;
296   io[1].iov_len = sizeof(e);                      230   io[1].iov_len = sizeof(e);
297                                                   231 
298   writev(fd, (const struct iovec*)io, 2);         232   writev(fd, (const struct iovec*)io, 2);
299                                                   233 
300 **NOTE:** *The write_index is not emitted out     234 **NOTE:** *The write_index is not emitted out into the trace being recorded.*
301                                                   235 
302 Example Code                                      236 Example Code
303 ------------                                      237 ------------
304 See sample code in samples/user_events.           238 See sample code in samples/user_events.
                                                      
