~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/Documentation/trace/fprobetrace.rst

Version: ~ [ linux-6.11.5 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.58 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.114 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.169 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.228 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.284 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.322 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.9 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

  1 .. SPDX-License-Identifier: GPL-2.0
  2 
  3 ==========================
  4 Fprobe-based Event Tracing
  5 ==========================
  6 
  7 .. Author: Masami Hiramatsu <mhiramat@kernel.org>
  8 
  9 Overview
 10 --------
 11 
 12 Fprobe event is similar to the kprobe event, but limited to probe on
 13 the function entry and exit only. It is good enough for many use cases
 14 which only traces some specific functions.
 15 
 16 This document also covers tracepoint probe events (tprobe) since this
 17 is also works only on the tracepoint entry. User can trace a part of
 18 tracepoint argument, or the tracepoint without trace-event, which is
 19 not exposed on tracefs.
 20 
 21 As same as other dynamic events, fprobe events and tracepoint probe
 22 events are defined via `dynamic_events` interface file on tracefs.
 23 
 24 Synopsis of fprobe-events
 25 -------------------------
 26 ::
 27 
 28   f[:[GRP1/][EVENT1]] SYM [FETCHARGS]                       : Probe on function entry
 29   f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS]     : Probe on function exit
 30   t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS]                : Probe on tracepoint
 31 
 32  GRP1           : Group name for fprobe. If omitted, use "fprobes" for it.
 33  GRP2           : Group name for tprobe. If omitted, use "tracepoints" for it.
 34  EVENT1         : Event name for fprobe. If omitted, the event name is
 35                   "SYM__entry" or "SYM__exit".
 36  EVENT2         : Event name for tprobe. If omitted, the event name is
 37                   the same as "TRACEPOINT", but if the "TRACEPOINT" starts
 38                   with a digit character, "_TRACEPOINT" is used.
 39  MAXACTIVE      : Maximum number of instances of the specified function that
 40                   can be probed simultaneously, or 0 for the default value
 41                   as defined in Documentation/trace/fprobe.rst
 42 
 43  FETCHARGS      : Arguments. Each probe can have up to 128 args.
 44   ARG           : Fetch "ARG" function argument using BTF (only for function
 45                   entry or tracepoint.) (\*1)
 46   @ADDR         : Fetch memory at ADDR (ADDR should be in kernel)
 47   @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
 48   $stackN       : Fetch Nth entry of stack (N >= 0)
 49   $stack        : Fetch stack address.
 50   $argN         : Fetch the Nth function argument. (N >= 1) (\*2)
 51   $retval       : Fetch return value.(\*3)
 52   $comm         : Fetch current task comm.
 53   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
 54   \IMM          : Store an immediate value to the argument.
 55   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
 56   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
 57                   (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
 58                   (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
 59                   and bitfield are supported.
 60 
 61   (\*1) This is available only when BTF is enabled.
 62   (\*2) only for the probe on function entry (offs == 0). Note, this argument access
 63         is best effort, because depending on the argument type, it may be passed on
 64         the stack. But this only support the arguments via registers.
 65   (\*3) only for return probe. Note that this is also best effort. Depending on the
 66         return value type, it might be passed via a pair of registers. But this only
 67         accesses one register.
 68   (\*4) this is useful for fetching a field of data structures.
 69   (\*5) "u" means user-space dereference.
 70 
 71 For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`.
 72 
 73 Function arguments at exit
 74 --------------------------
 75 Function arguments can be accessed at exit probe using $arg<N> fetcharg. This
 76 is useful to record the function parameter and return value at once, and
 77 trace the difference of structure fields (for debugging a function whether it
 78 correctly updates the given data structure or not)
 79 See the :ref:`sample<fprobetrace_exit_args_sample>` below for how it works.
 80 
 81 BTF arguments
 82 -------------
 83 BTF (BPF Type Format) argument allows user to trace function and tracepoint
 84 parameters by its name instead of ``$argN``. This feature is available if the
 85 kernel is configured with CONFIG_BPF_SYSCALL and CONFIG_DEBUG_INFO_BTF.
 86 If user only specify the BTF argument, the event's argument name is also
 87 automatically set by the given name. ::
 88 
 89  # echo 'f:myprobe vfs_read count pos' >> dynamic_events
 90  # cat dynamic_events
 91  f:fprobes/myprobe vfs_read count=count pos=pos
 92 
 93 It also chooses the fetch type from BTF information. For example, in the above
 94 example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus,
 95 both are converted to 64bit unsigned long, but only ``pos`` has "%Lx"
 96 print-format as below ::
 97 
 98  # cat events/fprobes/myprobe/format
 99  name: myprobe
100  ID: 1313
101  format:
102         field:unsigned short common_type;       offset:0;       size:2; signed:0;
103         field:unsigned char common_flags;       offset:2;       size:1; signed:0;
104         field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
105         field:int common_pid;   offset:4;       size:4; signed:1;
106 
107         field:unsigned long __probe_ip; offset:8;       size:8; signed:0;
108         field:u64 count;        offset:16;      size:8; signed:0;
109         field:u64 pos;  offset:24;      size:8; signed:0;
110 
111  print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos
112 
113 If user unsures the name of arguments, ``$arg*`` will be helpful. The ``$arg*``
114 is expanded to all function arguments of the function or the tracepoint. ::
115 
116  # echo 'f:myprobe vfs_read $arg*' >> dynamic_events
117  # cat dynamic_events
118  f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos
119 
120 BTF also affects the ``$retval``. If user doesn't set any type, the retval
121 type is automatically picked from the BTF. If the function returns ``void``,
122 ``$retval`` is rejected.
123 
124 You can access the data fields of a data structure using allow operator ``->``
125 (for pointer type) and dot operator ``.`` (for data structure type.)::
126 
127 # echo 't sched_switch preempt prev_pid=prev->pid next_pid=next->pid' >> dynamic_events
128 
129 The field access operators, ``->`` and ``.`` can be combined for accessing deeper
130 members and other structure members pointed by the member. e.g. ``foo->bar.baz->qux``
131 If there is non-name union member, you can directly access it as the C code does.
132 For example::
133 
134  struct {
135         union {
136         int a;
137         int b;
138         };
139  } *foo;
140 
141 To access ``a`` and ``b``, use ``foo->a`` and ``foo->b`` in this case.
142 
143 This data field access is available for the return value via ``$retval``,
144 e.g. ``$retval->name``.
145 
146 For these BTF arguments and fields, ``:string`` and ``:ustring`` change the
147 behavior. If these are used for BTF argument or field, it checks whether
148 the BTF type of the argument or the data field is ``char *`` or ``char []``,
149 or not.  If not, it rejects applying the string types. Also, with the BTF
150 support, you don't need a memory dereference operator (``+0(PTR)``) for
151 accessing the string pointed by a ``PTR``. It automatically adds the memory
152 dereference operator according to the BTF type. e.g. ::
153 
154 # echo 't sched_switch prev->comm:string' >> dynamic_events
155 # echo 'f getname_flags%return $retval->name:string' >> dynamic_events
156 
157 The ``prev->comm`` is an embedded char array in the data structure, and
158 ``$retval->name`` is a char pointer in the data structure. But in both
159 cases, you can use ``:string`` type to get the string.
160 
161 
162 Usage examples
163 --------------
164 Here is an example to add fprobe events on ``vfs_read()`` function entry
165 and exit, with BTF arguments.
166 ::
167 
168   # echo 'f vfs_read $arg*' >> dynamic_events
169   # echo 'f vfs_read%return $retval' >> dynamic_events
170   # cat dynamic_events
171  f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos
172  f:fprobes/vfs_read__exit vfs_read%return arg1=$retval
173   # echo 1 > events/fprobes/enable
174   # head -n 20 trace | tail
175  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
176  #              | |         |   |||||     |         |
177                sh-70      [000] ...1.   335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
178                sh-70      [000] .....   335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
179                sh-70      [000] ...1.   335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
180                sh-70      [000] .....   335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
181                sh-70      [000] ...1.   335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08
182                sh-70      [000] .....   335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
183                sh-70      [000] ...1.   336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
184                sh-70      [000] .....   336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
185 
186 You can see all function arguments and return values are recorded as signed int.
187 
188 Also, here is an example of tracepoint events on ``sched_switch`` tracepoint.
189 To compare the result, this also enables the ``sched_switch`` traceevent too.
190 ::
191 
192   # echo 't sched_switch $arg*' >> dynamic_events
193   # echo 1 > events/sched/sched_switch/enable
194   # echo 1 > events/tracepoints/sched_switch/enable
195   # echo > trace
196   # head -n 20 trace | tail
197  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
198  #              | |         |   |||||     |         |
199                sh-70      [000] d..2.  3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
200                sh-70      [000] d..3.  3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1
201            <idle>-0       [000] d..2.  3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
202            <idle>-0       [000] d..3.  3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
203       rcu_preempt-16      [000] d..2.  3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
204       rcu_preempt-16      [000] d..3.  3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026
205            <idle>-0       [000] d..2.  3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
206            <idle>-0       [000] d..3.  3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
207 
208 As you can see, the ``sched_switch`` trace-event shows *cooked* parameters, on
209 the other hand, the ``sched_switch`` tracepoint probe event shows *raw*
210 parameters. This means you can access any field values in the task
211 structure pointed by the ``prev`` and ``next`` arguments.
212 
213 For example, usually ``task_struct::start_time`` is not traced, but with this
214 traceprobe event, you can trace that field as below.
215 ::
216 
217   # echo 't sched_switch comm=next->comm:string next->start_time' > dynamic_events
218   # head -n 20 trace | tail
219  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
220  #              | |         |   |||||     |         |
221                sh-70      [000] d..3.  5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
222       rcu_preempt-16      [000] d..3.  5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526
223                sh-70      [000] d..3.  5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
224            <idle>-0       [000] d..3.  5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
225       rcu_preempt-16      [000] d..3.  5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
226            <idle>-0       [000] d..3.  5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
227       kworker/0:1-14      [000] d..3.  5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
228            <idle>-0       [000] d..3.  5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
229 
230 .. _fprobetrace_exit_args_sample:
231 
232 The return probe allows us to access the results of some functions, which returns
233 the error code and its results are passed via function parameter, such as an
234 structure-initialization function.
235 
236 For example, vfs_open() will link the file structure to the inode and update
237 mode. You can trace that changes with return probe.
238 ::
239 
240  # echo 'f vfs_open mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
241  # echo 'f vfs_open%%return mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
242  # echo 1 > events/fprobes/enable
243  # cat trace
244               sh-131     [006] ...1.  1945.714346: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x2 inode=0x0
245               sh-131     [006] ...1.  1945.714358: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4d801e inode=0xffff888008470168
246              cat-143     [007] ...1.  1945.717949: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
247              cat-143     [007] ...1.  1945.717956: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4a801d inode=0xffff888005f78d28
248              cat-143     [007] ...1.  1945.720616: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
249              cat-143     [007] ...1.  1945.728263: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0xa800d inode=0xffff888004ada8d8
250 
251 You can see the `file::f_mode` and `file::f_inode` are updated in `vfs_open()`.

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php