=============
BPF Iterators
=============


----------
Motivation
----------

There are a few existing ways to dump kernel data into user space. The most
popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps
all tcp6 sockets in the system, and ``cat /proc/net/netlink`` dumps all netlink
sockets in the system. However, their output format tends to be fixed, and if
users want more information about these sockets, they have to patch the kernel,
which often takes time to get upstream and released. The same is true for
popular tools like `ss <https://man7.org/linux/man-pages/man8/ss.8.html>`_,
where any additional information requires a kernel patch.

To solve this problem, the `drgn
<https://www.kernel.org/doc/html/latest/bpf/drgn.html>`_ tool is often used to
dig out kernel data without changing the kernel. However, the main drawback of
drgn is performance, as it cannot do pointer tracing inside the kernel. In
addition, drgn cannot validate a pointer value and may read invalid data if the
pointer becomes invalid inside the kernel.

BPF iterators solve these problems by calling a BPF program for each kernel
data object (e.g., tasks, bpf_maps, etc.), which gives users the flexibility to
choose what data to collect.

----------------------
How BPF Iterators Work
----------------------

A BPF iterator is a type of BPF program that allows users to iterate over
specific types of kernel objects. Unlike traditional BPF tracing programs that
allow users to define callbacks that are invoked at particular points of
execution in the kernel, BPF iterators allow users to define callbacks that
should be executed for every entry in a variety of kernel data structures.

For example, users can define a BPF iterator that iterates over every task on
the system and dumps the total amount of CPU runtime currently used by each of
them. Another BPF task iterator may instead dump the cgroup information for each
task. Such flexibility is the core value of BPF iterators.

A BPF program is always loaded into the kernel at the behest of a user space
process. A user space process loads a BPF program by opening and initializing
the program skeleton as required and then invoking a syscall to have the BPF
program verified and loaded by the kernel.

In traditional tracing programs, a program is activated by having user space
obtain a ``bpf_link`` to the program with ``bpf_program__attach()``. Once
activated, the program callback will be invoked whenever the tracepoint is
triggered in the kernel. For BPF iterator programs, a ``bpf_link`` to the
program is obtained using ``bpf_link_create()``, and the program callback is
invoked by issuing system calls from user space.

Next, let us see how you can use the iterators to iterate over kernel objects
and read data.

------------------------
How to Use BPF Iterators
------------------------

BPF selftests are a great resource to illustrate how to use the iterators. In
this section, we’ll walk through a BPF selftest which shows how to load and use
a BPF iterator program. To begin, we’ll look at `bpf_iter.c
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/prog_tests/bpf_iter.c>`_,
which illustrates how to load and trigger BPF iterators on the user space side.
Later, we’ll look at a BPF program that runs in kernel space.

Loading a BPF iterator in the kernel from user space typically involves the
following steps:

* The BPF program is loaded into the kernel through ``libbpf``. Once the kernel
  has verified and loaded the program, it returns a file descriptor (fd) to user
  space.
* Obtain a ``link_fd`` to the BPF program by calling ``bpf_link_create()`` with
  the BPF program file descriptor received from the kernel.
* Next, obtain a BPF iterator file descriptor (``bpf_iter_fd``) by calling
  ``bpf_iter_create()`` with the ``link_fd`` obtained in the previous step.
* Trigger the iteration by calling ``read(bpf_iter_fd)`` until no data is
  available.
* Close the iterator fd using ``close(bpf_iter_fd)``.
* To reread the data, get a new ``bpf_iter_fd`` and read from it again.
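
The "read until no data is available" step is an ordinary ``read()`` loop:
``read()`` returns 0 once the iterator has visited every object. Below is a
minimal sketch of that loop in plain POSIX C, assuming ``iter_fd`` was
returned by ``bpf_iter_create()``. The helper name ``drain_iter_fd()`` is
hypothetical and not part of libbpf.

::

  #include <errno.h>
  #include <stddef.h>
  #include <sys/types.h>
  #include <unistd.h>

  /*
   * Hypothetical helper: read from an iterator fd until read() returns 0,
   * i.e. until the iteration is complete. Up to `cap` bytes are stored in
   * `buf`; anything beyond that is read into a scratch buffer and dropped
   * so the iteration still runs to completion.
   * Returns the total number of bytes produced, or -1 on error.
   */
  ssize_t drain_iter_fd(int iter_fd, char *buf, size_t cap)
  {
          char scratch[4096];
          size_t stored = 0;
          ssize_t total = 0;

          for (;;) {
                  char *dst = stored < cap ? buf + stored : scratch;
                  size_t room = stored < cap ? cap - stored : sizeof(scratch);
                  ssize_t n = read(iter_fd, dst, room);

                  if (n == 0)
                          break;          /* end of this iteration pass */
                  if (n < 0) {
                          if (errno == EINTR)
                                  continue;  /* interrupted by a signal, retry */
                          return -1;      /* errno set by read() */
                  }
                  if (dst != scratch)
                          stored += (size_t)n;
                  total += n;
          }
          return total;
  }

After the helper returns, close the fd as described in the steps above; to
read the data again, create a new iterator fd rather than reusing this one.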

The following are a few examples of selftest BPF iterator programs:

* `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_
* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_
* `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_

Let us look at ``bpf_iter_task_file.c``, which runs in kernel space.

Here is the definition of ``bpf_iter__task_file`` in `vmlinux.h
<https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#btf>`_.
Any struct name in ``vmlinux.h`` in the format ``bpf_iter__<iter_name>``
represents a BPF iterator. The suffix ``<iter_name>`` represents the type of
iterator.

::

    struct bpf_iter__task_file {
            union {
                struct bpf_iter_meta *meta;
            };
            union {
                struct task_struct *task;
            };
            u32 fd;
            union {
                struct file *file;
            };
    };

In the above code, the field 'meta' contains the metadata, which is the same for
all BPF iterator programs. The rest of the fields are specific to different
iterators. For example, for task_file iterators, the kernel layer provides the
'task', 'fd' and 'file' field values. The 'task' and 'file' objects are
`reference counted
<https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#file-descriptors-and-reference-counters>`_,
so they won't go away while the BPF program runs.

Here is a snippet from the ``bpf_iter_task_file.c`` file:

::

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  /* Global variables, readable and settable from user space via the skeleton. */
  int count = 0;
  int tgid = 0;             /* set by user space, e.g. to its own PID */
  int last_tgid = 0;
  int unique_tgid_count = 0;

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
    struct seq_file *seq = ctx->meta->seq;
    struct task_struct *task = ctx->task;
    struct file *file = ctx->file;
    __u32 fd = ctx->fd;

    /* NULL task/file: the final call after the last object; nothing to print. */
    if (task == NULL || file == NULL)
      return 0;

    /* seq_num == 0 is the first object of this pass: reset and print a header. */
    if (ctx->meta->seq_num == 0) {
      count = 0;
      BPF_SEQ_PRINTF(seq, "    tgid      pid       fd      file\n");
    }

    /* Count objects that belong to a non-leader thread of process `tgid`. */
    if (tgid == task->tgid && task->tgid != task->pid)
      count++;

    /* Count tgid changes between consecutive objects (distinct processes). */
    if (last_tgid != task->tgid) {
      last_tgid = task->tgid;
      unique_tgid_count++;
    }

    BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
            (long)file->f_op);
    return 0;
  }

In the above example, the section name ``SEC("iter/task_file")`` indicates that
the program is a BPF iterator program that iterates over all files of all
tasks. The context of the program is the ``bpf_iter__task_file`` struct.

The user space program invokes the BPF iterator program running in the kernel
by issuing a ``read()`` syscall. Once invoked, the BPF program can export data
to user space using a variety of BPF helper functions. Use
``bpf_seq_printf()`` (and the ``BPF_SEQ_PRINTF`` helper macro) for formatted
output, or ``bpf_seq_write()`` for binary data. For binary-encoded data, the
user space application can process the data from ``bpf_seq_write()`` as
needed. For formatted data, you can use ``cat <path>`` to print the results,
similar to ``cat /proc/net/netlink``, after pinning the BPF iterator to the
bpffs mount. Later, use ``rm -f <path>`` to remove the pinned iterator.
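
As an illustration of processing ``bpf_seq_write()`` output in user space, the
sketch below decodes fixed-size binary records. It assumes a hypothetical
record layout, ``struct task_file_rec``, that a BPF program would emit with
``bpf_seq_write(seq, &rec, sizeof(rec))``. Both the layout and
``parse_task_file_recs()`` are illustrative only; the kernel defines no such
format, so the BPF program and the reader must agree on it.

::

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /*
   * Hypothetical record layout, assumed to be written once per object by
   * the BPF program via bpf_seq_write(). Not defined by the kernel.
   */
  struct task_file_rec {
          uint32_t tgid;
          uint32_t pid;
          uint32_t fd;
  };

  /*
   * Decode up to `max` complete records from bytes read from the iterator.
   * A trailing partial record is ignored. Returns the number decoded.
   */
  size_t parse_task_file_recs(const void *data, size_t len,
                              struct task_file_rec *out, size_t max)
  {
          const unsigned char *p = data;
          size_t n = 0;

          while (n < max && len >= sizeof(*out)) {
                  /* memcpy avoids unaligned access into the raw byte buffer */
                  memcpy(&out[n], p, sizeof(*out));
                  p += sizeof(*out);
                  len -= sizeof(*out);
                  n++;
          }
          return n;
  }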

For example, you can use the following command to create a BPF iterator from
the ``bpf_iter_ipv6_route.o`` object file and pin it to the
``/sys/fs/bpf/my_route`` path:

::

  $ bpftool iter pin ./bpf_iter_ipv6_route.o /sys/fs/bpf/my_route

Then print out the results using the following command:

::

  $ cat /sys/fs/bpf/my_route
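
A program can read a pinned iterator the same way ``cat`` does: open the path
and ``read()`` until end of file. The following is a minimal sketch, assuming
a pinned path such as ``/sys/fs/bpf/my_route``; ``dump_pinned_iter()`` is a
hypothetical helper name.

::

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/types.h>
  #include <unistd.h>

  /*
   * Hypothetical helper, roughly what `cat <path>` does: open the pinned
   * iterator, read until EOF and copy the formatted output to `out`.
   * Returns the number of bytes copied, or -1 on error.
   */
  long dump_pinned_iter(const char *path, FILE *out)
  {
          char buf[4096];
          long total = 0;
          ssize_t n;
          int fd = open(path, O_RDONLY);

          if (fd < 0)
                  return -1;
          while ((n = read(fd, buf, sizeof(buf))) > 0) {
                  if (fwrite(buf, 1, (size_t)n, out) != (size_t)n) {
                          close(fd);
                          return -1;
                  }
                  total += n;
          }
          close(fd);
          return n < 0 ? -1 : total;  /* n == 0 means EOF was reached */
  }

For example, ``dump_pinned_iter("/sys/fs/bpf/my_route", stdout)`` would print
the same output as the ``cat`` command above.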


-------------------------------------------------------
Implement Kernel Support for BPF Iterator Program Types
-------------------------------------------------------

To implement a BPF iterator in the kernel, the developer must fill in one
instance of the following key data structure, defined in the `bpf.h
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/bpf.h>`_
file, for the new iterator type.

::

  struct bpf_iter_reg {
            const char *target;
            bpf_iter_attach_target_t attach_target;
            bpf_iter_detach_target_t detach_target;
            bpf_iter_show_fdinfo_t show_fdinfo;
            bpf_iter_fill_link_info_t fill_link_info;
            bpf_iter_get_func_proto_t get_func_proto;
            u32 ctx_arg_info_size;
            u32 feature;
            struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
            const struct bpf_iter_seq_info *seq_info;
  };

After filling the data structure fields, call ``bpf_iter_reg_target()`` to
register the iterator with the main BPF iterator subsystem.

The following is the breakdown for each field in struct ``bpf_iter_reg``.

.. list-table::
   :widths: 25 50
   :header-rows: 1

   * - Fields
     - Description
   * - target
     - Specifies the name of the BPF iterator. For example: ``bpf_map``,
       ``bpf_map_elem``. The name should be different from other ``bpf_iter`` target names in the kernel.
   * - attach_target and detach_target
     - Allow for target-specific ``link_create`` actions, since some targets
       may need special processing. ``attach_target`` is called during the user
       space link_create stage, and ``detach_target`` when the link is released.
   * - show_fdinfo and fill_link_info
     - Called to fill target-specific information when the user tries to get
       link info associated with the iterator.
   * - get_func_proto
     - Permits a BPF iterator to access BPF helpers specific to the iterator.
   * - ctx_arg_info_size and ctx_arg_info
     - Specifies the verifier states for BPF program arguments associated with
       the BPF iterator.
   * - feature
     - Specifies certain action requests in the kernel BPF iterator
       infrastructure. Currently, only BPF_ITER_RESCHED is supported. This means
       that the kernel function cond_resched() is called to keep other kernel
       subsystems (e.g., RCU) from misbehaving.
   * - seq_info
     - Specifies the set of seq operations for the BPF iterator and helpers to
       initialize/free the private data for the corresponding ``seq_file``.

See `this patch
<https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/>`_
for an implementation of the ``task_vma`` BPF iterator in the kernel.

---------------------------------
Parameterizing BPF Task Iterators
---------------------------------

By default, BPF iterators walk through all the objects of the specified types
(processes, cgroups, maps, etc.) across the entire system to read relevant
kernel data. But often, there are cases where we only care about a much smaller
subset of iterable kernel objects, such as only iterating tasks within a
specific process. Therefore, BPF iterator programs support filtering out objects
from iteration by allowing user space to configure the iterator program when it
is attached.

--------------------------
BPF Task Iterator Program
--------------------------

The following code is a BPF iterator program that prints file and task
information through the ``seq_file`` of the iterator. It is a standard BPF
iterator program that visits every file the iterator covers. We will use this
BPF program in our example later.

::

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
        struct seq_file *seq = ctx->meta->seq;
        struct task_struct *task = ctx->task;
        struct file *file = ctx->file;
        __u32 fd = ctx->fd;
        if (task == NULL || file == NULL)
                return 0;
        if (ctx->meta->seq_num == 0) {
                BPF_SEQ_PRINTF(seq, "    tgid      pid       fd      file\n");
        }
        BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                        (long)file->f_op);
        return 0;
  }

293 ----------------------------------------          293 ----------------------------------------
294 Creating a File Iterator with Parameters          294 Creating a File Iterator with Parameters
295 ----------------------------------------          295 ----------------------------------------
296                                                   296 
297 Now, let us look at how to create an iterator     297 Now, let us look at how to create an iterator that includes only files of a
298 process.                                          298 process.
299                                                   299 
300 First,  fill the ``bpf_iter_attach_opts`` stru    300 First,  fill the ``bpf_iter_attach_opts`` struct as shown below:
301                                                   301 
302 ::                                                302 ::
303                                                   303 
304   LIBBPF_OPTS(bpf_iter_attach_opts, opts);        304   LIBBPF_OPTS(bpf_iter_attach_opts, opts);
305   union bpf_iter_link_info linfo;                 305   union bpf_iter_link_info linfo;
306   memset(&linfo, 0, sizeof(linfo));               306   memset(&linfo, 0, sizeof(linfo));
307   linfo.task.pid = getpid();                      307   linfo.task.pid = getpid();
308   opts.link_info = &linfo;                        308   opts.link_info = &linfo;
309   opts.link_info_len = sizeof(linfo);             309   opts.link_info_len = sizeof(linfo);

If ``linfo.task.pid`` is non-zero, it directs the kernel to create an iterator
that only includes the open files of the process with the specified ``pid``. In
this example, we only iterate over the files of our own process. If
``linfo.task.pid`` is zero, the iterator visits every open file of every
process. Similarly, ``linfo.task.tid`` directs the kernel to create an iterator
that visits the open files of a specific thread rather than a process. For a
file iterator, ``linfo.task.tid`` gives a different result from
``linfo.task.pid`` only if the thread has a separate file descriptor table. In
most circumstances, all threads of a process share a single file descriptor
table.

Then, in the userspace program, pass a pointer to the struct to
``bpf_program__attach_iter()``.

::

  link = bpf_program__attach_iter(prog, &opts);
  iter_fd = bpf_iter_create(bpf_link__fd(link));

If both *tid* and *pid* are zero, an iterator created from this
``bpf_iter_attach_opts`` struct includes every open file of every task in the
system (more precisely, of every task in the current *pid* namespace). This is
the same as passing NULL as the second argument to
``bpf_program__attach_iter()``.
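
The rules above can be condensed into a small userspace model. This is a hypothetical sketch for illustration, not kernel or libbpf code. ``enum task_iter_scope`` and ``iter_scope_of()`` are names invented here. The sketch also assumes, based on our reading of ``kernel/bpf/task_iter.c``, that the kernel refuses to attach an iterator when both *pid* and *tid* are set.

::

  #include <assert.h>

  /* Hypothetical model of the files a task_file iterator visits, given
   * linfo.task.pid and linfo.task.tid. This is not kernel code. */
  enum task_iter_scope {
        SCOPE_ALL,      /* every open file of every task in the pid namespace */
        SCOPE_PROCESS,  /* open files of the process with this pid */
        SCOPE_THREAD,   /* open files of the thread with this tid */
        SCOPE_INVALID,  /* assumption: the kernel rejects this combination */
  };

  static enum task_iter_scope iter_scope_of(unsigned int pid, unsigned int tid)
  {
        if (pid != 0 && tid != 0)
                return SCOPE_INVALID;
        if (pid != 0)
                return SCOPE_PROCESS;
        if (tid != 0)
                return SCOPE_THREAD;
        /* Both zero: same as passing NULL to bpf_program__attach_iter() */
        return SCOPE_ALL;
  }

  /* Usage checks for the model. */
  static void check_iter_scope(void)
  {
        assert(iter_scope_of(0, 0) == SCOPE_ALL);
        assert(iter_scope_of(1859, 0) == SCOPE_PROCESS);
        assert(iter_scope_of(0, 1859) == SCOPE_THREAD);
        assert(iter_scope_of(1859, 1859) == SCOPE_INVALID);
  }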

The whole program looks like the following code:

::

  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <bpf/bpf.h>
  #include <bpf/libbpf.h>
  #include "bpf_iter_task_ex.skel.h"

  static int do_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
  {
        struct bpf_link *link;
        char buf[16] = {};
        int iter_fd = -1, len;
        int ret = 0;

        link = bpf_program__attach_iter(prog, opts);
        if (!link) {
                fprintf(stderr, "bpf_program__attach_iter() fails\n");
                return -1;
        }
        iter_fd = bpf_iter_create(bpf_link__fd(link));
        if (iter_fd < 0) {
                fprintf(stderr, "bpf_iter_create() fails\n");
                ret = -1;
                goto free_link;
        }
        /* Do not check the contents, but make sure read() ends without error */
        while ((len = read(iter_fd, buf, sizeof(buf) - 1)) > 0) {
                buf[len] = 0;
                printf("%s", buf);
        }
        if (len < 0) {
                fprintf(stderr, "read() fails\n");
                ret = -1;
        }
        printf("\n");
  free_link:
        if (iter_fd >= 0)
                close(iter_fd);
        bpf_link__destroy(link);
        return ret;
  }

  static void test_task_file(void)
  {
        LIBBPF_OPTS(bpf_iter_attach_opts, opts);
        struct bpf_iter_task_ex *skel;
        union bpf_iter_link_info linfo;

        skel = bpf_iter_task_ex__open_and_load();
        if (skel == NULL)
                return;
        /* Restrict the iterator to the open files of this process */
        memset(&linfo, 0, sizeof(linfo));
        linfo.task.pid = getpid();
        opts.link_info = &linfo;
        opts.link_info_len = sizeof(linfo);
        printf("PID %d\n", getpid());
        do_read_opts(skel->progs.dump_task_file, &opts);
        bpf_iter_task_ex__destroy(skel);
  }

  int main(int argc, const char * const * argv)
  {
        test_task_file();
        return 0;
  }

The following lines are the output of the program.

::

  PID 1859

      tgid      pid       fd      file
      1859     1859        0 ffffffff82270aa0
      1859     1859        1 ffffffff82270aa0
      1859     1859        2 ffffffff82270aa0
      1859     1859        3 ffffffff82272980
      1859     1859        4 ffffffff8225e120
      1859     1859        5 ffffffff82255120
      1859     1859        6 ffffffff82254f00
      1859     1859        7 ffffffff82254d80
      1859     1859        8 ffffffff8225abe0
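
A program that consumes this output can parse each data row back with ``sscanf()``, reusing the conversions of the ``BPF_SEQ_PRINTF()`` format. This is a minimal sketch with some assumptions. ``struct task_file_row`` and both helpers are hypothetical names, and the sketch assumes a 64-bit ``unsigned long``. It also expects complete lines rather than the fixed-size chunks that ``do_read_opts()`` reads; for example, the lines could be read with ``fgets()`` from ``fdopen(iter_fd, "r")``.

::

  #include <assert.h>
  #include <stddef.h>
  #include <stdio.h>

  /* One data row printed by dump_task_file() (hypothetical type). */
  struct task_file_row {
        int tgid;
        int pid;
        int fd;
        unsigned long f_op;     /* value of file->f_op, printed with %lx */
  };

  /* Returns 1 for a data row; 0 for the "PID" line, the header line,
   * blank lines, or anything else that does not match the format. */
  static int parse_task_file_row(const char *line, struct task_file_row *row)
  {
        return sscanf(line, "%d %d %d %lx", &row->tgid, &row->pid,
                      &row->fd, &row->f_op) == 4;
  }

  /* Counts the parsed rows whose files share the given f_op value. */
  static size_t count_rows_with_fop(const struct task_file_row *rows, size_t n,
                                    unsigned long f_op)
  {
        size_t count = 0;

        for (size_t i = 0; i < n; i++)
                if (rows[i].f_op == f_op)
                        count++;
        return count;
  }

  /* Usage check against a few lines of the sample output above. */
  static void check_parse_sample(void)
  {
        const char *lines[] = {
                "PID 1859",
                "    tgid      pid       fd      file",
                "    1859     1859        0 ffffffff82270aa0",
                "    1859     1859        1 ffffffff82270aa0",
                "    1859     1859        3 ffffffff82272980",
        };
        struct task_file_row rows[5];
        size_t n = 0;

        for (size_t i = 0; i < sizeof(lines) / sizeof(lines[0]); i++)
                if (parse_task_file_row(lines[i], &rows[n]))
                        n++;
        assert(n == 3);
        assert(rows[2].fd == 3);
        assert(count_rows_with_fop(rows, n, 0xffffffff82270aa0UL) == 2);
  }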

------------------
Without Parameters
------------------

Let us look at how a BPF iterator without parameters skips the files of other
processes in the system. In this case, the BPF program has to check the pid or
the tid of tasks. Otherwise, it receives every open file in the system (more
precisely, every open file in the current *pid* namespace). So, we usually add
a global variable to the BPF program, through which userspace passes a *pid*
to it.

The BPF program would look like the following block:

::

  ......
  int target_pid = 0;

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
        ......
        if (task->tgid != target_pid) /* Check task->pid instead to check thread IDs */
                return 0;
        BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                        (long)file->f_op);
        return 0;
  }

The user space program would look like the following block:

::

  ......
  static void test_task_file(void)
  {
        ......
        skel = bpf_iter_task_ex__open_and_load();
        if (skel == NULL)
                return;
        skel->bss->target_pid = getpid(); /* process ID.  For thread id, use gettid() */
        /* Leave opts.link_info unset: the iterator is not parametrized,
         * so the BPF program itself skips the files of other processes. */
        ......
  }

``target_pid`` is a global variable in the BPF program. The user space program
should initialize the variable with a process ID so that the BPF program skips
the open files of other processes. If you parametrize the BPF iterator instead,
the kernel skips other tasks before calling the BPF program. The program is
therefore called fewer times, which can save significant resources.
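
The following model makes the saving concrete. It is a hypothetical userspace sketch, not kernel code; ``struct open_file``, ``struct call_stats`` and ``count_calls()`` are invented for this example. It counts how often the BPF program would run for a given set of open files, with and without ``linfo.task.pid``. It counts only program invocations and ignores other costs, such as walking tasks and file descriptor tables.

::

  #include <assert.h>
  #include <stddef.h>

  /* One open file somewhere in the pid namespace (hypothetical type). */
  struct open_file {
        int tgid;
        int fd;
  };

  struct call_stats {
        size_t prog_calls;      /* times the BPF program runs */
        size_t rows_printed;    /* rows that pass the target_pid check */
  };

  /* link_pid models linfo.task.pid (0 means no parameter); target_pid
   * models the global variable checked inside the BPF program. */
  static struct call_stats count_calls(const struct open_file *files, size_t n,
                                       int link_pid, int target_pid)
  {
        struct call_stats stats = { 0, 0 };

        for (size_t i = 0; i < n; i++) {
                /* With a parameter, the kernel never visits other processes */
                if (link_pid != 0 && files[i].tgid != link_pid)
                        continue;
                stats.prog_calls++;
                /* The BPF program itself skips files of other processes */
                if (files[i].tgid == target_pid)
                        stats.rows_printed++;
        }
        return stats;
  }

  static void check_count_calls(void)
  {
        const struct open_file files[] = {
                { 1, 0 }, { 1, 1 }, { 2, 0 }, { 2, 1 }, { 2, 2 }, { 3, 0 },
        };
        size_t n = sizeof(files) / sizeof(files[0]);
        struct call_stats without = count_calls(files, n, 0, 2);
        struct call_stats with = count_calls(files, n, 2, 2);

        /* Same rows printed, but half as many program invocations */
        assert(without.rows_printed == 3 && with.rows_printed == 3);
        assert(without.prog_calls == 6 && with.prog_calls == 3);
  }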

---------------------------
Parametrizing VMA Iterators
---------------------------

By default, a BPF VMA iterator includes every VMA in every process. However,
you can still specify a process or a thread to include only its VMAs. Unlike
files, a thread cannot have a separate address space (since Linux
2.6.0-test6), so using *tid* here makes no difference from using *pid*.

----------------------------
Parametrizing Task Iterators
----------------------------

A BPF task iterator with *pid* includes all tasks (threads) of a process. The
BPF program receives these tasks one after another. You can also specify a BPF
task iterator with the *tid* parameter to include only the task that matches
the given *tid*.
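
The difference between *pid* and *tid* for a task iterator can be sketched with another hypothetical userspace model. ``struct task``, ``task_visited()`` and ``count_visited()`` are invented for this example. As in ``task_struct``, ``pid`` is the thread ID and ``tgid`` is the process ID. The model assumes that at most one of the two parameters is set.

::

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical model of a thread; as in task_struct, pid is the thread
   * ID and tgid is the ID of the process (thread group) it belongs to. */
  struct task {
        int tgid;
        int pid;
  };

  /* link_pid and link_tid model linfo.task.pid and linfo.task.tid. */
  static bool task_visited(const struct task *t, int link_pid, int link_tid)
  {
        if (link_pid != 0)
                return t->tgid == link_pid;     /* every thread of the process */
        if (link_tid != 0)
                return t->pid == link_tid;      /* only the matching thread */
        return true;                            /* no parameter: every task */
  }

  static size_t count_visited(const struct task *tasks, size_t n,
                              int link_pid, int link_tid)
  {
        size_t visited = 0;

        for (size_t i = 0; i < n; i++)
                if (task_visited(&tasks[i], link_pid, link_tid))
                        visited++;
        return visited;
  }

  static void check_count_visited(void)
  {
        /* Process 100 has three threads; process 200 has one */
        const struct task tasks[] = {
                { 100, 100 }, { 100, 101 }, { 100, 102 }, { 200, 200 },
        };

        assert(count_visited(tasks, 4, 0, 0) == 4);
        assert(count_visited(tasks, 4, 100, 0) == 3);
        assert(count_visited(tasks, 4, 0, 101) == 1);
  }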
                                                      
