1 ============= 1 ============= 2 BPF Iterators 2 BPF Iterators 3 ============= 3 ============= 4 4 5 5 6 ---------- 6 ---------- 7 Motivation 7 Motivation 8 ---------- 8 ---------- 9 9 10 There are a few existing ways to dump kernel d 10 There are a few existing ways to dump kernel data into user space. The most 11 popular one is the ``/proc`` system. For examp 11 popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps 12 all tcp6 sockets in the system, and ``cat /pro 12 all tcp6 sockets in the system, and ``cat /proc/net/netlink`` dumps all netlink 13 sockets in the system. However, their output f 13 sockets in the system. However, their output format tends to be fixed, and if 14 users want more information about these socket 14 users want more information about these sockets, they have to patch the kernel, 15 which often takes time to publish upstream and 15 which often takes time to publish upstream and release. The same is true for popular 16 tools like `ss <https://man7.org/linux/man-pag 16 tools like `ss <https://man7.org/linux/man-pages/man8/ss.8.html>`_ where any 17 additional information needs a kernel patch. 17 additional information needs a kernel patch. 18 18 19 To solve this problem, the `drgn 19 To solve this problem, the `drgn 20 <https://www.kernel.org/doc/html/latest/bpf/dr 20 <https://www.kernel.org/doc/html/latest/bpf/drgn.html>`_ tool is often used to 21 dig out the kernel data with no kernel change. 21 dig out the kernel data with no kernel change. However, the main drawback for 22 drgn is performance, as it cannot do pointer t 22 drgn is performance, as it cannot do pointer tracing inside the kernel. In 23 addition, drgn cannot validate a pointer value 23 addition, drgn cannot validate a pointer value and may read invalid data if the 24 pointer becomes invalid inside the kernel. 24 pointer becomes invalid inside the kernel. 25 25 26 The BPF iterator solves the above problem by p 26 The BPF iterator solves the above problem by providing flexibility on what data 27 (e.g., tasks, bpf_maps, etc.) to collect by ca 27 (e.g., tasks, bpf_maps, etc.) to collect by calling BPF programs for each kernel 28 data object. 28 data object. 29 29 30 ---------------------- 30 ---------------------- 31 How BPF Iterators Work 31 How BPF Iterators Work 32 ---------------------- 32 ---------------------- 33 33 34 A BPF iterator is a type of BPF program that a 34 A BPF iterator is a type of BPF program that allows users to iterate over 35 specific types of kernel objects. Unlike tradi 35 specific types of kernel objects. Unlike traditional BPF tracing programs that 36 allow users to define callbacks that are invok 36 allow users to define callbacks that are invoked at particular points of 37 execution in the kernel, BPF iterators allow u 37 execution in the kernel, BPF iterators allow users to define callbacks that 38 should be executed for every entry in a variet 38 should be executed for every entry in a variety of kernel data structures. 39 39 40 For example, users can define a BPF iterator t 40 For example, users can define a BPF iterator that iterates over every task on 41 the system and dumps the total amount of CPU r 41 the system and dumps the total amount of CPU runtime currently used by each of 42 them. Another BPF task iterator may instead du 42 them. Another BPF task iterator may instead dump the cgroup information for each 43 task. Such flexibility is the core value of BP 43 task. Such flexibility is the core value of BPF iterators. 44 44 45 A BPF program is always loaded into the kernel 45 A BPF program is always loaded into the kernel at the behest of a user space 46 process. A user space process loads a BPF prog 46 process. A user space process loads a BPF program by opening and initializing 47 the program skeleton as required and then invo 47 the program skeleton as required and then invoking a syscall to have the BPF 48 program verified and loaded by the kernel. 48 program verified and loaded by the kernel. 49 49 50 In traditional tracing programs, a program is 50 In traditional tracing programs, a program is activated by having user space 51 obtain a ``bpf_link`` to the program with ``bp 51 obtain a ``bpf_link`` to the program with ``bpf_program__attach()``. Once 52 activated, the program callback will be invoke 52 activated, the program callback will be invoked whenever the tracepoint is 53 triggered in the main kernel. For BPF iterator 53 triggered in the main kernel. For BPF iterator programs, a ``bpf_link`` to the 54 program is obtained using ``bpf_link_create()` 54 program is obtained using ``bpf_link_create()``, and the program callback is 55 invoked by issuing system calls from user spac 55 invoked by issuing system calls from user space. 56 56 57 Next, let us see how you can use the iterators 57 Next, let us see how you can use the iterators to iterate on kernel objects and 58 read data. 58 read data. 59 59 60 ------------------------ 60 ------------------------ 61 How to Use BPF iterators 61 How to Use BPF iterators 62 ------------------------ 62 ------------------------ 63 63 64 BPF selftests are a great resource to illustra 64 BPF selftests are a great resource to illustrate how to use the iterators. In 65 this section, we’ll walk through a BPF selft 65 this section, we’ll walk through a BPF selftest which shows how to load and use 66 a BPF iterator program. To begin, we’ll lo 66 a BPF iterator program. To begin, we’ll look at `bpf_iter.c 67 <https://git.kernel.org/pub/scm/linux/kernel/g 67 <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/prog_tests/bpf_iter.c>`_, 68 which illustrates how to load and trigger BPF 68 which illustrates how to load and trigger BPF iterators on the user space side. 69 Later, we’ll look at a BPF program that runs 69 Later, we’ll look at a BPF program that runs in kernel space. 70 70 71 Loading a BPF iterator in the kernel from user 71 Loading a BPF iterator in the kernel from user space typically involves the 72 following steps: 72 following steps: 73 73 74 * The BPF program is loaded into the kernel th 74 * The BPF program is loaded into the kernel through ``libbpf``. Once the kernel 75 has verified and loaded the program, it retu 75 has verified and loaded the program, it returns a file descriptor (fd) to user 76 space. 76 space. 77 * Obtain a ``link_fd`` to the BPF program by c 77 * Obtain a ``link_fd`` to the BPF program by calling the ``bpf_link_create()`` 78 specified with the BPF program file descript 78 specified with the BPF program file descriptor received from the kernel. 79 * Next, obtain a BPF iterator file descriptor 79 * Next, obtain a BPF iterator file descriptor (``bpf_iter_fd``) by calling the 80 ``bpf_iter_create()`` specified with the ``b 80 ``bpf_iter_create()`` specified with the ``bpf_link`` received from Step 2. 81 * Trigger the iteration by calling ``read(bpf_ 81 * Trigger the iteration by calling ``read(bpf_iter_fd)`` until no data is 82 available. 82 available. 83 * Close the iterator fd using ``close(bpf_iter 83 * Close the iterator fd using ``close(bpf_iter_fd)``. 84 * If needed to reread the data, get a new ``bp 84 * If needed to reread the data, get a new ``bpf_iter_fd`` and do the read again. 85 85 86 The following are a few examples of selftest B 86 The following are a few examples of selftest BPF iterator programs: 87 87 88 * `bpf_iter_tcp4.c <https://git.kernel.org/pub 88 * `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_ 89 * `bpf_iter_task_vma.c <https://git.kernel.org 89 * `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_ 90 * `bpf_iter_task_file.c <https://git.kernel.or 90 * `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_ 91 91 92 Let us look at ``bpf_iter_task_file.c``, which 92 Let us look at ``bpf_iter_task_file.c``, which runs in kernel space: 93 93 94 Here is the definition of ``bpf_iter__task_fil 94 Here is the definition of ``bpf_iter__task_file`` in `vmlinux.h 95 <https://facebookmicrosites.github.io/bpf/blog 95 <https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#btf>`_. 96 Any struct name in ``vmlinux.h`` in the format 96 Any struct name in ``vmlinux.h`` in the format ``bpf_iter__<iter_name>`` 97 represents a BPF iterator. The suffix ``<iter_ 97 represents a BPF iterator. The suffix ``<iter_name>`` represents the type of 98 iterator. 98 iterator. 99 99 100 :: 100 :: 101 101 102 struct bpf_iter__task_file { 102 struct bpf_iter__task_file { 103 union { 103 union { 104 struct bpf_iter_meta *meta; 104 struct bpf_iter_meta *meta; 105 }; 105 }; 106 union { 106 union { 107 struct task_struct *task; 107 struct task_struct *task; 108 }; 108 }; 109 u32 fd; 109 u32 fd; 110 union { 110 union { 111 struct file *file; 111 struct file *file; 112 }; 112 }; 113 }; 113 }; 114 114 115 In the above code, the field 'meta' contains t 115 In the above code, the field 'meta' contains the metadata, which is the same for 116 all BPF iterator programs. The rest of the fie 116 all BPF iterator programs. The rest of the fields are specific to different 117 iterators. For example, for task_file iterator 117 iterators. For example, for task_file iterators, the kernel layer provides the 118 'task', 'fd' and 'file' field values. The 'tas 118 'task', 'fd' and 'file' field values. The 'task' and 'file' are `reference 119 counted 119 counted 120 <https://facebookmicrosites.github.io/bpf/blog 120 <https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#file-descriptors-and-reference-counters>`_, 121 so they won't go away when the BPF program run 121 so they won't go away when the BPF program runs. 122 122 123 Here is a snippet from the ``bpf_iter_task_fi 123 Here is a snippet from the ``bpf_iter_task_file.c`` file: 124 124 125 :: 125 :: 126 126 127 SEC("iter/task_file") 127 SEC("iter/task_file") 128 int dump_task_file(struct bpf_iter__task_fil 128 int dump_task_file(struct bpf_iter__task_file *ctx) 129 { 129 { 130 struct seq_file *seq = ctx->meta->seq; 130 struct seq_file *seq = ctx->meta->seq; 131 struct task_struct *task = ctx->task; 131 struct task_struct *task = ctx->task; 132 struct file *file = ctx->file; 132 struct file *file = ctx->file; 133 __u32 fd = ctx->fd; 133 __u32 fd = ctx->fd; 134 134 135 if (task == NULL || file == NULL) 135 if (task == NULL || file == NULL) 136 return 0; 136 return 0; 137 137 138 if (ctx->meta->seq_num == 0) { 138 if (ctx->meta->seq_num == 0) { 139 count = 0; 139 count = 0; 140 BPF_SEQ_PRINTF(seq, " tgid gid 140 BPF_SEQ_PRINTF(seq, " tgid gid fd file\n"); 141 } 141 } 142 142 143 if (tgid == task->tgid && task->tgid != ta 143 if (tgid == task->tgid && task->tgid != task->pid) 144 count++; 144 count++; 145 145 146 if (last_tgid != task->tgid) { 146 if (last_tgid != task->tgid) { 147 last_tgid = task->tgid; 147 last_tgid = task->tgid; 148 unique_tgid_count++; 148 unique_tgid_count++; 149 } 149 } 150 150 151 BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", t 151 BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd, 152 (long)file->f_op); 152 (long)file->f_op); 153 return 0; 153 return 0; 154 } 154 } 155 155 156 In the above example, the section name ``SEC(i 156 In the above example, the section name ``SEC(iter/task_file)``, indicates that 157 the program is a BPF iterator program to itera 157 the program is a BPF iterator program to iterate all files from all tasks. The 158 context of the program is ``bpf_iter__task_fil 158 context of the program is ``bpf_iter__task_file`` struct. 159 159 160 The user space program invokes the BPF iterato 160 The user space program invokes the BPF iterator program running in the kernel 161 by issuing a ``read()`` syscall. Once invoked, 161 by issuing a ``read()`` syscall. Once invoked, the BPF 162 program can export data to user space using a 162 program can export data to user space using a variety of BPF helper functions. 163 You can use either ``bpf_seq_printf()`` (and B 163 You can use either ``bpf_seq_printf()`` (and BPF_SEQ_PRINTF helper macro) or 164 ``bpf_seq_write()`` function based on whether 164 ``bpf_seq_write()`` function based on whether you need formatted output or just 165 binary data, respectively. For binary-encoded 165 binary data, respectively. For binary-encoded data, the user space applications 166 can process the data from ``bpf_seq_write()`` 166 can process the data from ``bpf_seq_write()`` as needed. For the formatted data, 167 you can use ``cat <path>`` to print the result 167 you can use ``cat <path>`` to print the results similar to ``cat 168 /proc/net/netlink`` after pinning the BPF iter 168 /proc/net/netlink`` after pinning the BPF iterator to the bpffs mount. Later, 169 use ``rm -f <path>`` to remove the pinned ite 169 use ``rm -f <path>`` to remove the pinned iterator. 170 170 171 For example, you can use the following command 171 For example, you can use the following command to create a BPF iterator from the 172 ``bpf_iter_ipv6_route.o`` object file and pin 172 ``bpf_iter_ipv6_route.o`` object file and pin it to the ``/sys/fs/bpf/my_route`` 173 path: 173 path: 174 174 175 :: 175 :: 176 176 177 $ bpftool iter pin ./bpf_iter_ipv6_route.o 177 $ bpftool iter pin ./bpf_iter_ipv6_route.o /sys/fs/bpf/my_route 178 178 179 And then print out the results using the follo 179 And then print out the results using the following command: 180 180 181 :: 181 :: 182 182 183 $ cat /sys/fs/bpf/my_route 183 $ cat /sys/fs/bpf/my_route 184 184 185 185 186 ---------------------------------------------- 186 ------------------------------------------------------- 187 Implement Kernel Support for BPF Iterator Prog 187 Implement Kernel Support for BPF Iterator Program Types 188 ---------------------------------------------- 188 ------------------------------------------------------- 189 189 190 To implement a BPF iterator in the kernel, the 190 To implement a BPF iterator in the kernel, the developer must make a one-time 191 change to the following key data structure def 191 change to the following key data structure defined in the `bpf.h 192 <https://git.kernel.org/pub/scm/linux/kernel/g 192 <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/bpf.h>`_ 193 file. 193 file. 194 194 195 :: 195 :: 196 196 197 struct bpf_iter_reg { 197 struct bpf_iter_reg { 198 const char *target; 198 const char *target; 199 bpf_iter_attach_target_t attach_ta 199 bpf_iter_attach_target_t attach_target; 200 bpf_iter_detach_target_t detach_ta 200 bpf_iter_detach_target_t detach_target; 201 bpf_iter_show_fdinfo_t show_fdinfo 201 bpf_iter_show_fdinfo_t show_fdinfo; 202 bpf_iter_fill_link_info_t fill_lin 202 bpf_iter_fill_link_info_t fill_link_info; 203 bpf_iter_get_func_proto_t get_func 203 bpf_iter_get_func_proto_t get_func_proto; 204 u32 ctx_arg_info_size; 204 u32 ctx_arg_info_size; 205 u32 feature; 205 u32 feature; 206 struct bpf_ctx_arg_aux ctx_arg_inf 206 struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX]; 207 const struct bpf_iter_seq_info *se 207 const struct bpf_iter_seq_info *seq_info; 208 }; 208 }; 209 209 210 After filling the data structure fields, call 210 After filling the data structure fields, call ``bpf_iter_reg_target()`` to 211 register the iterator to the main BPF iterator 211 register the iterator to the main BPF iterator subsystem. 212 212 213 The following is the breakdown for each field 213 The following is the breakdown for each field in struct ``bpf_iter_reg``. 214 214 215 .. list-table:: 215 .. list-table:: 216 :widths: 25 50 216 :widths: 25 50 217 :header-rows: 1 217 :header-rows: 1 218 218 219 * - Fields 219 * - Fields 220 - Description 220 - Description 221 * - target 221 * - target 222 - Specifies the name of the BPF iterator. 222 - Specifies the name of the BPF iterator. For example: ``bpf_map``, 223 ``bpf_map_elem``. The name should be di 223 ``bpf_map_elem``. The name should be different from other ``bpf_iter`` target names in the kernel. 224 * - attach_target and detach_target 224 * - attach_target and detach_target 225 - Allows for target specific ``link_creat 225 - Allows for target specific ``link_create`` action since some targets 226 may need special processing. Called dur 226 may need special processing. Called during the user space link_create stage. 227 * - show_fdinfo and fill_link_info 227 * - show_fdinfo and fill_link_info 228 - Called to fill target specific informat 228 - Called to fill target specific information when user tries to get link 229 info associated with the iterator. 229 info associated with the iterator. 230 * - get_func_proto 230 * - get_func_proto 231 - Permits a BPF iterator to access BPF he 231 - Permits a BPF iterator to access BPF helpers specific to the iterator. 232 * - ctx_arg_info_size and ctx_arg_info 232 * - ctx_arg_info_size and ctx_arg_info 233 - Specifies the verifier states for BPF p 233 - Specifies the verifier states for BPF program arguments associated with 234 the bpf iterator. 234 the bpf iterator. 235 * - feature 235 * - feature 236 - Specifies certain action requests in th 236 - Specifies certain action requests in the kernel BPF iterator 237 infrastructure. Currently, only BPF_ITE 237 infrastructure. Currently, only BPF_ITER_RESCHED is supported. This means 238 that the kernel function cond_resched() 238 that the kernel function cond_resched() is called to avoid other kernel 239 subsystem (e.g., rcu) misbehaving. 239 subsystem (e.g., rcu) misbehaving. 240 * - seq_info 240 * - seq_info 241 - Specifies the set of seq operations for 241 - Specifies the set of seq operations for the BPF iterator and helpers to 242 initialize/free the private data for th 242 initialize/free the private data for the corresponding ``seq_file``. 243 243 244 `Click here 244 `Click here 245 <https://lore.kernel.org/bpf/20210212183107.509 245 <https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/">https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/>`_ 246 to see an implementation of the ``task_vma`` B 246 to see an implementation of the ``task_vma`` BPF iterator in the kernel. 247 247 248 --------------------------------- 248 --------------------------------- 249 Parameterizing BPF Task Iterators 249 Parameterizing BPF Task Iterators 250 --------------------------------- 250 --------------------------------- 251 251 252 By default, BPF iterators walk through all the 252 By default, BPF iterators walk through all the objects of the specified types 253 (processes, cgroups, maps, etc.) across the en 253 (processes, cgroups, maps, etc.) across the entire system to read relevant 254 kernel data. But often, there are cases where 254 kernel data. But often, there are cases where we only care about a much smaller 255 subset of iterable kernel objects, such as onl 255 subset of iterable kernel objects, such as only iterating tasks within a 256 specific process. Therefore, BPF iterator prog 256 specific process. Therefore, BPF iterator programs support filtering out objects 257 from iteration by allowing user space to confi 257 from iteration by allowing user space to configure the iterator program when it 258 is attached. 258 is attached. 259 259 260 -------------------------- 260 -------------------------- 261 BPF Task Iterator Program 261 BPF Task Iterator Program 262 -------------------------- 262 -------------------------- 263 263 264 The following code is a BPF iterator program t 264 The following code is a BPF iterator program to print files and task information 265 through the ``seq_file`` of the iterator. It i 265 through the ``seq_file`` of the iterator. It is a standard BPF iterator program 266 that visits every file of an iterator. We will 266 that visits every file of an iterator. We will use this BPF program in our 267 example later. 267 example later. 268 268 269 :: 269 :: 270 270 271 #include <vmlinux.h> 271 #include <vmlinux.h> 272 #include <bpf/bpf_helpers.h> 272 #include <bpf/bpf_helpers.h> 273 273 274 char _license[] SEC("license") = "GPL"; 274 char _license[] SEC("license") = "GPL"; 275 275 276 SEC("iter/task_file") 276 SEC("iter/task_file") 277 int dump_task_file(struct bpf_iter__task_fil 277 int dump_task_file(struct bpf_iter__task_file *ctx) 278 { 278 { 279 struct seq_file *seq = ctx->meta->seq; 279 struct seq_file *seq = ctx->meta->seq; 280 struct task_struct *task = ctx->task; 280 struct task_struct *task = ctx->task; 281 struct file *file = ctx->file; 281 struct file *file = ctx->file; 282 __u32 fd = ctx->fd; 282 __u32 fd = ctx->fd; 283 if (task == NULL || file == NULL) 283 if (task == NULL || file == NULL) 284 return 0; 284 return 0; 285 if (ctx->meta->seq_num == 0) { 285 if (ctx->meta->seq_num == 0) { 286 BPF_SEQ_PRINTF(seq, " tgid 286 BPF_SEQ_PRINTF(seq, " tgid pid fd file\n"); 287 } 287 } 288 BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n 288 BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd, 289 (long)file->f_op); 289 (long)file->f_op); 290 return 0; 290 return 0; 291 } 291 } 292 292 293 ---------------------------------------- 293 ---------------------------------------- 294 Creating a File Iterator with Parameters 294 Creating a File Iterator with Parameters 295 ---------------------------------------- 295 ---------------------------------------- 296 296 297 Now, let us look at how to create an iterator 297 Now, let us look at how to create an iterator that includes only files of a 298 process. 298 process. 299 299 300 First, fill the ``bpf_iter_attach_opts`` stru 300 First, fill the ``bpf_iter_attach_opts`` struct as shown below: 301 301 302 :: 302 :: 303 303 304 LIBBPF_OPTS(bpf_iter_attach_opts, opts); 304 LIBBPF_OPTS(bpf_iter_attach_opts, opts); 305 union bpf_iter_link_info linfo; 305 union bpf_iter_link_info linfo; 306 memset(&linfo, 0, sizeof(linfo)); 306 memset(&linfo, 0, sizeof(linfo)); 307 linfo.task.pid = getpid(); 307 linfo.task.pid = getpid(); 308 opts.link_info = &linfo; 308 opts.link_info = &linfo; 309 opts.link_info_len = sizeof(linfo); 309 opts.link_info_len = sizeof(linfo); 310 310 311 ``linfo.task.pid``, if it is non-zero, directs 311 ``linfo.task.pid``, if it is non-zero, directs the kernel to create an iterator 312 that only includes opened files for the proces 312 that only includes opened files for the process with the specified ``pid``. In 313 this example, we will only be iterating files 313 this example, we will only be iterating files for our process. If 314 ``linfo.task.pid`` is zero, the iterator will 314 ``linfo.task.pid`` is zero, the iterator will visit every opened file of every 315 process. Similarly, ``linfo.task.tid`` directs 315 process. Similarly, ``linfo.task.tid`` directs the kernel to create an iterator 316 that visits opened files of a specific thread, 316 that visits opened files of a specific thread, not a process. In this example, 317 ``linfo.task.tid`` is different from ``linfo.t 317 ``linfo.task.tid`` is different from ``linfo.task.pid`` only if the thread has a 318 separate file descriptor table. In most circum 318 separate file descriptor table. In most circumstances, all process threads share 319 a single file descriptor table. 319 a single file descriptor table. 320 320 321 Now, in the userspace program, pass the pointe 321 Now, in the userspace program, pass the pointer of struct to the 322 ``bpf_program__attach_iter()``. 322 ``bpf_program__attach_iter()``. 323 323 324 :: 324 :: 325 325 326 link = bpf_program__attach_iter(prog, &opts) 326 link = bpf_program__attach_iter(prog, &opts); iter_fd = 327 bpf_iter_create(bpf_link__fd(link)); 327 bpf_iter_create(bpf_link__fd(link)); 328 328 329 If both *tid* and *pid* are zero, an iterator 329 If both *tid* and *pid* are zero, an iterator created from this struct 330 ``bpf_iter_attach_opts`` will include every op 330 ``bpf_iter_attach_opts`` will include every opened file of every task in the 331 system (in the namespace, actually.) It is the 331 system (in the namespace, actually.) It is the same as passing a NULL as the 332 second argument to ``bpf_program__attach_iter( 332 second argument to ``bpf_program__attach_iter()``. 333 333 334 The whole program looks like the following cod 334 The whole program looks like the following code: 335 335 336 :: 336 :: 337 337 338 #include <stdio.h> 338 #include <stdio.h> 339 #include <unistd.h> 339 #include <unistd.h> 340 #include <bpf/bpf.h> 340 #include <bpf/bpf.h> 341 #include <bpf/libbpf.h> 341 #include <bpf/libbpf.h> 342 #include "bpf_iter_task_ex.skel.h" 342 #include "bpf_iter_task_ex.skel.h" 343 343 344 static int do_read_opts(struct bpf_program * 344 static int do_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts) 345 { 345 { 346 struct bpf_link *link; 346 struct bpf_link *link; 347 char buf[16] = {}; 347 char buf[16] = {}; 348 int iter_fd = -1, len; 348 int iter_fd = -1, len; 349 int ret = 0; 349 int ret = 0; 350 350 351 link = bpf_program__attach_iter(prog, 351 link = bpf_program__attach_iter(prog, opts); 352 if (!link) { 352 if (!link) { 353 fprintf(stderr, "bpf_program__ 353 fprintf(stderr, "bpf_program__attach_iter() fails\n"); 354 return -1; 354 return -1; 355 } 355 } 356 iter_fd = bpf_iter_create(bpf_link__fd 356 iter_fd = bpf_iter_create(bpf_link__fd(link)); 357 if (iter_fd < 0) { 357 if (iter_fd < 0) { 358 fprintf(stderr, "bpf_iter_crea 358 fprintf(stderr, "bpf_iter_create() fails\n"); 359 ret = -1; 359 ret = -1; 360 goto free_link; 360 goto free_link; 361 } 361 } 362 /* not check contents, but ensure read 362 /* not check contents, but ensure read() ends without error */ 363 while ((len = read(iter_fd, buf, sizeo 363 while ((len = read(iter_fd, buf, sizeof(buf) - 1)) > 0) { 364 buf[len] = 0; 364 buf[len] = 0; 365 printf("%s", buf); 365 printf("%s", buf); 366 } 366 } 367 printf("\n"); 367 printf("\n"); 368 free_link: 368 free_link: 369 if (iter_fd >= 0) 369 if (iter_fd >= 0) 370 close(iter_fd); 370 close(iter_fd); 371 bpf_link__destroy(link); 371 bpf_link__destroy(link); 372 return 0; 372 return 0; 373 } 373 } 374 374 375 static void test_task_file(void) 375 static void test_task_file(void) 376 { 376 { 377 LIBBPF_OPTS(bpf_iter_attach_opts, opts 377 LIBBPF_OPTS(bpf_iter_attach_opts, opts); 378 struct bpf_iter_task_ex *skel; 378 struct bpf_iter_task_ex *skel; 379 union bpf_iter_link_info linfo; 379 union bpf_iter_link_info linfo; 380 skel = bpf_iter_task_ex__open_and_load 380 skel = bpf_iter_task_ex__open_and_load(); 381 if (skel == NULL) 381 if (skel == NULL) 382 return; 382 return; 383 memset(&linfo, 0, sizeof(linfo)); 383 memset(&linfo, 0, sizeof(linfo)); 384 linfo.task.pid = getpid(); 384 linfo.task.pid = getpid(); 385 opts.link_info = &linfo; 385 opts.link_info = &linfo; 386 opts.link_info_len = sizeof(linfo); 386 opts.link_info_len = sizeof(linfo); 387 printf("PID %d\n", getpid()); 387 printf("PID %d\n", getpid()); 388 do_read_opts(skel->progs.dump_task_fil 388 do_read_opts(skel->progs.dump_task_file, &opts); 389 bpf_iter_task_ex__destroy(skel); 389 bpf_iter_task_ex__destroy(skel); 390 } 390 } 391 391 392 int main(int argc, const char * const * argv 392 int main(int argc, const char * const * argv) 393 { 393 { 394 test_task_file(); 394 test_task_file(); 395 return 0; 395 return 0; 396 } 396 } 397 397 398 The following lines are the output of the prog 398 The following lines are the output of the program. 399 :: 399 :: 400 400 401 PID 1859 401 PID 1859 402 402 403 tgid pid fd file 403 tgid pid fd file 404 1859 1859 0 ffffffff82270aa0 404 1859 1859 0 ffffffff82270aa0 405 1859 1859 1 ffffffff82270aa0 405 1859 1859 1 ffffffff82270aa0 406 1859 1859 2 ffffffff82270aa0 406 1859 1859 2 ffffffff82270aa0 407 1859 1859 3 ffffffff82272980 407 1859 1859 3 ffffffff82272980 408 1859 1859 4 ffffffff8225e120 408 1859 1859 4 ffffffff8225e120 409 1859 1859 5 ffffffff82255120 409 1859 1859 5 ffffffff82255120 410 1859 1859 6 ffffffff82254f00 410 1859 1859 6 ffffffff82254f00 411 1859 1859 7 ffffffff82254d80 411 1859 1859 7 ffffffff82254d80 412 1859 1859 8 ffffffff8225abe0 412 1859 1859 8 ffffffff8225abe0 413 413 414 ------------------ 414 ------------------ 415 Without Parameters 415 Without Parameters 416 ------------------ 416 ------------------ 417 417 418 Let us look at how a BPF iterator without para 418 Let us look at how a BPF iterator without parameters skips files of other 419 processes in the system. In this case, the BPF 419 processes in the system. In this case, the BPF program has to check the pid or 420 the tid of tasks, or it will receive every ope 420 the tid of tasks, or it will receive every opened file in the system (in the 421 current *pid* namespace, actually). So, we usu 421 current *pid* namespace, actually). So, we usually add a global variable in the 422 BPF program to pass a *pid* to the BPF program 422 BPF program to pass a *pid* to the BPF program. 423 423 424 The BPF program would look like the following 424 The BPF program would look like the following block. 425 425 426 :: 426 :: 427 427 428 ...... 428 ...... 429 int target_pid = 0; 429 int target_pid = 0; 430 430 431 SEC("iter/task_file") 431 SEC("iter/task_file") 432 int dump_task_file(struct bpf_iter__task_f 432 int dump_task_file(struct bpf_iter__task_file *ctx) 433 { 433 { 434 ...... 434 ...... 435 if (task->tgid != target_pid) /* Che 435 if (task->tgid != target_pid) /* Check task->pid instead to check thread IDs */ 436 return 0; 436 return 0; 437 BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx 437 BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd, 438 (long)file->f_op); 438 (long)file->f_op); 439 return 0; 439 return 0; 440 } 440 } 441 441 442 The user space program would look like the fol 442 The user space program would look like the following block: 443 443 444 :: 444 :: 445 445 446 ...... 446 ...... 447 static void test_task_file(void) 447 static void test_task_file(void) 448 { 448 { 449 ...... 449 ...... 450 skel = bpf_iter_task_ex__open_and_lo 450 skel = bpf_iter_task_ex__open_and_load(); 451 if (skel == NULL) 451 if (skel == NULL) 452 return; 452 return; 453 skel->bss->target_pid = getpid(); /* 453 skel->bss->target_pid = getpid(); /* process ID. For thread id, use gettid() */ 454 memset(&linfo, 0, sizeof(linfo)); 454 memset(&linfo, 0, sizeof(linfo)); 455 linfo.task.pid = getpid(); 455 linfo.task.pid = getpid(); 456 opts.link_info = &linfo; 456 opts.link_info = &linfo; 457 opts.link_info_len = sizeof(linfo); 457 opts.link_info_len = sizeof(linfo); 458 ...... 458 ...... 459 } 459 } 460 460 461 ``target_pid`` is a global variable in the BPF 461 ``target_pid`` is a global variable in the BPF program. The user space program 462 should initialize the variable with a process 462 should initialize the variable with a process ID to skip opened files of other 463 processes in the BPF program. When you paramet 463 processes in the BPF program. When you parametrize a BPF iterator, the iterator 464 calls the BPF program fewer times which can sa 464 calls the BPF program fewer times which can save significant resources. 465 465 466 --------------------------- 466 --------------------------- 467 Parametrizing VMA Iterators 467 Parametrizing VMA Iterators 468 --------------------------- 468 --------------------------- 469 469 470 By default, a BPF VMA iterator includes every 470 By default, a BPF VMA iterator includes every VMA in every process. However, 471 you can still specify a process or a thread to 471 you can still specify a process or a thread to include only its VMAs. Unlike 472 files, a thread can not have a separate addres 472 files, a thread can not have a separate address space (since Linux 2.6.0-test6). 473 Here, using *tid* makes no difference from usi 473 Here, using *tid* makes no difference from using *pid*. 474 474 475 ---------------------------- 475 ---------------------------- 476 Parametrizing Task Iterators 476 Parametrizing Task Iterators 477 ---------------------------- 477 ---------------------------- 478 478 479 A BPF task iterator with *pid* includes all ta 479 A BPF task iterator with *pid* includes all tasks (threads) of a process. The 480 BPF program receives these tasks one after ano 480 BPF program receives these tasks one after another. You can specify a BPF task 481 iterator with *tid* parameter to include only 481 iterator with *tid* parameter to include only the tasks that match the given 482 *tid*. 482 *tid*.
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.