.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
    :local:
    :depth: 2


Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects,
and as a way to explicitly execute programs in the kernel for their side
effects. The command was previously named ``BPF_PROG_TEST_RUN``, and both
constants continue to be defined in the UAPI header, aliased to the same value.

The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``

When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs will
not have any side effects while being run in this mode; in particular, packets
will not actually be redirected or dropped, the program return code will just be
returned to userspace. A separate mode for live execution of XDP programs is
provided, documented separately below.
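
As an illustration, the command is commonly issued through the libbpf wrapper
``bpf_prog_test_run_opts()`` rather than by filling in the ``bpf()`` syscall
attributes by hand. The following is a minimal sketch of a single test run of
an already-loaded XDP program against a caller-supplied packet buffer; the
``prog_fd``, packet contents and buffer size are illustrative placeholders and
not part of the kernel interface.

.. code-block:: c

    #include <stddef.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    /* Run the program referenced by prog_fd once against pkt/pkt_len and
     * report its return code in *retval. Sketch only; error handling is
     * reduced to passing through the libbpf return value.
     */
    static int run_prog_once(int prog_fd, void *pkt, size_t pkt_len, __u32 *retval)
    {
        char data_out[1500];      /* buffer for the (possibly modified) packet */
        LIBBPF_OPTS(bpf_test_run_opts, opts,
            .data_in = pkt,
            .data_size_in = pkt_len,
            .data_out = data_out,
            .data_size_out = sizeof(data_out),
            .repeat = 1,
        );
        int err;

        err = bpf_prog_test_run_opts(prog_fd, &opts);
        if (err)
            return err;

        *retval = opts.retval;    /* e.g. XDP_PASS or XDP_DROP for an XDP program */
        return 0;
    }

After the call returns, ``opts.data_size_out`` holds the length of the packet
data written back to ``data_out``, so a unit test can assert on both the return
code and any packet modifications the program made.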

Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
which can be used to execute XDP programs in a way where packets will actually
be processed by the kernel after the execution of the XDP program as if they
arrived on a physical interface. This mode is activated by setting the
``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
``BPF_PROG_RUN``.

The live packet mode is optimised for high-performance execution of the
supplied XDP program many times (suitable for, e.g., running as a traffic
generator), which means the semantics are not quite as straightforward as the
regular test run mode. Specifically:

- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters when running in this mode will be rejected. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same trace points as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool of memory
  pages of the same size as the batch size. Each memory page will be initialised
  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
  invocation. When possible, the pages will be recycled on future program
  invocations, to improve performance. Pages will generally be recycled a full
  batch at a time, except when a packet is dropped (by return code or because
  of, say, a redirection error), in which case that page will be recycled
  immediately. If a packet ends up being passed to the regular networking stack
  (because the XDP program returns ``XDP_PASS``, or because it ends up being
  redirected to an interface that injects it into the stack), the page will be
  released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.
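
As a concrete illustration, a live-frame run can also be issued through the
libbpf ``bpf_prog_test_run_opts()`` wrapper; the only differences from a
regular test run are the ``BPF_F_TEST_XDP_LIVE_FRAMES`` flag, the absence of
output buffers, and (optionally) a ``batch_size``. The sketch below assumes a
pre-built Ethernet frame in ``pkt`` and an already-loaded XDP program in
``prog_fd``; these, along with ``ifindex`` and ``repetitions``, are
illustrative placeholders.

.. code-block:: c

    #include <stddef.h>
    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    /* Push the same packet through the XDP program many times, letting the
     * kernel act on the program's return codes (traffic-generator style).
     */
    static int run_xdp_live(int prog_fd, void *pkt, size_t pkt_len,
                            int ifindex, int repetitions)
    {
        struct xdp_md ctx_in = {
            .data_end = pkt_len,           /* must match data_size_in */
            .ingress_ifindex = ifindex,    /* packets behave as if received here */
        };
        LIBBPF_OPTS(bpf_test_run_opts, opts,
            .data_in = pkt,
            .data_size_in = pkt_len,
            .ctx_in = &ctx_in,
            .ctx_size_in = sizeof(ctx_in),
            .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
            .repeat = repetitions,
            .batch_size = 64,              /* optional; may not exceed 256 */
        );

        /* data_out/ctx_out are deliberately left unset: the kernel rejects
         * them in live frame mode, and per-packet errors show up via the
         * usual XDP trace points rather than the syscall return value.
         */
        return bpf_prog_test_run_opts(prog_fd, &opts);
    }

Each such call executes ``repetitions`` invocations of the program in batches
of ``batch_size`` packets, with buffered packets being transmitted at the end
of each batch as described in the list above.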