.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================

The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:

* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
  ``getsockopt`` system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
  ``setsockopt`` system call.

The context (``struct bpf_sockopt``) has the associated socket (``sk``) and
all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

BPF_CGROUP_SETSOCKOPT
=====================

``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handling of
sockopt and it has a writable context: it can modify the supplied arguments
before passing them down to the kernel. This hook has access to the cgroup
and socket local storage.

If a BPF program sets ``optlen`` to -1, control is returned to
userspace after all other BPF programs in the cgroup chain finish
(i.e. the kernel ``setsockopt`` handling will *not* be executed).

Note that ``optlen`` cannot be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value will
trigger ``EFAULT``.

Return Type
-----------

* ``0`` - reject the syscall, ``EPERM`` will be returned to the userspace.
* ``1`` - success, continue with the next BPF program in the cgroup chain.

BPF_CGROUP_GETSOCKOPT
=====================

``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handling of
sockopt. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
if it's interested in whatever the kernel has returned. The BPF hook can
override the values above, adjust ``optlen`` and reset ``retval`` to 0. If
``optlen`` has been increased above the initial ``getsockopt`` value (i.e.
the userspace buffer is too small), ``EFAULT`` is returned.
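
The checks applied to the values a ``BPF_CGROUP_GETSOCKOPT`` program leaves
in its context can be summarized with a small userspace model. This is only
a sketch of the rules as stated in this section (the ``optlen`` limit and
the allowed ``retval`` values), not kernel code; the names
``sockopt_result`` and ``model_getsockopt_finish`` are hypothetical and
exist only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model type: what the BPF program left in its context. */
struct sockopt_result {
	int optlen;	/* optlen after the program ran */
	int retval;	/* retval after the program ran */
};

/*
 * Model of the rules described in the text, not the kernel implementation.
 * user_optlen   - buffer size the application passed to getsockopt()
 * kernel_retval - what the kernel's own getsockopt handling returned
 * Returns the value the syscall would yield under those rules.
 */
int model_getsockopt_finish(int user_optlen, int kernel_retval,
			    struct sockopt_result res)
{
	assert(user_optlen >= 0);

	/* optlen may shrink, but must not exceed the user buffer. */
	if (res.optlen > user_optlen)
		return -EFAULT;

	/* retval may only be 0 or the kernel's original value. */
	if (res.retval != 0 && res.retval != kernel_retval)
		return -EFAULT;

	return res.retval;
}
```

For instance, a program that handles a custom option the kernel rejected
with ``-ENOPROTOOPT`` would reset ``retval`` to 0 and shrink ``optlen``;
the model then yields 0, while growing ``optlen`` past the user buffer
yields ``-EFAULT``.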

This hook has access to the cgroup and socket local storage.

Note that the only acceptable values to set ``retval`` to are 0 and the
original value that the kernel returned. Any other value will trigger
``EFAULT``.

Return Type
-----------

* ``0`` - reject the syscall, ``EPERM`` will be returned to the userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace, return
  ``retval`` from the syscall (note that this can be overwritten by
  the BPF program from the parent cgroup).

Cgroup Inheritance
==================

Suppose there is the following cgroup hierarchy, where each cgroup
has ``BPF_CGROUP_GETSOCKOPT`` attached with the
``BPF_F_ALLOW_MULTI`` flag::

  A (root, parent)
   \
    B (child)

When the application calls the ``getsockopt`` syscall from cgroup B,
the programs are executed from the bottom up: B, A. The first program
(B) sees the result of the kernel's ``getsockopt``. It can optionally
adjust ``optval``, ``optlen`` and reset ``retval`` to 0. After that,
control is passed to the second (A) program, which sees the
same context as B, including any potential modifications.

The same applies to ``BPF_CGROUP_SETSOCKOPT``: if the program is attached to
A and B, the trigger order is B, then A. If B makes any changes
to the input arguments (``level``, ``optname``, ``optval``, ``optlen``),
then the next program in the chain (A) will see those changes,
*not* the original input ``setsockopt`` arguments. The potentially
modified values are then passed down to the kernel.

Large optval
============

When the ``optval`` is greater than ``PAGE_SIZE``, the BPF program
can access only the first ``PAGE_SIZE`` of that data. So it has two options:

* Set ``optlen`` to zero, which indicates that the kernel should
  use the original buffer from the userspace. Any modifications
  done by the BPF program to the ``optval`` are ignored.
* Set ``optlen`` to a value less than ``PAGE_SIZE``, which
  indicates that the kernel should use BPF's trimmed ``optval``.

When the BPF program returns with ``optlen`` greater than
``PAGE_SIZE``, the userspace will receive the original kernel
buffers without any modifications that the BPF program might have
applied.

Example
=======

The recommended way to structure these BPF programs is as follows:

.. code-block:: c

	SEC("cgroup/getsockopt")
	int getsockopt(struct bpf_sockopt *ctx)
	{
		__u8 *optval = ctx->optval;
		__u8 *optval_end = ctx->optval_end;

		/* Custom socket option. */
		if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
			if (optval + 1 > optval_end)
				return 0; /* bounds check before touching optval */
			ctx->retval = 0;
			optval[0] = ...;
			ctx->optlen = 1;
			return 1;
		}

		/* Modify kernel's socket option. */
		if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
			if (optval + 1 > optval_end)
				return 0; /* bounds check before touching optval */
			ctx->retval = 0;
			optval[0] = ...;
			ctx->optlen = 1;
			return 1;
		}

		/* optval larger than PAGE_SIZE: use kernel's buffer. */
		if (ctx->optlen > PAGE_SIZE)
			ctx->optlen = 0;

		return 1;
	}

	SEC("cgroup/setsockopt")
	int setsockopt(struct bpf_sockopt *ctx)
	{
		__u8 *optval = ctx->optval;
		__u8 *optval_end = ctx->optval_end;

		/* Custom socket option. */
		if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
			/* do something */
			ctx->optlen = -1; /* skip the kernel's setsockopt handling */
			return 1;
		}

		/* Modify kernel's socket option. */
		if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
			if (optval + 1 > optval_end)
				return 0; /* bounds check before touching optval */
			optval[0] = ...;
			return 1;
		}

		/* optval larger than PAGE_SIZE: use kernel's buffer. */
		if (ctx->optlen > PAGE_SIZE)
			ctx->optlen = 0;

		return 1;
	}

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.
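
The large-``optval`` rules from the "Large optval" section can also be
written down as a small decision function. The following is a userspace
sketch of those rules only, not kernel code; ``optval_source`` and
``model_large_optval`` are hypothetical names, and the page size is passed
in as a parameter rather than taken from the system. Cases the text does
not spell out (for example ``optlen`` exactly equal to ``PAGE_SIZE``) are
reported as unspecified instead of being guessed.

```c
#include <assert.h>

/* Which buffer ends up being used, per the rules in the text. */
enum optval_source {
	OPTVAL_ORIGINAL,	/* original buffer; BPF changes are ignored */
	OPTVAL_FROM_BPF,	/* the BPF program's trimmed optval */
	OPTVAL_UNSPECIFIED,	/* case not covered by the text */
};

/*
 * Model only: orig_optlen is the length supplied with the call,
 * prog_optlen is the optlen the BPF program left in its context.
 */
enum optval_source model_large_optval(long orig_optlen, long prog_optlen,
				      long page_size)
{
	assert(page_size > 0);
	assert(orig_optlen > page_size);	/* only the large case is modelled */

	/* optlen == 0: use the original buffer. */
	if (prog_optlen == 0)
		return OPTVAL_ORIGINAL;

	/* optlen still above PAGE_SIZE: BPF modifications are dropped. */
	if (prog_optlen > page_size)
		return OPTVAL_ORIGINAL;

	/* optlen trimmed below PAGE_SIZE: the BPF buffer is used. */
	if (prog_optlen > 0 && prog_optlen < page_size)
		return OPTVAL_FROM_BPF;

	/* Negative values and optlen == PAGE_SIZE are not described here. */
	return OPTVAL_UNSPECIFIED;
}
```

With a 4096-byte page and an 8192-byte ``optval``, leaving ``optlen`` at
8192 or setting it to 0 both select the original buffer, while setting it
to 16 selects the BPF program's trimmed buffer.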