.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================

The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:

* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
  ``getsockopt`` system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
  ``setsockopt`` system call.

The context (``struct bpf_sockopt``) contains the associated socket (``sk``)
and all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

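As a rough illustration of the context, the following is a simplified
userspace mirror of ``struct bpf_sockopt``. The authoritative definition is
in ``include/uapi/linux/bpf.h`` and wraps the pointer fields in
kernel-specific macros. The ``_sketch`` suffix, the plain pointer types and
the ``sockopt_sketch_accessible()`` helper are assumptions made here for
readability, not kernel API.

.. code-block:: c

        #include <stddef.h>
        #include <stdint.h>

        struct bpf_sock;        /* opaque in this sketch */

        /* Simplified mirror of the context; not the real UAPI definition. */
        struct bpf_sockopt_sketch {
                struct bpf_sock *sk;    /* associated socket */
                void *optval;           /* start of the option buffer */
                void *optval_end;       /* one past the last accessible byte */
                int32_t level;
                int32_t optname;
                int32_t optlen;
                int32_t retval;         /* meaningful for getsockopt only */
        };

        /* Hypothetical helper: bytes between optval and optval_end. */
        static inline size_t
        sockopt_sketch_accessible(const struct bpf_sockopt_sketch *ctx)
        {
                return (size_t)((const char *)ctx->optval_end -
                                (const char *)ctx->optval);
        }

A BPF program is expected to compare ``optval`` against ``optval_end``
before dereferencing the buffer, which is the same range that the helper
above measures.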
BPF_CGROUP_SETSOCKOPT
=====================

``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handles the
sockopt, and it has a writable context: it can modify the supplied arguments
before passing them down to the kernel. This hook has access to the cgroup
and socket local storage.

If a BPF program sets ``optlen`` to -1, control is returned to
userspace after all other BPF programs in the cgroup chain finish
(i.e. the kernel ``setsockopt`` handling is *not* executed).

Note that ``optlen`` cannot be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value
triggers ``EFAULT``.

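The ``optlen`` rules above can be summarised as a small userspace model.
This is only a sketch of the documented behaviour, not the kernel's
implementation; the enum and the ``check_setsockopt_optlen()`` function are
hypothetical names introduced for illustration.

.. code-block:: c

        #include <errno.h>

        /* Hypothetical outcomes of the setsockopt hook in this model. */
        enum setsockopt_action {
                SETSOCKOPT_CALL_KERNEL = 0,     /* pass arguments down */
                SETSOCKOPT_SKIP_KERNEL = 1,     /* optlen == -1: bypass kernel */
        };

        /*
         * user_optlen: the value supplied by userspace.
         * bpf_optlen:  the value left in the context by the BPF programs.
         * Returns an action, or -EFAULT for a value the text disallows.
         */
        static int check_setsockopt_optlen(int user_optlen, int bpf_optlen)
        {
                if (bpf_optlen == -1)
                        return SETSOCKOPT_SKIP_KERNEL;
                if (bpf_optlen < -1 || bpf_optlen > user_optlen)
                        return -EFAULT;
                return SETSOCKOPT_CALL_KERNEL;
        }

For example, with a user-supplied length of 8, a program may leave 8 or
shrink it to 4, may set -1 to skip the kernel handler, but setting 16
yields ``-EFAULT`` in this model.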
Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
* ``1`` - success; continue with the next BPF program in the cgroup chain.

BPF_CGROUP_GETSOCKOPT
=====================

``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handles the
sockopt. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
if it is interested in what the kernel has returned. The BPF hook can
override the values above, adjust ``optlen`` and reset ``retval`` to 0.
If ``optlen`` has been increased above the initial ``getsockopt`` value
(i.e. the userspace buffer is too small), ``EFAULT`` is returned.

This hook has access to the cgroup and socket local storage.

Note that the only acceptable values for ``retval`` are 0 and the
original value that the kernel returned. Any other value triggers
``EFAULT``.

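The two ``getsockopt`` constraints described above (``optlen`` must not
grow, and ``retval`` must be 0 or the kernel's original value) can be
modelled in plain C as follows. This is a sketch of the documented rules
only; ``check_getsockopt_result()`` is a hypothetical name, and any
additional checks the kernel may perform are not modelled.

.. code-block:: c

        #include <errno.h>

        /*
         * user_optlen:   buffer length passed to getsockopt() by userspace.
         * kernel_retval: value returned by the kernel's own handler.
         * bpf_optlen, bpf_retval: values left in the context by BPF.
         * Returns the syscall result in this model.
         */
        static int check_getsockopt_result(int user_optlen, int kernel_retval,
                                           int bpf_optlen, int bpf_retval)
        {
                /* The userspace buffer would be too small. */
                if (bpf_optlen > user_optlen)
                        return -EFAULT;
                /* retval may only be reset to 0 or left unchanged. */
                if (bpf_retval != 0 && bpf_retval != kernel_retval)
                        return -EFAULT;
                return bpf_retval;
        }

In this model a program can turn a kernel error such as ``-ENOPROTOOPT``
into success by resetting ``retval`` to 0, but it cannot substitute a
different error code.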
Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace and return
  ``retval`` from the syscall (note that this can be overwritten by
  the BPF program from the parent cgroup).

Cgroup Inheritance
==================

Suppose there is the following cgroup hierarchy, where each cgroup
has a ``BPF_CGROUP_GETSOCKOPT`` program attached with the
``BPF_F_ALLOW_MULTI`` flag::

  A (root, parent)
   \
    B (child)

When an application in cgroup B calls the ``getsockopt`` syscall,
the programs are executed from the bottom up: B, A. The first program
(B) sees the result of the kernel's ``getsockopt``. It can optionally
adjust ``optval`` and ``optlen`` and reset ``retval`` to 0. After that,
control is passed to the second program (A), which sees the
same context as B, including any modifications B made.

The same applies to ``BPF_CGROUP_SETSOCKOPT``: if programs are attached to
A and B, the trigger order is B, then A. If B changes any of
the input arguments (``level``, ``optname``, ``optval``, ``optlen``),
then the next program in the chain (A) sees those changes,
*not* the original ``setsockopt`` arguments. The potentially
modified values are then passed down to the kernel.

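The bottom-up ordering can be illustrated with a small userspace
simulation in which two functions stand in for the programs attached to B
and A and share one context. All names here (``struct sockopt_ctx``,
``prog_a``, ``prog_b``, ``run_chain``) are hypothetical, and rejection
handling is deliberately left out of the sketch.

.. code-block:: c

        #include <stddef.h>

        /* Minimal stand-in for the shared context. */
        struct sockopt_ctx {
                int level;
                int optname;
                int optlen;
                int retval;
        };

        typedef int (*sockopt_prog)(struct sockopt_ctx *ctx);

        /* Records what the parent program observed, for inspection. */
        static int optlen_seen_by_a = -1;

        /* Stand-in for the program on cgroup B (child): modifies the context. */
        static int prog_b(struct sockopt_ctx *ctx)
        {
                ctx->optlen = 1;
                ctx->retval = 0;
                return 1;
        }

        /* Stand-in for the program on cgroup A (parent): only observes. */
        static int prog_a(struct sockopt_ctx *ctx)
        {
                optlen_seen_by_a = ctx->optlen;
                return 1;
        }

        /*
         * Runs the programs in array order on the same context. Return
         * values are ignored here because rejection is not modelled.
         */
        static void run_chain(const sockopt_prog *progs, size_t n,
                              struct sockopt_ctx *ctx)
        {
                for (size_t i = 0; i < n; i++)
                        (void)progs[i](ctx);
        }

Calling ``run_chain()`` with the array ``{ prog_b, prog_a }`` mirrors the
order described above: ``prog_a`` observes ``optlen == 1``, the value
written by ``prog_b``, rather than the value the context started with.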
Large optval
============

When ``optval`` is larger than ``PAGE_SIZE``, the BPF program
can access only the first ``PAGE_SIZE`` bytes of that data. So it has two
options:

* Set ``optlen`` to zero, which indicates that the kernel should
  use the original buffer from userspace. Any modifications
  made by the BPF program to ``optval`` are ignored.
* Set ``optlen`` to a value less than ``PAGE_SIZE``, which
  indicates that the kernel should use BPF's trimmed ``optval``.

When the BPF program returns with ``optlen`` greater than
``PAGE_SIZE``, userspace receives the original kernel
buffers without any modifications that the BPF program might have
applied.

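The choices above can be written as a decision function. This is a sketch
that follows only the wording of this section; the 4096-byte
``SKETCH_PAGE_SIZE``, the enum and ``pick_large_optval()`` are assumptions
for illustration, and values the text does not cover are reported as
unspecified rather than guessed.

.. code-block:: c

        /* Assumed page size for the sketch; the real value is per-arch. */
        #define SKETCH_PAGE_SIZE 4096

        enum optval_source {
                OPTVAL_FROM_ORIGINAL,   /* buffer untouched by BPF is used */
                OPTVAL_FROM_BPF,        /* BPF's trimmed optval is used */
                OPTVAL_UNSPECIFIED,     /* not covered by the text above */
        };

        /*
         * bpf_optlen: optlen left by the BPF program when the original
         * optval was larger than SKETCH_PAGE_SIZE.
         */
        static enum optval_source pick_large_optval(int bpf_optlen)
        {
                if (bpf_optlen == 0)
                        return OPTVAL_FROM_ORIGINAL;
                if (bpf_optlen > 0 && bpf_optlen < SKETCH_PAGE_SIZE)
                        return OPTVAL_FROM_BPF;
                if (bpf_optlen > SKETCH_PAGE_SIZE)
                        return OPTVAL_FROM_ORIGINAL;
                return OPTVAL_UNSPECIFIED;
        }

A program that does not care about large options would typically take the
first branch by resetting ``optlen`` to 0.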
Example
=======

The recommended way to handle socket options in BPF programs is as follows:

.. code-block:: c

        /* MY_SOL, MY_OPTNAME and PAGE_SIZE are placeholders the program must define. */

        SEC("cgroup/getsockopt")
        int getsockopt(struct bpf_sockopt *ctx)
        {
                __u8 *optval = ctx->optval;
                __u8 *optval_end = ctx->optval_end;

                /* Custom socket option. */
                if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
                        /* Bounds check; reject if the buffer is too small. */
                        if (optval + 1 > optval_end)
                                return 0;
                        ctx->retval = 0;
                        optval[0] = ...;
                        ctx->optlen = 1;
                        return 1;
                }

                /* Modify kernel's socket option. */
                if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
                        if (optval + 1 > optval_end)
                                return 0;
                        ctx->retval = 0;
                        optval[0] = ...;
                        ctx->optlen = 1;
                        return 1;
                }

                /* optval larger than PAGE_SIZE: use the kernel's buffer. */
                if (ctx->optlen > PAGE_SIZE)
                        ctx->optlen = 0;

                return 1;
        }

        SEC("cgroup/setsockopt")
        int setsockopt(struct bpf_sockopt *ctx)
        {
                __u8 *optval = ctx->optval;
                __u8 *optval_end = ctx->optval_end;

                /* Custom socket option: handled here, bypass the kernel. */
                if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
                        /* do something */
                        ctx->optlen = -1;
                        return 1;
                }

                /* Modify kernel's socket option. */
                if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
                        if (optval + 1 > optval_end)
                                return 0;
                        optval[0] = ...;
                        return 1;
                }

                /* optval larger than PAGE_SIZE: use the original userspace buffer. */
                if (ctx->optlen > PAGE_SIZE)
                        ctx->optlen = 0;

                return 1;
        }

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.
