.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.

=================================================
BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
=================================================

.. note::
   - ``BPF_MAP_TYPE_DEVMAP`` was introduced in kernel version 4.14
   - ``BPF_MAP_TYPE_DEVMAP_HASH`` was introduced in kernel version 5.4

``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` are BPF maps primarily
used as backend maps for the XDP BPF helper call ``bpf_redirect_map()``.
``BPF_MAP_TYPE_DEVMAP`` is backed by an array that uses the key as
the index to look up a reference to a net device, while ``BPF_MAP_TYPE_DEVMAP_HASH``
is backed by a hash table that uses a key to look up a reference to a net device.
The user provides either <``key``/ ``ifindex``> or <``key``/ ``struct bpf_devmap_val``>
pairs to update the maps with new net devices.

.. note::
    - The key to a hash map doesn't have to be an ``ifindex``.
    - While ``BPF_MAP_TYPE_DEVMAP_HASH`` allows for densely packing the net devices,
      it comes at the cost of hashing the key on every lookup.

The setup and packet enqueue/send code is shared between the two types of
devmap; only the lookup and insertion are different.

Usage
=====
Kernel BPF
----------
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)

Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` this map contains
references to net devices (for forwarding packets through other ports).

The lower two bits of ``flags`` are used as the return code if the map lookup
fails. This is so that the return value can be one of the XDP program return
codes up to ``XDP_TX``, as chosen by the caller.

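
As a worked illustration of the lower-two-bits rule, the following user-space
sketch models only the bit arithmetic; it is not the kernel implementation.
The action codes are copied locally from ``enum xdp_action`` in
``include/uapi/linux/bpf.h`` so that the snippet builds without kernel headers,
and ``FALLBACK_ACTION_MASK`` and ``fallback_action()`` are hypothetical names
used only for this example.

.. code-block:: c

    #include <stdint.h>

    /* Local copies of the XDP action codes (assumed to mirror
     * enum xdp_action in include/uapi/linux/bpf.h).
     */
    enum {
        XDP_ABORTED = 0,
        XDP_DROP = 1,
        XDP_PASS = 2,
        XDP_TX = 3,
        XDP_REDIRECT = 4,
    };

    /* Hypothetical name: mask selecting the lower two bits of flags. */
    #define FALLBACK_ACTION_MASK 0x3ULL

    /* Hypothetical helper: models the value handed back when the map
     * lookup fails. Only codes 0..3 (up to XDP_TX) fit in two bits, so
     * XDP_REDIRECT (4) cannot be requested as a fallback.
     */
    static inline int fallback_action(uint64_t flags)
    {
        return (int)(flags & FALLBACK_ACTION_MASK);
    }

In an XDP program this corresponds to passing the desired action as ``flags``,
for example ``bpf_redirect_map(map, key, XDP_PASS)``, so that a packet whose
``key`` has no entry in the map is passed on to the normal network stack
instead of being redirected.
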
The higher bits of ``flags`` can be set to ``BPF_F_BROADCAST`` or
``BPF_F_EXCLUDE_INGRESS`` as defined below.

With ``BPF_F_BROADCAST`` the packet will be broadcast to all the interfaces
in the map; with ``BPF_F_EXCLUDE_INGRESS`` the ingress interface will be excluded
from the broadcast.

.. note::
    - The key is ignored if ``BPF_F_BROADCAST`` is set.
    - The broadcast feature can also be used to implement multicast forwarding:
      simply create multiple DEVMAPs, each one corresponding to a single multicast group.

This helper will return ``XDP_REDIRECT`` on success, or the value of the two
lower bits of the ``flags`` argument if the map lookup fails.

More information about redirection can be found in :doc:`redirect`.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

User space
----------
.. note::
    DEVMAP entries can only be updated/deleted from user space and not
    from an eBPF program. Trying to call these functions from a kernel eBPF
    program will result in the program failing to load and a verifier warning.

bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);

Net device entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``value`` parameter
can be ``struct bpf_devmap_val`` or a simple ``int ifindex`` for backwards
compatibility.

.. code-block:: c

    struct bpf_devmap_val {
        __u32 ifindex;   /* device index */
        union {
            int   fd;  /* prog fd on map write */
            __u32 id;  /* prog id on map read */
        } bpf_prog;
    };

The ``flags`` argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

DEVMAPs can associate a program with a device entry by adding a ``bpf_prog.fd``
to ``struct bpf_devmap_val``. Programs are run after ``XDP_REDIRECT`` and have
access to both the Rx device and the Tx device. The program associated with the
``fd`` must have type XDP with expected attach type ``xdp_devmap``.
When a program is associated with a device index, the program is run on an
``XDP_REDIRECT`` and before the buffer is added to the per-cpu queue. Examples
of how to attach/use xdp_devmap progs can be found in the kernel selftests:

- ``tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c``
- ``tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c``

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_lookup_elem(int fd, const void *key, void *value);

Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_delete_elem(int fd, const void *key);

Net device entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or a negative error in case of
failure.

Examples
========

Kernel BPF
----------

The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP``
called ``tx_port``.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __type(key, __u32);
        __type(value, __u32);
        __uint(max_entries, 256);
    } tx_port SEC(".maps");

The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP_HASH``
called ``forward_map``.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
        __type(key, __u32);
        __type(value, struct bpf_devmap_val);
        __uint(max_entries, 32);
    } forward_map SEC(".maps");

.. note::

    The value type in the DEVMAP above is a ``struct bpf_devmap_val``.

The following code snippet shows a simple xdp_redirect_map program. This program
would work with a user space program that populates the devmap ``forward_map`` based
on ingress ifindexes. The BPF program (below) redirects packets using the
ingress ``ifindex`` as the ``key``.

.. code-block:: c

    SEC("xdp")
    int xdp_redirect_map_func(struct xdp_md *ctx)
    {
        /* Use the ingress interface index as the lookup key. */
        int index = ctx->ingress_ifindex;

        return bpf_redirect_map(&forward_map, index, 0);
    }

The following code snippet shows a BPF program that broadcasts packets to
all the interfaces in the ``tx_port`` devmap.

.. code-block:: c

    SEC("xdp")
    int xdp_redirect_map_func(struct xdp_md *ctx)
    {
        /* The key (0) is ignored because BPF_F_BROADCAST is set. */
        return bpf_redirect_map(&tx_port, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
    }

User space
----------

The following code snippet shows how to update a devmap called ``tx_port``.

.. code-block:: c

    int update_devmap(int ifindex, int redirect_ifindex)
    {
        int ret;

        ret = bpf_map_update_elem(bpf_map__fd(tx_port), &ifindex, &redirect_ifindex, 0);
        if (ret < 0) {
            fprintf(stderr, "Failed to update devmap value: %s\n",
                strerror(errno));
        }

        return ret;
    }

The following code snippet shows how to update a ``BPF_MAP_TYPE_DEVMAP_HASH``
called ``forward_map``.

.. code-block:: c

    int update_devmap(int ifindex, int redirect_ifindex)
    {
        /* Only ifindex is set; no program is attached to the entry. */
        struct bpf_devmap_val devmap_val = { .ifindex = redirect_ifindex };
        int ret;

        ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0);
        if (ret < 0) {
            fprintf(stderr, "Failed to update devmap value: %s\n",
                strerror(errno));
        }
        return ret;
    }

References
===========

- https://lwn.net/Articles/728146/
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6f9d451ab1a33728adb72d7ff66a7b374d665176
- https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4106