1 .. SPDX-License-Identifier: GPL-2.0-only 1 .. SPDX-License-Identifier: GPL-2.0-only 2 .. Copyright (C) 2022 Red Hat, Inc. 2 .. Copyright (C) 2022 Red Hat, Inc. 3 .. Copyright (C) 2022-2023 Isovalent, Inc. 3 .. Copyright (C) 2022-2023 Isovalent, Inc. 4 4 5 ============================================== 5 =============================================== 6 BPF_MAP_TYPE_HASH, with PERCPU and LRU Variant 6 BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants 7 ============================================== 7 =============================================== 8 8 9 .. note:: 9 .. note:: 10 - ``BPF_MAP_TYPE_HASH`` was introduced in k 10 - ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19 11 - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduc 11 - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6 12 - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_ 12 - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 13 were introduced in version 4.10 13 were introduced in version 4.10 14 14 15 ``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCP 15 ``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general 16 purpose hash map storage. Both the key and the 16 purpose hash map storage. Both the key and the value can be structs, 17 allowing for composite keys and values. 17 allowing for composite keys and values. 18 18 19 The kernel is responsible for allocating and f 19 The kernel is responsible for allocating and freeing key/value pairs, up 20 to the max_entries limit that you specify. Has 20 to the max_entries limit that you specify. Hash maps use pre-allocation 21 of hash table elements by default. The ``BPF_F 21 of hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be 22 used to disable pre-allocation when it is too 22 used to disable pre-allocation when it is too memory expensive. 23 23 24 ``BPF_MAP_TYPE_PERCPU_HASH`` provides a separa 24 ``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per 25 CPU. The per-cpu values are stored internally 25 CPU. The per-cpu values are stored internally in an array. 26 26 27 The ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TY 27 The ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 28 variants add LRU semantics to their respective 28 variants add LRU semantics to their respective hash tables. An LRU hash 29 will automatically evict the least recently us 29 will automatically evict the least recently used entries when the hash 30 table reaches capacity. An LRU hash maintains 30 table reaches capacity. An LRU hash maintains an internal LRU list that 31 is used to select elements for eviction. This 31 is used to select elements for eviction. This internal LRU list is 32 shared across CPUs but it is possible to reque 32 shared across CPUs but it is possible to request a per CPU LRU list with 33 the ``BPF_F_NO_COMMON_LRU`` flag when calling 33 the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. The 34 following table outlines the properties of LRU 34 following table outlines the properties of LRU maps depending on the a 35 map type and the flags used to create the map. 35 map type and the flags used to create the map. 36 36 37 ======================== ===================== 37 ======================== ========================= ================================ 38 Flag ``BPF_MAP_TYPE_LRU_HA 38 Flag ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 39 ======================== ===================== 39 ======================== ========================= ================================ 40 **BPF_F_NO_COMMON_LRU** Per-CPU LRU, global m 40 **BPF_F_NO_COMMON_LRU** Per-CPU LRU, global map Per-CPU LRU, per-cpu map 41 **!BPF_F_NO_COMMON_LRU** Global LRU, global ma 41 **!BPF_F_NO_COMMON_LRU** Global LRU, global map Global LRU, per-cpu map 42 ======================== ===================== 42 ======================== ========================= ================================ 43 43 44 Usage 44 Usage 45 ===== 45 ===== 46 46 47 Kernel BPF 47 Kernel BPF 48 ---------- 48 ---------- 49 49 50 bpf_map_update_elem() 50 bpf_map_update_elem() 51 ~~~~~~~~~~~~~~~~~~~~~ 51 ~~~~~~~~~~~~~~~~~~~~~ 52 52 53 .. code-block:: c 53 .. code-block:: c 54 54 55 long bpf_map_update_elem(struct bpf_map *ma 55 long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags) 56 56 57 Hash entries can be added or updated using the 57 Hash entries can be added or updated using the ``bpf_map_update_elem()`` 58 helper. This helper replaces existing elements 58 helper. This helper replaces existing elements atomically. The ``flags`` 59 parameter can be used to control the update be 59 parameter can be used to control the update behaviour: 60 60 61 - ``BPF_ANY`` will create a new element or upd 61 - ``BPF_ANY`` will create a new element or update an existing element 62 - ``BPF_NOEXIST`` will create a new element on 62 - ``BPF_NOEXIST`` will create a new element only if one did not already 63 exist 63 exist 64 - ``BPF_EXIST`` will update an existing elemen 64 - ``BPF_EXIST`` will update an existing element 65 65 66 ``bpf_map_update_elem()`` returns 0 on success 66 ``bpf_map_update_elem()`` returns 0 on success, or negative error in 67 case of failure. 67 case of failure. 68 68 69 bpf_map_lookup_elem() 69 bpf_map_lookup_elem() 70 ~~~~~~~~~~~~~~~~~~~~~ 70 ~~~~~~~~~~~~~~~~~~~~~ 71 71 72 .. code-block:: c 72 .. code-block:: c 73 73 74 void *bpf_map_lookup_elem(struct bpf_map *m 74 void *bpf_map_lookup_elem(struct bpf_map *map, const void *key) 75 75 76 Hash entries can be retrieved using the ``bpf_ 76 Hash entries can be retrieved using the ``bpf_map_lookup_elem()`` 77 helper. This helper returns a pointer to the v 77 helper. This helper returns a pointer to the value associated with 78 ``key``, or ``NULL`` if no entry was found. 78 ``key``, or ``NULL`` if no entry was found. 79 79 80 bpf_map_delete_elem() 80 bpf_map_delete_elem() 81 ~~~~~~~~~~~~~~~~~~~~~ 81 ~~~~~~~~~~~~~~~~~~~~~ 82 82 83 .. code-block:: c 83 .. code-block:: c 84 84 85 long bpf_map_delete_elem(struct bpf_map *ma 85 long bpf_map_delete_elem(struct bpf_map *map, const void *key) 86 86 87 Hash entries can be deleted using the ``bpf_ma 87 Hash entries can be deleted using the ``bpf_map_delete_elem()`` 88 helper. This helper will return 0 on success, 88 helper. This helper will return 0 on success, or negative error in case 89 of failure. 89 of failure. 90 90 91 Per CPU Hashes 91 Per CPU Hashes 92 -------------- 92 -------------- 93 93 94 For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP 94 For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 95 the ``bpf_map_update_elem()`` and ``bpf_map_lo 95 the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers 96 automatically access the hash slot for the cur 96 automatically access the hash slot for the current CPU. 97 97 98 bpf_map_lookup_percpu_elem() 98 bpf_map_lookup_percpu_elem() 99 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 99 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 100 100 101 .. code-block:: c 101 .. code-block:: c 102 102 103 void *bpf_map_lookup_percpu_elem(struct bpf 103 void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu) 104 104 105 The ``bpf_map_lookup_percpu_elem()`` helper ca 105 The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the 106 value in the hash slot for a specific CPU. Ret 106 value in the hash slot for a specific CPU. Returns value associated with 107 ``key`` on ``cpu`` , or ``NULL`` if no entry w 107 ``key`` on ``cpu`` , or ``NULL`` if no entry was found or ``cpu`` is 108 invalid. 108 invalid. 109 109 110 Concurrency 110 Concurrency 111 ----------- 111 ----------- 112 112 113 Values stored in ``BPF_MAP_TYPE_HASH`` can be 113 Values stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by 114 programs running on different CPUs. Since Ker 114 programs running on different CPUs. Since Kernel version 5.1, the BPF 115 infrastructure provides ``struct bpf_spin_lock 115 infrastructure provides ``struct bpf_spin_lock`` to synchronise access. 116 See ``tools/testing/selftests/bpf/progs/test_s 116 See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``. 117 117 118 Userspace 118 Userspace 119 --------- 119 --------- 120 120 121 bpf_map_get_next_key() 121 bpf_map_get_next_key() 122 ~~~~~~~~~~~~~~~~~~~~~~ 122 ~~~~~~~~~~~~~~~~~~~~~~ 123 123 124 .. code-block:: c 124 .. code-block:: c 125 125 126 int bpf_map_get_next_key(int fd, const void 126 int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key) 127 127 128 In userspace, it is possible to iterate throug 128 In userspace, it is possible to iterate through the keys of a hash using 129 libbpf's ``bpf_map_get_next_key()`` function. 129 libbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by 130 calling ``bpf_map_get_next_key()`` with ``cur_ 130 calling ``bpf_map_get_next_key()`` with ``cur_key`` set to 131 ``NULL``. Subsequent calls will fetch the next 131 ``NULL``. Subsequent calls will fetch the next key that follows the 132 current key. ``bpf_map_get_next_key()`` return 132 current key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if 133 cur_key is the last key in the hash, or negati 133 cur_key is the last key in the hash, or negative error in case of 134 failure. 134 failure. 135 135 136 Note that if ``cur_key`` gets deleted then ``b 136 Note that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()`` 137 will instead return the *first* key in the has 137 will instead return the *first* key in the hash table which is 138 undesirable. It is recommended to use batched 138 undesirable. It is recommended to use batched lookup if there is going 139 to be key deletion intermixed with ``bpf_map_g 139 to be key deletion intermixed with ``bpf_map_get_next_key()``. 140 140 141 Examples 141 Examples 142 ======== 142 ======== 143 143 144 Please see the ``tools/testing/selftests/bpf`` 144 Please see the ``tools/testing/selftests/bpf`` directory for functional 145 examples. The code snippets below demonstrate 145 examples. The code snippets below demonstrates API usage. 146 146 147 This example shows how to declare an LRU Hash 147 This example shows how to declare an LRU Hash with a struct key and a 148 struct value. 148 struct value. 149 149 150 .. code-block:: c 150 .. code-block:: c 151 151 152 #include <linux/bpf.h> 152 #include <linux/bpf.h> 153 #include <bpf/bpf_helpers.h> 153 #include <bpf/bpf_helpers.h> 154 154 155 struct key { 155 struct key { 156 __u32 srcip; 156 __u32 srcip; 157 }; 157 }; 158 158 159 struct value { 159 struct value { 160 __u64 packets; 160 __u64 packets; 161 __u64 bytes; 161 __u64 bytes; 162 }; 162 }; 163 163 164 struct { 164 struct { 165 __uint(type, BPF_MAP_TYPE_LRU_HASH 165 __uint(type, BPF_MAP_TYPE_LRU_HASH); 166 __uint(max_entries, 32); 166 __uint(max_entries, 32); 167 __type(key, struct key); 167 __type(key, struct key); 168 __type(value, struct value); 168 __type(value, struct value); 169 } packet_stats SEC(".maps"); 169 } packet_stats SEC(".maps"); 170 170 171 This example shows how to create or update has 171 This example shows how to create or update hash values using atomic 172 instructions: 172 instructions: 173 173 174 .. code-block:: c 174 .. code-block:: c 175 175 176 static void update_stats(__u32 srcip, int 176 static void update_stats(__u32 srcip, int bytes) 177 { 177 { 178 struct key key = { 178 struct key key = { 179 .srcip = srcip, 179 .srcip = srcip, 180 }; 180 }; 181 struct value *value = bpf_map_look 181 struct value *value = bpf_map_lookup_elem(&packet_stats, &key); 182 182 183 if (value) { 183 if (value) { 184 __sync_fetch_and_add(&valu 184 __sync_fetch_and_add(&value->packets, 1); 185 __sync_fetch_and_add(&valu 185 __sync_fetch_and_add(&value->bytes, bytes); 186 } else { 186 } else { 187 struct value newval = { 1, 187 struct value newval = { 1, bytes }; 188 188 189 bpf_map_update_elem(&packe 189 bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST); 190 } 190 } 191 } 191 } 192 192 193 Userspace walking the map elements from the ma 193 Userspace walking the map elements from the map declared above: 194 194 195 .. code-block:: c 195 .. code-block:: c 196 196 197 #include <bpf/libbpf.h> 197 #include <bpf/libbpf.h> 198 #include <bpf/bpf.h> 198 #include <bpf/bpf.h> 199 199 200 static void walk_hash_elements(int map_fd) 200 static void walk_hash_elements(int map_fd) 201 { 201 { 202 struct key *cur_key = NULL; 202 struct key *cur_key = NULL; 203 struct key next_key; 203 struct key next_key; 204 struct value value; 204 struct value value; 205 int err; 205 int err; 206 206 207 for (;;) { 207 for (;;) { 208 err = bpf_map_get_next_key 208 err = bpf_map_get_next_key(map_fd, cur_key, &next_key); 209 if (err) 209 if (err) 210 break; 210 break; 211 211 212 bpf_map_lookup_elem(map_fd 212 bpf_map_lookup_elem(map_fd, &next_key, &value); 213 213 214 // Use key and value here 214 // Use key and value here 215 215 216 cur_key = &next_key; 216 cur_key = &next_key; 217 } 217 } 218 } 218 } 219 219 220 Internals 220 Internals 221 ========= 221 ========= 222 222 223 This section of the document is targeted at Li 223 This section of the document is targeted at Linux developers and describes 224 aspects of the map implementations that are no 224 aspects of the map implementations that are not considered stable ABI. The 225 following details are subject to change in fut 225 following details are subject to change in future versions of the kernel. 226 226 227 ``BPF_MAP_TYPE_LRU_HASH`` and variants 227 ``BPF_MAP_TYPE_LRU_HASH`` and variants 228 -------------------------------------- 228 -------------------------------------- 229 229 230 Updating elements in LRU maps may trigger evic 230 Updating elements in LRU maps may trigger eviction behaviour when the capacity 231 of the map is reached. There are various steps 231 of the map is reached. There are various steps that the update algorithm 232 attempts in order to enforce the LRU property 232 attempts in order to enforce the LRU property which have increasing impacts on 233 other CPUs involved in the following operation 233 other CPUs involved in the following operation attempts: 234 234 235 - Attempt to use CPU-local state to batch oper 235 - Attempt to use CPU-local state to batch operations 236 - Attempt to fetch free nodes from global list 236 - Attempt to fetch free nodes from global lists 237 - Attempt to pull any node from a global list 237 - Attempt to pull any node from a global list and remove it from the hashmap 238 - Attempt to pull any node from any CPU's list 238 - Attempt to pull any node from any CPU's list and remove it from the hashmap 239 239 240 This algorithm is described visually in the fo 240 This algorithm is described visually in the following diagram. See the 241 description in commit 3a08c2fd7634 ("bpf: LRU 241 description in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of 242 the corresponding operations: 242 the corresponding operations: 243 243 244 .. kernel-figure:: map_lru_hash_update.dot 244 .. kernel-figure:: map_lru_hash_update.dot 245 :alt: Diagram outlining the LRU eviction 245 :alt: Diagram outlining the LRU eviction steps taken during map update. 246 246 247 LRU hash eviction during map update for ``B 247 LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and 248 variants. See the dot file source for kerne 248 variants. See the dot file source for kernel function name code references. 249 249 250 Map updates start from the oval in the top rig 250 Map updates start from the oval in the top right "begin ``bpf_map_update()``" 251 and progress through the graph towards the bot 251 and progress through the graph towards the bottom where the result may be 252 either a successful update or a failure with v 252 either a successful update or a failure with various error codes. The key in 253 the top right provides indicators for which lo 253 the top right provides indicators for which locks may be involved in specific 254 operations. This is intended as a visual hint 254 operations. This is intended as a visual hint for reasoning about how map 255 contention may impact update operations, thoug 255 contention may impact update operations, though the map type and flags may 256 impact the actual contention on those locks, b 256 impact the actual contention on those locks, based on the logic described in 257 the table above. For instance, if the map is c 257 the table above. For instance, if the map is created with type 258 ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``B 258 ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all map 259 properties would be per-cpu. 259 properties would be per-cpu.
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.