1 .. SPDX-License-Identifier: GPL-2.0-only 1 .. SPDX-License-Identifier: GPL-2.0-only 2 .. Copyright (C) 2022 Red Hat, Inc. 2 .. Copyright (C) 2022 Red Hat, Inc. 3 3 4 ===================== 4 ===================== 5 BPF_MAP_TYPE_LPM_TRIE 5 BPF_MAP_TYPE_LPM_TRIE 6 ===================== 6 ===================== 7 7 8 .. note:: 8 .. note:: 9 - ``BPF_MAP_TYPE_LPM_TRIE`` was introduced 9 - ``BPF_MAP_TYPE_LPM_TRIE`` was introduced in kernel version 4.11 10 10 11 ``BPF_MAP_TYPE_LPM_TRIE`` provides a longest p 11 ``BPF_MAP_TYPE_LPM_TRIE`` provides a longest prefix match algorithm that 12 can be used to match IP addresses to a stored 12 can be used to match IP addresses to a stored set of prefixes. 13 Internally, data is stored in an unbalanced tr 13 Internally, data is stored in an unbalanced trie of nodes that uses 14 ``prefixlen,data`` pairs as its keys. The ``da 14 ``prefixlen,data`` pairs as its keys. The ``data`` is interpreted in 15 network byte order, i.e. big endian, so ``data 15 network byte order, i.e. big endian, so ``data[0]`` stores the most 16 significant byte. 16 significant byte. 17 17 18 LPM tries may be created with a maximum prefix 18 LPM tries may be created with a maximum prefix length that is a multiple 19 of 8, in the range from 8 to 2048. The key use 19 of 8, in the range from 8 to 2048. The key used for lookup and update 20 operations is a ``struct bpf_lpm_trie_key_u8`` 20 operations is a ``struct bpf_lpm_trie_key_u8``, extended by 21 ``max_prefixlen/8`` bytes. 21 ``max_prefixlen/8`` bytes. 22 22 23 - For IPv4 addresses the data length is 4 byte 23 - For IPv4 addresses the data length is 4 bytes 24 - For IPv6 addresses the data length is 16 byt 24 - For IPv6 addresses the data length is 16 bytes 25 25 26 The value type stored in the LPM trie can be a 26 The value type stored in the LPM trie can be any user defined type. 27 27 28 .. note:: 28 .. note:: 29 When creating a map of type ``BPF_MAP_TYPE_ 29 When creating a map of type ``BPF_MAP_TYPE_LPM_TRIE`` you must set the 30 ``BPF_F_NO_PREALLOC`` flag. 30 ``BPF_F_NO_PREALLOC`` flag. 31 31 32 Usage 32 Usage 33 ===== 33 ===== 34 34 35 Kernel BPF 35 Kernel BPF 36 ---------- 36 ---------- 37 37 38 bpf_map_lookup_elem() 38 bpf_map_lookup_elem() 39 ~~~~~~~~~~~~~~~~~~~~~ 39 ~~~~~~~~~~~~~~~~~~~~~ 40 40 41 .. code-block:: c 41 .. code-block:: c 42 42 43 void *bpf_map_lookup_elem(struct bpf_map *m 43 void *bpf_map_lookup_elem(struct bpf_map *map, const void *key) 44 44 45 The longest prefix entry for a given data valu 45 The longest prefix entry for a given data value can be found using the 46 ``bpf_map_lookup_elem()`` helper. This helper 46 ``bpf_map_lookup_elem()`` helper. This helper returns a pointer to the 47 value associated with the longest matching ``k 47 value associated with the longest matching ``key``, or ``NULL`` if no 48 entry was found. 48 entry was found. 49 49 50 The ``key`` should have ``prefixlen`` set to ` 50 The ``key`` should have ``prefixlen`` set to ``max_prefixlen`` when 51 performing longest prefix lookups. For example 51 performing longest prefix lookups. For example, when searching for the 52 longest prefix match for an IPv4 address, ``pr 52 longest prefix match for an IPv4 address, ``prefixlen`` should be set to 53 ``32``. 53 ``32``. 54 54 55 bpf_map_update_elem() 55 bpf_map_update_elem() 56 ~~~~~~~~~~~~~~~~~~~~~ 56 ~~~~~~~~~~~~~~~~~~~~~ 57 57 58 .. code-block:: c 58 .. code-block:: c 59 59 60 long bpf_map_update_elem(struct bpf_map *ma 60 long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags) 61 61 62 Prefix entries can be added or updated using t 62 Prefix entries can be added or updated using the ``bpf_map_update_elem()`` 63 helper. This helper replaces existing elements 63 helper. This helper replaces existing elements atomically. 64 64 65 ``bpf_map_update_elem()`` returns ``0`` on suc 65 ``bpf_map_update_elem()`` returns ``0`` on success, or negative error in 66 case of failure. 66 case of failure. 67 67 68 .. note:: 68 .. note:: 69 The flags parameter must be one of BPF_ANY 69 The flags parameter must be one of BPF_ANY, BPF_NOEXIST or BPF_EXIST, 70 but the value is ignored, giving BPF_ANY s 70 but the value is ignored, giving BPF_ANY semantics. 71 71 72 bpf_map_delete_elem() 72 bpf_map_delete_elem() 73 ~~~~~~~~~~~~~~~~~~~~~ 73 ~~~~~~~~~~~~~~~~~~~~~ 74 74 75 .. code-block:: c 75 .. code-block:: c 76 76 77 long bpf_map_delete_elem(struct bpf_map *ma 77 long bpf_map_delete_elem(struct bpf_map *map, const void *key) 78 78 79 Prefix entries can be deleted using the ``bpf_ 79 Prefix entries can be deleted using the ``bpf_map_delete_elem()`` 80 helper. This helper will return 0 on success, 80 helper. This helper will return 0 on success, or negative error in case 81 of failure. 81 of failure. 82 82 83 Userspace 83 Userspace 84 --------- 84 --------- 85 85 86 Access from userspace uses libbpf APIs with th 86 Access from userspace uses libbpf APIs with the same names as above, with 87 the map identified by ``fd``. 87 the map identified by ``fd``. 88 88 89 bpf_map_get_next_key() 89 bpf_map_get_next_key() 90 ~~~~~~~~~~~~~~~~~~~~~~ 90 ~~~~~~~~~~~~~~~~~~~~~~ 91 91 92 .. code-block:: c 92 .. code-block:: c 93 93 94 int bpf_map_get_next_key (int fd, const voi 94 int bpf_map_get_next_key (int fd, const void *cur_key, void *next_key) 95 95 96 A userspace program can iterate through the en 96 A userspace program can iterate through the entries in an LPM trie using 97 libbpf's ``bpf_map_get_next_key()`` function. 97 libbpf's ``bpf_map_get_next_key()`` function. The first key can be 98 fetched by calling ``bpf_map_get_next_key()`` 98 fetched by calling ``bpf_map_get_next_key()`` with ``cur_key`` set to 99 ``NULL``. Subsequent calls will fetch the next 99 ``NULL``. Subsequent calls will fetch the next key that follows the 100 current key. ``bpf_map_get_next_key()`` return 100 current key. ``bpf_map_get_next_key()`` returns ``0`` on success, 101 ``-ENOENT`` if ``cur_key`` is the last key in 101 ``-ENOENT`` if ``cur_key`` is the last key in the trie, or negative 102 error in case of failure. 102 error in case of failure. 103 103 104 ``bpf_map_get_next_key()`` will iterate throug 104 ``bpf_map_get_next_key()`` will iterate through the LPM trie elements 105 from leftmost leaf first. This means that iter 105 from leftmost leaf first. This means that iteration will return more 106 specific keys before less specific ones. 106 specific keys before less specific ones. 107 107 108 Examples 108 Examples 109 ======== 109 ======== 110 110 111 Please see ``tools/testing/selftests/bpf/test_ 111 Please see ``tools/testing/selftests/bpf/test_lpm_map.c`` for examples 112 of LPM trie usage from userspace. The code sni 112 of LPM trie usage from userspace. The code snippets below demonstrate 113 API usage. 113 API usage. 114 114 115 Kernel BPF 115 Kernel BPF 116 ---------- 116 ---------- 117 117 118 The following BPF code snippet shows how to de 118 The following BPF code snippet shows how to declare a new LPM trie for IPv4 119 address prefixes: 119 address prefixes: 120 120 121 .. code-block:: c 121 .. code-block:: c 122 122 123 #include <linux/bpf.h> 123 #include <linux/bpf.h> 124 #include <bpf/bpf_helpers.h> 124 #include <bpf/bpf_helpers.h> 125 125 126 struct ipv4_lpm_key { 126 struct ipv4_lpm_key { 127 __u32 prefixlen; 127 __u32 prefixlen; 128 __u32 data; 128 __u32 data; 129 }; 129 }; 130 130 131 struct { 131 struct { 132 __uint(type, BPF_MAP_TYPE_LPM_TRIE 132 __uint(type, BPF_MAP_TYPE_LPM_TRIE); 133 __type(key, struct ipv4_lpm_key); 133 __type(key, struct ipv4_lpm_key); 134 __type(value, __u32); 134 __type(value, __u32); 135 __uint(map_flags, BPF_F_NO_PREALLO 135 __uint(map_flags, BPF_F_NO_PREALLOC); 136 __uint(max_entries, 255); 136 __uint(max_entries, 255); 137 } ipv4_lpm_map SEC(".maps"); 137 } ipv4_lpm_map SEC(".maps"); 138 138 139 The following BPF code snippet shows how to lo 139 The following BPF code snippet shows how to lookup by IPv4 address: 140 140 141 .. code-block:: c 141 .. code-block:: c 142 142 143 void *lookup(__u32 ipaddr) 143 void *lookup(__u32 ipaddr) 144 { 144 { 145 struct ipv4_lpm_key key = { 145 struct ipv4_lpm_key key = { 146 .prefixlen = 32, 146 .prefixlen = 32, 147 .data = ipaddr 147 .data = ipaddr 148 }; 148 }; 149 149 150 return bpf_map_lookup_elem(&ipv4_l 150 return bpf_map_lookup_elem(&ipv4_lpm_map, &key); 151 } 151 } 152 152 153 Userspace 153 Userspace 154 --------- 154 --------- 155 155 156 The following snippet shows how to insert an I 156 The following snippet shows how to insert an IPv4 prefix entry into an 157 LPM trie: 157 LPM trie: 158 158 159 .. code-block:: c 159 .. code-block:: c 160 160 161 int add_prefix_entry(int lpm_fd, __u32 add 161 int add_prefix_entry(int lpm_fd, __u32 addr, __u32 prefixlen, struct value *value) 162 { 162 { 163 struct ipv4_lpm_key ipv4_key = { 163 struct ipv4_lpm_key ipv4_key = { 164 .prefixlen = prefixlen, 164 .prefixlen = prefixlen, 165 .data = addr 165 .data = addr 166 }; 166 }; 167 return bpf_map_update_elem(lpm_fd, 167 return bpf_map_update_elem(lpm_fd, &ipv4_key, value, BPF_ANY); 168 } 168 } 169 169 170 The following snippet shows a userspace progra 170 The following snippet shows a userspace program walking through the entries 171 of an LPM trie: 171 of an LPM trie: 172 172 173 173 174 .. code-block:: c 174 .. code-block:: c 175 175 176 #include <bpf/libbpf.h> 176 #include <bpf/libbpf.h> 177 #include <bpf/bpf.h> 177 #include <bpf/bpf.h> 178 178 179 void iterate_lpm_trie(int map_fd) 179 void iterate_lpm_trie(int map_fd) 180 { 180 { 181 struct ipv4_lpm_key *cur_key = NUL 181 struct ipv4_lpm_key *cur_key = NULL; 182 struct ipv4_lpm_key next_key; 182 struct ipv4_lpm_key next_key; 183 struct value value; 183 struct value value; 184 int err; 184 int err; 185 185 186 for (;;) { 186 for (;;) { 187 err = bpf_map_get_next_key 187 err = bpf_map_get_next_key(map_fd, cur_key, &next_key); 188 if (err) 188 if (err) 189 break; 189 break; 190 190 191 bpf_map_lookup_elem(map_fd 191 bpf_map_lookup_elem(map_fd, &next_key, &value); 192 192 193 /* Use key and value here 193 /* Use key and value here */ 194 194 195 cur_key = &next_key; 195 cur_key = &next_key; 196 } 196 } 197 } 197 }
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.