~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/Documentation/networking/fib_trie.rst

Version: ~ [ linux-6.12-rc7 ] ~ [ linux-6.11.7 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.60 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.116 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.171 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.229 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.285 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.323 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.12 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

Diff markup

Differences between /Documentation/networking/fib_trie.rst (Architecture sparc) and /Documentation/networking/fib_trie.rst (Architecture i386)


  1 .. SPDX-License-Identifier: GPL-2.0                 1 .. SPDX-License-Identifier: GPL-2.0
  2                                                     2 
  3 ============================                        3 ============================
  4 LC-trie implementation notes                        4 LC-trie implementation notes
  5 ============================                        5 ============================
  6                                                     6 
  7 Node types                                          7 Node types
  8 ----------                                          8 ----------
  9 leaf                                                9 leaf
 10         An end node with data. This has a copy     10         An end node with data. This has a copy of the relevant key, along
 11         with 'hlist' with routing table entrie     11         with 'hlist' with routing table entries sorted by prefix length.
 12         See struct leaf and struct leaf_info.      12         See struct leaf and struct leaf_info.
 13                                                    13 
 14 trie node or tnode                                 14 trie node or tnode
 15         An internal node, holding an array of      15         An internal node, holding an array of child (leaf or tnode) pointers,
 16         indexed through a subset of the key. S     16         indexed through a subset of the key. See Level Compression.
 17                                                    17 
 18 A few concepts explained                           18 A few concepts explained
 19 ------------------------                           19 ------------------------
 20 Bits (tnode)                                       20 Bits (tnode)
 21         The number of bits in the key segment      21         The number of bits in the key segment used for indexing into the
 22         child array - the "child index". See L     22         child array - the "child index". See Level Compression.
 23                                                    23 
 24 Pos (tnode)                                        24 Pos (tnode)
 25         The position (in the key) of the key s     25         The position (in the key) of the key segment used for indexing into
 26         the child array. See Path Compression.     26         the child array. See Path Compression.
 27                                                    27 
 28 Path Compression / skipped bits                    28 Path Compression / skipped bits
 29         Any given tnode is linked to from the      29         Any given tnode is linked to from the child array of its parent, using
 30         a segment of the key specified by the      30         a segment of the key specified by the parent's "pos" and "bits"
 31         In certain cases, this tnode's own "po     31         In certain cases, this tnode's own "pos" will not be immediately
 32         adjacent to the parent (pos+bits), but     32         adjacent to the parent (pos+bits), but there will be some bits
 33         in the key skipped over because they r     33         in the key skipped over because they represent a single path with no
 34         deviations. These "skipped bits" const     34         deviations. These "skipped bits" constitute Path Compression.
 35         Note that the search algorithm will si     35         Note that the search algorithm will simply skip over these bits when
 36         searching, making it necessary to save     36         searching, making it necessary to save the keys in the leaves to
 37         verify that they actually do match the     37         verify that they actually do match the key we are searching for.
 38                                                    38 
 39 Level Compression / child arrays                   39 Level Compression / child arrays
 40         the trie is kept level balanced moving     40         the trie is kept level balanced moving, under certain conditions, the
 41         children of a full child (see "full_ch     41         children of a full child (see "full_children") up one level, so that
 42         instead of a pure binary tree, each in     42         instead of a pure binary tree, each internal node ("tnode") may
 43         contain an arbitrarily large array of      43         contain an arbitrarily large array of links to several children.
 44         Conversely, a tnode with a mostly empt     44         Conversely, a tnode with a mostly empty child array (see empty_children)
 45         may be "halved", having some of its ch     45         may be "halved", having some of its children moved downwards one level,
 46         in order to avoid ever-increasing chil     46         in order to avoid ever-increasing child arrays.
 47                                                    47 
 48 empty_children                                     48 empty_children
 49         the number of positions in the child a     49         the number of positions in the child array of a given tnode that are
 50         NULL.                                      50         NULL.
 51                                                    51 
 52 full_children                                      52 full_children
 53         the number of children of a given tnod     53         the number of children of a given tnode that aren't path compressed.
 54         (in other words, they aren't NULL or l     54         (in other words, they aren't NULL or leaves and their "pos" is equal
 55         to this tnode's "pos"+"bits").             55         to this tnode's "pos"+"bits").
 56                                                    56 
 57         (The word "full" here is used more in      57         (The word "full" here is used more in the sense of "complete" than
 58         as the opposite of "empty", which migh     58         as the opposite of "empty", which might be a tad confusing.)
 59                                                    59 
 60 Comments                                           60 Comments
 61 ---------                                          61 ---------
 62                                                    62 
 63 We have tried to keep the structure of the cod     63 We have tried to keep the structure of the code as close to fib_hash as
 64 possible to allow verification and help up rev     64 possible to allow verification and help up reviewing.
 65                                                    65 
 66 fib_find_node()                                    66 fib_find_node()
 67         A good start for understanding this co     67         A good start for understanding this code. This function implements a
 68         straightforward trie lookup.               68         straightforward trie lookup.
 69                                                    69 
 70 fib_insert_node()                                  70 fib_insert_node()
 71         Inserts a new leaf node in the trie. T     71         Inserts a new leaf node in the trie. This is bit more complicated than
 72         fib_find_node(). Inserting a new node      72         fib_find_node(). Inserting a new node means we might have to run the
 73         level compression algorithm on part of     73         level compression algorithm on part of the trie.
 74                                                    74 
 75 trie_leaf_remove()                                 75 trie_leaf_remove()
 76         Looks up a key, deletes it and runs th     76         Looks up a key, deletes it and runs the level compression algorithm.
 77                                                    77 
 78 trie_rebalance()                                   78 trie_rebalance()
 79         The key function for the dynamic trie      79         The key function for the dynamic trie after any change in the trie
 80         it is run to optimize and reorganize.      80         it is run to optimize and reorganize. It will walk the trie upwards
 81         towards the root from a given tnode, d     81         towards the root from a given tnode, doing a resize() at each step
 82         to implement level compression.            82         to implement level compression.
 83                                                    83 
 84 resize()                                           84 resize()
 85         Analyzes a tnode and optimizes the chi     85         Analyzes a tnode and optimizes the child array size by either inflating
 86         or shrinking it repeatedly until it fu     86         or shrinking it repeatedly until it fulfills the criteria for optimal
 87         level compression. This part follows t     87         level compression. This part follows the original paper pretty closely
 88         and there may be some room for experim     88         and there may be some room for experimentation here.
 89                                                    89 
 90 inflate()                                          90 inflate()
 91         Doubles the size of the child array wi     91         Doubles the size of the child array within a tnode. Used by resize().
 92                                                    92 
 93 halve()                                            93 halve()
 94         Halves the size of the child array wit     94         Halves the size of the child array within a tnode - the inverse of
 95         inflate(). Used by resize();               95         inflate(). Used by resize();
 96                                                    96 
 97 fn_trie_insert(), fn_trie_delete(), fn_trie_se     97 fn_trie_insert(), fn_trie_delete(), fn_trie_select_default()
 98         The route manipulation functions. Shou     98         The route manipulation functions. Should conform pretty closely to the
 99         corresponding functions in fib_hash.       99         corresponding functions in fib_hash.
100                                                   100 
101 fn_trie_flush()                                   101 fn_trie_flush()
102         This walks the full trie (using nextle    102         This walks the full trie (using nextleaf()) and searches for empty
103         leaves which have to be removed.          103         leaves which have to be removed.
104                                                   104 
105 fn_trie_dump()                                    105 fn_trie_dump()
106         Dumps the routing table ordered by pre    106         Dumps the routing table ordered by prefix length. This is somewhat
107         slower than the corresponding fib_hash    107         slower than the corresponding fib_hash function, as we have to walk the
108         entire trie for each prefix length. In    108         entire trie for each prefix length. In comparison, fib_hash is organized
109         as one "zone"/hash per prefix length.     109         as one "zone"/hash per prefix length.
110                                                   110 
111 Locking                                           111 Locking
112 -------                                           112 -------
113                                                   113 
114 fib_lock is used for an RW-lock in the same wa    114 fib_lock is used for an RW-lock in the same way that this is done in fib_hash.
115 However, the functions are somewhat separated     115 However, the functions are somewhat separated for other possible locking
116 scenarios. It might conceivably be possible to    116 scenarios. It might conceivably be possible to run trie_rebalance via RCU
117 to avoid read_lock in the fn_trie_lookup() fun    117 to avoid read_lock in the fn_trie_lookup() function.
118                                                   118 
119 Main lookup mechanism                             119 Main lookup mechanism
120 ---------------------                             120 ---------------------
121 fn_trie_lookup() is the main lookup function.     121 fn_trie_lookup() is the main lookup function.
122                                                   122 
123 The lookup is in its simplest form just like f    123 The lookup is in its simplest form just like fib_find_node(). We descend the
124 trie, key segment by key segment, until we fin    124 trie, key segment by key segment, until we find a leaf. check_leaf() does
125 the fib_semantic_match in the leaf's sorted pr    125 the fib_semantic_match in the leaf's sorted prefix hlist.
126                                                   126 
127 If we find a match, we are done.                  127 If we find a match, we are done.
128                                                   128 
129 If we don't find a match, we enter prefix matc    129 If we don't find a match, we enter prefix matching mode. The prefix length,
130 starting out at the same as the key length, is    130 starting out at the same as the key length, is reduced one step at a time,
131 and we backtrack upwards through the trie tryi    131 and we backtrack upwards through the trie trying to find a longest matching
132 prefix. The goal is always to reach a leaf and    132 prefix. The goal is always to reach a leaf and get a positive result from the
133 fib_semantic_match mechanism.                     133 fib_semantic_match mechanism.
134                                                   134 
135 Inside each tnode, the search for longest matc    135 Inside each tnode, the search for longest matching prefix consists of searching
136 through the child array, chopping off (zeroing    136 through the child array, chopping off (zeroing) the least significant "1" of
137 the child index until we find a match or the c    137 the child index until we find a match or the child index consists of nothing but
138 zeros.                                            138 zeros.
139                                                   139 
140 At this point we backtrack (t->stats.backtrack    140 At this point we backtrack (t->stats.backtrack++) up the trie, continuing to
141 chop off part of the key in order to find the     141 chop off part of the key in order to find the longest matching prefix.
142                                                   142 
143 At this point we will repeatedly descend subtr    143 At this point we will repeatedly descend subtries to look for a match, and there
144 are some optimizations available that can prov    144 are some optimizations available that can provide us with "shortcuts" to avoid
145 descending into dead ends. Look for "HL_OPTIMI    145 descending into dead ends. Look for "HL_OPTIMIZE" sections in the code.
146                                                   146 
147 To alleviate any doubts about the correctness     147 To alleviate any doubts about the correctness of the route selection process,
148 a new netlink operation has been added. Look f    148 a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which
149 gives userland access to fib_lookup().            149 gives userland access to fib_lookup().
                                                      

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php