.. SPDX-License-Identifier: GPL-2.0

=============
Devlink DPIPE
=============

Background
==========

While performing the hardware offloading process, many of the hardware
specifics cannot be presented. These details are useful for debugging, and
``devlink-dpipe`` provides a standardized way to provide visibility into the
offloading process.

For example, the routing longest prefix match (LPM) algorithm used by the
Linux kernel may differ from the hardware implementation. The pipeline debug
API (DPIPE) is aimed at providing the user visibility into the ASIC's
pipeline in a generic way.

The hardware offload process is expected to be done in a way that the user
cannot distinguish between the hardware and software implementations. In
this process, hardware specifics are neglected. In reality, those details
can carry a lot of meaning and should be exposed in some standard way.

This problem is made even more complex when one wishes to offload the
control path of the whole networking stack to a switch ASIC. Due to
differences in the hardware and software models, some processes cannot be
represented correctly.

One example is the kernel's LPM algorithm, which in many cases differs
greatly from the hardware implementation. The configuration API is the same,
but one cannot rely on the Forwarding Information Base (FIB) to look like the
Level Path Compression trie (LPC-trie) in hardware.

In many situations, trying to analyze system failures solely based on the
kernel's dump may not be enough. By combining this data with complementary
information about the underlying hardware, this debugging can be made
easier; additionally, the information can be useful when debugging
performance issues.

Overview
========

The ``devlink-dpipe`` interface closes this gap. The hardware's pipeline is
modeled as a graph of match/action tables. Each table represents a specific
hardware block. This model is not new; it was first used by the P4 language.

Traditionally, it has been used as an alternative model for hardware
configuration, but the ``devlink-dpipe`` interface uses it for visibility
purposes as a standard complementary tool. The system's view from
``devlink-dpipe`` should change according to the changes made by the
standard configuration tools.

For example, it’s quite common to implement Access Control Lists (ACL)
using Ternary Content Addressable Memory (TCAM). The TCAM memory can be
divided into TCAM regions. Complex TC filters can have multiple rules with
different priorities and different lookup keys. On the other hand, hardware
TCAM regions have a predefined lookup key. Offloading the TC filter rules
using the TCAM engine can result in multiple TCAM regions being interconnected
in a chain (which may affect the data path latency). In response to a new TC
filter, new tables should be created describing those regions.

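To make the chaining effect concrete, the following minimal C sketch models
ACL rules spread over a chain of TCAM regions, each with a single predefined
lookup key. The key types, the region layout and every name in it
(``struct tcam_region``, ``tcam_chain_lookup()`` and so on) are hypothetical
and not taken from any driver. It only illustrates that a lookup may have to
walk several regions, and that the number of regions visited grows with the
chain.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical lookup key types: each region matches on exactly one. */
    enum tcam_key_type { TCAM_KEY_SRC_IP, TCAM_KEY_DST_IP };

    /* A ternary rule hits if (key & mask) == (value & mask). */
    struct tcam_rule {
        uint32_t value;
        uint32_t mask;
        int action;                      /* hypothetical action ID */
    };

    struct tcam_region {
        enum tcam_key_type key_type;     /* predefined lookup key */
        const struct tcam_rule *rules;   /* ordered by priority */
        size_t n_rules;
        const struct tcam_region *next;  /* next region in the chain */
    };

    struct pkt_keys {
        uint32_t src_ip;
        uint32_t dst_ip;
    };

    /*
     * Walk the chain until a rule hits. *regions_visited reports how many
     * regions were consulted, a rough stand-in for the added latency.
     * Returns the action of the first matching rule, or -1 on a miss.
     */
    static int tcam_chain_lookup(const struct tcam_region *r,
                                 const struct pkt_keys *pkt,
                                 unsigned int *regions_visited)
    {
        *regions_visited = 0;
        for (; r; r = r->next) {
            uint32_t key = r->key_type == TCAM_KEY_SRC_IP ?
                           pkt->src_ip : pkt->dst_ip;
            size_t i;

            (*regions_visited)++;
            for (i = 0; i < r->n_rules; i++) {
                const struct tcam_rule *rule = &r->rules[i];

                if ((key & rule->mask) == (rule->value & rule->mask))
                    return rule->action;
            }
        }
        return -1;
    }

A packet whose source address misses the first region is only classified
after the second region is consulted, which is the latency cost the text
above refers to.
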
Model
=====

The ``DPIPE`` model introduces several objects:

  * headers
  * tables
  * entries

A ``header`` describes packet formats and provides names for fields within
the packet. A ``table`` describes hardware blocks. An ``entry`` describes
the actual content of a specific table.

The hardware pipeline is not port-specific, but rather describes the whole
ASIC. Thus, it is tied to the top of the ``devlink`` infrastructure.

Drivers can register and unregister tables at run time in order to support
dynamic behavior. This dynamic behavior is mandatory for describing hardware
blocks like TCAM regions, which can be allocated and freed dynamically.

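Because tables come and go at run time, some device-wide bookkeeping of the
registered tables is needed. The following C sketch shows one way this could
look, assuming a fixed-size array per device. ``struct dpipe_registry``,
``dpipe_table_register()`` and ``dpipe_table_unregister()`` are hypothetical
names for illustration, not the kernel's devlink functions. The registry
belongs to a device-wide object rather than to a port, mirroring the fact
that the pipeline describes the whole ASIC.

.. code:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define DPIPE_MAX_TABLES 16   /* hypothetical limit */

    /* Minimal table descriptor: name plus current size. */
    struct dpipe_table {
        const char *name;
        unsigned long size;
    };

    /* One registry per device (ASIC), not per port. */
    struct dpipe_registry {
        struct dpipe_table *tables[DPIPE_MAX_TABLES];
        size_t count;
    };

    static struct dpipe_table *dpipe_table_find(struct dpipe_registry *reg,
                                                const char *name)
    {
        size_t i;

        for (i = 0; i < reg->count; i++)
            if (strcmp(reg->tables[i]->name, name) == 0)
                return reg->tables[i];
        return NULL;
    }

    /* Called by the driver, e.g. when a new TCAM region is allocated. */
    static bool dpipe_table_register(struct dpipe_registry *reg,
                                     struct dpipe_table *table)
    {
        if (reg->count == DPIPE_MAX_TABLES || dpipe_table_find(reg, table->name))
            return false;
        reg->tables[reg->count++] = table;
        return true;
    }

    /* Called by the driver, e.g. when that TCAM region is freed. */
    static bool dpipe_table_unregister(struct dpipe_registry *reg,
                                       const char *name)
    {
        size_t i;

        for (i = 0; i < reg->count; i++) {
            if (strcmp(reg->tables[i]->name, name) == 0) {
                /* Keep the array dense by moving the last entry here. */
                reg->tables[i] = reg->tables[--reg->count];
                return true;
            }
        }
        return false;
    }
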
``devlink-dpipe`` is generally not intended for configuration. The exception
is hardware counting for a specific table, which can be enabled or disabled.

The following commands are used to obtain the ``dpipe`` objects from
userspace:

  * ``table_get``: Receive a table's description.
  * ``headers_get``: Receive a device's supported headers.
  * ``entries_get``: Receive a table's current entries.
  * ``counters_set``: Enable or disable counters on a table.

Table
-----

The driver should implement the following operations for each table:

  * ``matches_dump``: Dump the supported matches.
  * ``actions_dump``: Dump the supported actions.
  * ``entries_dump``: Dump the actual content of the table.
  * ``counters_set_update``: Synchronize hardware with counters enabled or
    disabled.

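These per-table operations map naturally onto a C structure of function
pointers. The kernel declares a comparable structure
(``struct devlink_dpipe_table_ops`` in ``include/net/devlink.h``), but the
standalone sketch below deliberately does not reproduce its signatures: its
dump callbacks fill plain arrays instead of netlink messages, and all names
(``struct ex_table_ops``, ``ex_counters_set()``, the ``toy_*`` driver) are
hypothetical. It shows the one configurable knob, enabling counters, flowing
from the core into the driver through ``counters_set_update``.

.. code:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical, simplified callbacks; real ones emit netlink data. */
    struct ex_table_ops {
        size_t (*matches_dump)(void *priv, const char **out, size_t max);
        size_t (*actions_dump)(void *priv, const char **out, size_t max);
        size_t (*entries_dump)(void *priv, bool counters_enabled,
                               unsigned long long *counters, size_t max);
        int (*counters_set_update)(void *priv, bool enable);
    };

    struct ex_table {
        const struct ex_table_ops *ops;
        void *priv;              /* driver private data */
        bool counters_enabled;   /* state owned by the core */
    };

    /* Core side of counters_set: let the driver sync hardware first. */
    static int ex_counters_set(struct ex_table *t, bool enable)
    {
        int err;

        if (t->counters_enabled == enable)
            return 0;
        err = t->ops->counters_set_update(t->priv, enable);
        if (err)
            return err;
        t->counters_enabled = enable;
        return 0;
    }

    /* ---- A toy driver implementing the callbacks ---- */
    struct toy_hw {
        bool hw_counting;
        unsigned long long hits[2];
    };

    static size_t toy_matches_dump(void *priv, const char **out, size_t max)
    {
        (void)priv;
        if (max < 1)
            return 0;
        out[0] = "meta.rif_port: exact";
        return 1;
    }

    static size_t toy_actions_dump(void *priv, const char **out, size_t max)
    {
        (void)priv;
        if (max < 1)
            return 0;
        out[0] = "meta.l3_forward: set";
        return 1;
    }

    static size_t toy_entries_dump(void *priv, bool counters_enabled,
                                   unsigned long long *counters, size_t max)
    {
        struct toy_hw *hw = priv;
        size_t i, n = max < 2 ? max : 2;

        /* Counter values are only reported while counting is enabled. */
        for (i = 0; i < n; i++)
            counters[i] = counters_enabled ? hw->hits[i] : 0;
        return n;
    }

    static int toy_counters_set_update(void *priv, bool enable)
    {
        struct toy_hw *hw = priv;

        hw->hw_counting = enable;   /* a real driver programs the ASIC */
        return 0;
    }

    static const struct ex_table_ops toy_ops = {
        .matches_dump = toy_matches_dump,
        .actions_dump = toy_actions_dump,
        .entries_dump = toy_entries_dump,
        .counters_set_update = toy_counters_set_update,
    };
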
Header/Field
------------

In a similar way to P4, headers and fields are used to describe a table's
behavior. There is a slight difference between the standard protocol headers
and specific ASIC metadata. The protocol headers should be declared in the
``devlink`` core API. On the other hand, ASIC metadata is driver-specific
and should be defined in the driver. Additionally, each driver-specific
devlink documentation file should document the driver-specific ``dpipe``
headers it implements. The headers and fields are identified by enumeration.

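As a sketch of identification by enumeration, the C fragment below declares
a few protocol header and field IDs of the kind the core would own, plus a
driver-specific metadata header. The enumerators, ``struct ex_field_desc``
and the bit widths chosen for the metadata fields are hypothetical; only the
idea that a (header ID, field ID) pair names a field comes from the text
above.

.. code:: c

    #include <stddef.h>

    enum ex_header_id {
        /* Protocol headers: declared once by the core API. */
        EX_HEADER_ETHERNET,
        EX_HEADER_IPV4,
        EX_HEADER_IPV6,
        /* ASIC metadata header: defined (and documented) by the driver. */
        EX_HEADER_META,
    };

    enum ex_ipv4_field_id { EX_FIELD_IPV4_DST_ADDR };
    enum ex_meta_field_id {
        EX_FIELD_META_RIF_PORT,
        EX_FIELD_META_VR_ID,
        EX_FIELD_META_ADJ_INDEX,
    };

    struct ex_field_desc {
        enum ex_header_id header;
        unsigned int field;      /* field ID, interpreted per header */
        const char *name;
        unsigned int bitwidth;   /* hypothetical widths for metadata */
    };

    static const struct ex_field_desc ex_fields[] = {
        { EX_HEADER_IPV4, EX_FIELD_IPV4_DST_ADDR, "ipv4.dst_addr", 32 },
        { EX_HEADER_META, EX_FIELD_META_RIF_PORT, "meta.rif_port", 16 },
        { EX_HEADER_META, EX_FIELD_META_VR_ID, "meta.vr_id", 16 },
        { EX_HEADER_META, EX_FIELD_META_ADJ_INDEX, "meta.adj_index", 32 },
    };

    /* Resolve a (header ID, field ID) pair to its descriptor. */
    static const struct ex_field_desc *ex_field_lookup(enum ex_header_id header,
                                                       unsigned int field)
    {
        size_t i;

        for (i = 0; i < sizeof(ex_fields) / sizeof(ex_fields[0]); i++)
            if (ex_fields[i].header == header && ex_fields[i].field == field)
                return &ex_fields[i];
        return NULL;
    }
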
In order to provide further visibility, some ASIC metadata fields could be
mapped to kernel objects. For example, internal router interface indexes can
be directly mapped to the net device ifindex. FIB table indexes used by
different Virtual Routing and Forwarding (VRF) tables can be mapped to
internal routing table indexes.

Match
-----

Matches are kept primitive and close to hardware operation. Match types like
LPM are not supported, because LPM is exactly the process we wish to
describe in full detail. Examples of matches:

  * ``field_exact``: Exact match on a specific field.
  * ``field_exact_mask``: Exact match on a specific field after masking.
  * ``field_range``: Match on a specific range.

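The three match types can be written as simple predicates over a field
value. The sketch below assumes every field fits in 64 bits; the function
names merely mirror the match names in the list and are not a kernel API.

.. code:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* field_exact: the field must equal the entry's value. */
    static bool match_field_exact(uint64_t field, uint64_t value)
    {
        return field == value;
    }

    /* field_exact_mask: compare only the bits selected by the mask. */
    static bool match_field_exact_mask(uint64_t field, uint64_t value,
                                       uint64_t mask)
    {
        return (field & mask) == (value & mask);
    }

    /* field_range: inclusive range check, min <= field <= max. */
    static bool match_field_range(uint64_t field, uint64_t min, uint64_t max)
    {
        return field >= min && field <= max;
    }

In this model, a single prefix such as 10.0.0.0/8 is still expressible as
``field_exact_mask`` with mask ``0xff000000``; what is left out is the
*longest*-prefix selection among overlapping entries, which the LPM tables
later in this document describe explicitly.
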
The IDs of the header and the field should be specified in order to
identify the specific field. Furthermore, the header index should be
specified in order to distinguish multiple headers of the same type in a
packet (tunneling).

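To see why the header index matters, consider a tunneled packet that carries
an outer and an inner IPv4 header. The C sketch below resolves a (header
type, header index) reference against a parsed list of headers. The parsed
representation and all names in it are hypothetical, chosen only to show
that the index selects the Nth occurrence of a header type, counting from
the outermost.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    enum hdr_type { HDR_ETHERNET, HDR_IPV4, HDR_UDP };

    /* A parsed header instance; only the IPv4 destination is kept here. */
    struct parsed_hdr {
        enum hdr_type type;
        uint32_t ipv4_dst;
    };

    /* Header type plus occurrence index (0 = outermost). */
    struct field_ref {
        enum hdr_type header;
        unsigned int header_index;
    };

    /* Return the header instance a reference points to, or NULL. */
    static const struct parsed_hdr *resolve_ref(const struct parsed_hdr *hdrs,
                                                size_t n,
                                                struct field_ref ref)
    {
        unsigned int seen = 0;
        size_t i;

        for (i = 0; i < n; i++) {
            if (hdrs[i].type != ref.header)
                continue;
            if (seen++ == ref.header_index)
                return &hdrs[i];
        }
        return NULL;
    }
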
Action
------

Similar to match, the actions are kept primitive and close to hardware
operation. For example:

  * ``field_modify``: Modify the field value.
  * ``field_inc``: Increment the field value.
  * ``push_header``: Add a header.
  * ``pop_header``: Remove a header.

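The primitive actions can be sketched over a small packet model: a fixed
array of field values plus a stack of header type IDs. The model, the
``MAX_HDRS`` limit and the function names (which mirror the action names
above) are hypothetical; a real pipeline operates on packet bytes.

.. code:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_HDRS 8   /* hypothetical header stack depth */

    struct pkt_model {
        uint64_t fields[4];      /* field values, indexed by field ID */
        int hdrs[MAX_HDRS];      /* header type IDs, outermost first */
        size_t n_hdrs;
    };

    /* field_modify: overwrite a field with a new value. */
    static void act_field_modify(struct pkt_model *p, size_t field,
                                 uint64_t val)
    {
        p->fields[field] = val;
    }

    /* field_inc: add a step to a field (wraps modulo 2^64). */
    static void act_field_inc(struct pkt_model *p, size_t field,
                              uint64_t step)
    {
        p->fields[field] += step;
    }

    /* push_header: add a new outermost header. */
    static bool act_push_header(struct pkt_model *p, int hdr)
    {
        if (p->n_hdrs == MAX_HDRS)
            return false;
        memmove(&p->hdrs[1], &p->hdrs[0], p->n_hdrs * sizeof(p->hdrs[0]));
        p->hdrs[0] = hdr;
        p->n_hdrs++;
        return true;
    }

    /* pop_header: remove the outermost header. */
    static bool act_pop_header(struct pkt_model *p)
    {
        if (p->n_hdrs == 0)
            return false;
        p->n_hdrs--;
        memmove(&p->hdrs[0], &p->hdrs[1], p->n_hdrs * sizeof(p->hdrs[0]));
        return true;
    }
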
Entry
-----

Entries of a specific table can be dumped on demand. Each entry is
identified by an index, and its properties are described by a list of
match/action values and a specific counter. By dumping the tables' contents,
the interactions between tables can be resolved.

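As an illustration of how dumped entries expose the interactions between
tables, the sketch below follows the value an entry of one table sets (for
instance an LPM entry setting ``meta.adj_index``) into the entries of the
next table that match on that value (for instance adjacency entries matching
on ``meta.adj_index``). ``struct dumped_entry`` is a hypothetical, flattened
form of a dumped entry with one match value and one action value; the
function name is likewise illustrative.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* A dumped entry, flattened to one match value and one action value. */
    struct dumped_entry {
        uint32_t index;          /* entry index within its table */
        uint64_t match_value;    /* e.g. meta.adj_index in adjacency */
        uint64_t action_value;   /* e.g. meta.adj_index set by LPM */
        uint64_t counter;        /* packets that hit this entry */
    };

    /*
     * Collect the indexes of entries in 'next' whose match value equals
     * the action value produced by entry 'from' of the previous table.
     * Returns the number of linked entries written to 'out'.
     */
    static size_t link_entries(const struct dumped_entry *from,
                               const struct dumped_entry *next, size_t n_next,
                               uint32_t *out, size_t max_out)
    {
        size_t i, n = 0;

        for (i = 0; i < n_next && n < max_out; i++)
            if (next[i].match_value == from->action_value)
                out[n++] = next[i].index;
        return n;
    }

With the linked entries in hand, their counters can be compared with the
counter of the originating entry to see where traffic actually went.
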
Abstraction Example
===================

The following is an example of the abstraction model of the L3 part of the
Mellanox Spectrum ASIC. The blocks are described in the order they appear in
the pipeline. The table sizes in the following examples are not real
hardware sizes and are provided for demonstration purposes.

LPM
---

The LPM algorithm can be implemented as a list of hash tables. Each hash
table contains routes with the same prefix length. The root of the list is
/32, and in case of a miss, the hardware continues to the next hash table.
The depth of the search affects the data path latency.

In case of a hit, the entry contains information about the next stage of the
pipeline, which resolves the MAC address. The next stage can be either the
local host table for directly connected routes, or the adjacency table for
next-hops. The ``meta.lpm_prefix`` field is used to connect two LPM tables.

.. code::

    table lpm_prefix_16 {
      size: 4096,
      counters_enabled: true,
      match: { meta.vr_id: exact,
               ipv4.dst_addr: exact_mask,
               ipv6.dst_addr: exact_mask,
               meta.lpm_prefix: exact },
      action: { meta.adj_index: set,
                meta.adj_group_size: set,
                meta.rif_port: set,
                meta.lpm_prefix: set },
    }

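A software model can make the lookup order concrete. The C sketch below
stands in for the per-prefix hash tables with small arrays (a real
implementation would hash), probes them from /32 downward, and counts the
probes as a proxy for latency. It is a simplification under stated
assumptions: it keys only on the IPv4 destination address, returns only
``meta.adj_index``, omits ``meta.vr_id`` and the ``meta.lpm_prefix``
chaining, and all names are hypothetical.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    struct lpm_route {
        uint32_t prefix;       /* network address, host bits zero */
        uint32_t adj_index;    /* result: meta.adj_index */
    };

    /* One "hash table" per prefix length; arrays stand in for hashing. */
    struct lpm_bin {
        unsigned int prefix_len;         /* 0..32 */
        const struct lpm_route *routes;
        size_t n_routes;
    };

    static uint32_t prefix_mask(unsigned int len)
    {
        /* Shifting a 32-bit value by 32 is undefined, so special-case /0. */
        return len == 0 ? 0 : (uint32_t)(0xffffffffu << (32 - len));
    }

    /*
     * 'bins' must be sorted by descending prefix length, starting at /32.
     * Returns 1 and sets *adj_index on a hit, 0 on a miss. *probes counts
     * how many bins were searched.
     */
    static int lpm_lookup(const struct lpm_bin *bins, size_t n_bins,
                          uint32_t dst, uint32_t *adj_index,
                          unsigned int *probes)
    {
        size_t b, i;

        *probes = 0;
        for (b = 0; b < n_bins; b++) {
            uint32_t key = dst & prefix_mask(bins[b].prefix_len);

            (*probes)++;
            for (i = 0; i < bins[b].n_routes; i++) {
                if (bins[b].routes[i].prefix == key) {
                    *adj_index = bins[b].routes[i].adj_index;
                    return 1;
                }
            }
        }
        return 0;
    }

Because the bins are probed from the longest prefix down, the first hit is
the longest match, and a route that only matches a short prefix pays for
every longer bin that was searched first.
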
Local Host
----------

In the case of local routes, the LPM lookup already resolves the egress
router interface (RIF), yet the exact MAC address is not known. The local
host table is a hash table combining the output interface ID with the
destination IP address as a key. The result is the MAC address.

.. code::

    table local_host {
      size: 4096,
      counters_enabled: true,
      match: { meta.rif_port: exact,
               ipv4.dst_addr: exact },
      action: { ethernet.daddr: set }
    }

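The local host table behaves like a hash map keyed on the pair (egress RIF,
destination IPv4 address) that yields a destination MAC address. The C
sketch below implements that with a small open-addressing hash table using
linear probing. The hash function, the table size and all names are
hypothetical choices for illustration and say nothing about how the ASIC
hashes.

.. code:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define LH_SLOTS 64   /* power of two; hypothetical size */

    struct lh_entry {
        bool used;
        uint16_t rif;        /* key: meta.rif_port */
        uint32_t dst_ip;     /* key: ipv4.dst_addr */
        uint8_t mac[6];      /* result: ethernet.daddr */
    };

    struct local_host {
        struct lh_entry slots[LH_SLOTS];
    };

    static size_t lh_hash(uint16_t rif, uint32_t dst_ip)
    {
        /* Simple multiplicative mix of both key parts. */
        uint32_t h = (dst_ip ^ ((uint32_t)rif << 16)) * 2654435761u;

        return h & (LH_SLOTS - 1);
    }

    /* Insert or update; returns false only if the table is full. */
    static bool lh_insert(struct local_host *t, uint16_t rif,
                          uint32_t dst_ip, const uint8_t mac[6])
    {
        size_t i, s = lh_hash(rif, dst_ip);

        for (i = 0; i < LH_SLOTS; i++, s = (s + 1) & (LH_SLOTS - 1)) {
            struct lh_entry *e = &t->slots[s];

            if (!e->used || (e->rif == rif && e->dst_ip == dst_ip)) {
                e->used = true;
                e->rif = rif;
                e->dst_ip = dst_ip;
                memcpy(e->mac, mac, 6);
                return true;
            }
        }
        return false;
    }

    /* Exact match on both key fields; returns NULL on a miss. */
    static const uint8_t *lh_lookup(const struct local_host *t, uint16_t rif,
                                    uint32_t dst_ip)
    {
        size_t i, s = lh_hash(rif, dst_ip);

        for (i = 0; i < LH_SLOTS; i++, s = (s + 1) & (LH_SLOTS - 1)) {
            const struct lh_entry *e = &t->slots[s];

            if (!e->used)
                return NULL;
            if (e->rif == rif && e->dst_ip == dst_ip)
                return e->mac;
        }
        return NULL;
    }
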
Adjacency
---------

In the case of remote routes, this table performs ECMP. The LPM lookup
results in an ECMP group size and an index that serves as a global offset
into this table. Concurrently, a hash of the packet is generated. Based on
the ECMP group size and the packet's hash, a local offset is generated.
Multiple LPM entries can point to the same adjacency group.

.. code::

    table adjacency {
      size: 4096,
      counters_enabled: true,
      match: { meta.adj_index: exact,
               meta.adj_group_size: exact,
               meta.packet_hash_index: exact },
      action: { ethernet.daddr: set,
                meta.erif: set }
    }

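The adjacency lookup boils down to index arithmetic: the LPM result supplies
the group's base index (the global offset) and size, and the packet hash
selects a member within the group (the local offset). The C sketch below
assumes a modulo reduction of the hash and a plain array for the table; the
reduction the hardware actually uses, and every name here, are assumptions.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    struct adj_entry {
        uint8_t dmac[6];     /* result: ethernet.daddr */
        uint16_t erif;       /* result: meta.erif */
    };

    /*
     * adj_index:      global offset of the group (from the LPM entry)
     * adj_group_size: number of next-hops in the group (from the LPM entry)
     * packet_hash:    hash computed over the packet
     */
    static const struct adj_entry *adj_select(const struct adj_entry *table,
                                              size_t table_size,
                                              uint32_t adj_index,
                                              uint32_t adj_group_size,
                                              uint32_t packet_hash)
    {
        uint32_t local;

        if (adj_group_size == 0)
            return NULL;
        local = packet_hash % adj_group_size;      /* local offset */
        if ((size_t)adj_index + local >= table_size)
            return NULL;
        return &table[adj_index + local];          /* global + local */
    }

Since a group is addressed only through (``adj_index``,
``adj_group_size``), several LPM entries can share one group simply by
carrying the same pair.
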
ERIF
----

Once the egress RIF and destination MAC have been resolved by previous
tables, this table performs multiple operations such as TTL decrement and
MTU check. Then the forward/drop decision is made, and the port L3
statistics are updated based on the packet's type (broadcast, unicast,
multicast).

.. code::

    table erif {
      size: 800,
      counters_enabled: true,
      match: { meta.rif_port: exact,
               meta.is_l3_unicast: exact,
               meta.is_l3_broadcast: exact,
               meta.is_l3_multicast: exact },
      action: { meta.l3_drop: set,
                meta.l3_forward: set }
    }

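The per-packet work of the ERIF table can be outlined in C as below. This is
a minimal sketch that assumes a per-RIF MTU, treats a TTL of 1 or less as
expiring on decrement, and keeps one forward and one drop counter per packet
type; these details, and all names, are illustrative assumptions rather than
a description of what the hardware does.

.. code:: c

    #include <stdbool.h>
    #include <stdint.h>

    enum l3_pkt_type { L3_UNICAST, L3_BROADCAST, L3_MULTICAST, L3_NTYPES };

    struct erif_entry {
        uint16_t mtu;                    /* per-RIF MTU */
        uint64_t fwd_pkts[L3_NTYPES];    /* port L3 statistics */
        uint64_t drop_pkts[L3_NTYPES];
    };

    struct l3_pkt {
        uint8_t ttl;
        uint16_t len;                    /* L3 length in bytes */
        enum l3_pkt_type type;
    };

    /* Returns true to forward (meta.l3_forward), false to drop (meta.l3_drop). */
    static bool erif_process(struct erif_entry *rif, struct l3_pkt *pkt)
    {
        bool forward = true;

        if (pkt->ttl <= 1)               /* TTL would reach zero */
            forward = false;
        else
            pkt->ttl--;                  /* TTL decrement */

        if (pkt->len > rif->mtu)         /* MTU check */
            forward = false;

        /* Update statistics per packet type. */
        if (forward)
            rif->fwd_pkts[pkt->type]++;
        else
            rif->drop_pkts[pkt->type]++;
        return forward;
    }
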
                                                      
