Differences between /Documentation/networking/openvswitch.rst (Version linux-6.12-rc7) and /Documentation/networking/openvswitch.rst (Version linux-5.10.229)


.. SPDX-License-Identifier: GPL-2.0

=============================================
Open vSwitch datapath developer documentation
=============================================

The Open vSwitch kernel module allows flexible userspace control over
flow-level packet processing on selected network devices.  It can be
used to implement a plain Ethernet switch, network device bonding,
VLAN processing, network access control, flow-based network control,
and so on.

The kernel module implements multiple "datapaths" (analogous to
bridges), each of which can have multiple "vports" (analogous to ports
within a bridge).  Each datapath also has associated with it a "flow
table" that userspace populates with "flows" that map from keys based
on packet headers and metadata to sets of actions.  The most common
action forwards the packet to another vport; other actions are also
implemented.

When a packet arrives on a vport, the kernel module processes it by
extracting its flow key and looking it up in the flow table.  If there
is a matching flow, it executes the associated actions.  If there is
no match, it queues the packet to userspace for processing (as part of
its processing, userspace will likely set up a flow to handle further
packets of the same type entirely in-kernel).

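The lookup-or-upcall behavior described above can be sketched in Python.
This is a toy model under stated assumptions, not the kernel's
implementation: ``Datapath``, ``extract_flow_key``, and the dictionary
packet representation are hypothetical names introduced only for
illustration.

```python
from collections import deque


def extract_flow_key(in_port, packet):
    """Build a hashable toy flow key from metadata and header fields."""
    return (
        ("in_port", in_port),
        ("eth_src", packet["eth_src"]),
        ("eth_dst", packet["eth_dst"]),
        ("eth_type", packet["eth_type"]),
    )


class Datapath:
    """Hypothetical model of one datapath with a flow table."""

    def __init__(self):
        self.flow_table = {}    # flow key -> list of (action, argument)
        self.upcalls = deque()  # (flow key, packet) pairs queued for userspace
        self.sent = []          # (out_port, packet) pairs, kept for inspection

    def receive(self, in_port, packet):
        key = extract_flow_key(in_port, packet)
        actions = self.flow_table.get(key)
        if actions is None:
            # No matching flow: pass the packet and its parsed key to userspace.
            self.upcalls.append((key, packet))
            return
        for action, arg in actions:
            if action == "output":
                self.sent.append((arg, packet))


dp = Datapath()
pkt = {"eth_src": "e0:91:f5:21:d0:b2",
       "eth_dst": "00:02:e3:0f:80:a4",
       "eth_type": 0x0800}

dp.receive(1, pkt)                    # first packet misses the flow table
key, _ = dp.upcalls.popleft()         # "userspace" picks up the upcall
dp.flow_table[key] = [("output", 2)]  # and installs a flow for that key
dp.receive(1, pkt)                    # second packet is handled in the datapath
```

After the flow is installed, later packets with the same key never reach
the upcall queue, which is the fast path the text describes.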

Flow key compatibility
----------------------

Network protocols evolve over time.  New protocols become important
and existing protocols lose their prominence.  For the Open vSwitch
kernel module to remain relevant, it must be possible for newer
versions to parse additional protocols as part of the flow key.  It
might even be desirable, someday, to drop support for parsing
protocols that have become obsolete.  Therefore, the Netlink interface
to Open vSwitch is designed to allow carefully written userspace
applications to work with any version of the flow key, past or future.

To support this forward and backward compatibility, whenever the
kernel module passes a packet to userspace, it also passes along the
flow key that it parsed from the packet.  Userspace then extracts its
own notion of a flow key from the packet and compares it against the
kernel-provided version:

    - If userspace's notion of the flow key for the packet matches the
      kernel's, then nothing special is necessary.

    - If the kernel's flow key includes more fields than the userspace
      version of the flow key, for example if the kernel decoded IPv6
      headers but userspace stopped at the Ethernet type (because it
      does not understand IPv6), then again nothing special is
      necessary.  Userspace can still set up a flow in the usual way,
      as long as it uses the kernel-provided flow key to do it.

    - If the userspace flow key includes more fields than the
      kernel's, for example if userspace decoded an IPv6 header but
      the kernel stopped at the Ethernet type, then userspace can
      forward the packet manually, without setting up a flow in the
      kernel.  This case is bad for performance because every packet
      that the kernel considers part of the flow must go to userspace,
      but the forwarding behavior is correct.  (If userspace can
      determine that the values of the extra fields would not affect
      forwarding behavior, then it could set up a flow anyway.)

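The three cases above can be condensed into a small decision function.
This is a minimal sketch that assumes flow keys are modeled as
dictionaries from attribute name to value; the function name ``decide``
and the returned strings are hypothetical, and a real application would
compare parsed Netlink attributes instead.

```python
def decide(kernel_key, user_key):
    """Return what userspace should do with a packet the kernel passed up.

    Both arguments are dicts mapping attribute names to values (toy model).
    """
    kernel_attrs = set(kernel_key)
    user_attrs = set(user_key)
    if user_attrs <= kernel_attrs:
        # Same fields, or the kernel parsed further than userspace did:
        # set up a flow using the kernel-provided key.
        return "install_flow_with_kernel_key"
    # Userspace parsed fields that the kernel's key does not contain.
    return "forward_manually"


same = decide({"eth": 1, "eth_type": 0x86DD},
              {"eth": 1, "eth_type": 0x86DD})
kernel_more = decide({"eth": 1, "eth_type": 0x86DD, "ipv6": 2},
                     {"eth": 1, "eth_type": 0x86DD})
user_more = decide({"eth": 1, "eth_type": 0x86DD},
                   {"eth": 1, "eth_type": 0x86DD, "ipv6": 2})
```

The sketch is deliberately conservative: it does not model the
parenthetical exception in which userspace determines that the extra
fields cannot affect forwarding and installs a flow anyway.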

How flow keys evolve over time is important to making this work, so
the following sections go into detail.


Flow key format
---------------

A flow key is passed over a Netlink socket as a sequence of Netlink
attributes.  Some attributes represent packet metadata, defined as any
information about a packet that cannot be extracted from the packet
itself, e.g. the vport on which the packet was received.  Most
attributes, however, are extracted from headers within the packet,
e.g. source and destination addresses from Ethernet, IP, or TCP
headers.

The <linux/openvswitch.h> header file defines the exact format of the
flow key attributes.  For informal explanatory purposes here, we write
them as comma-separated strings, with parentheses indicating arguments
and nesting.  For example, the following could represent a flow key
corresponding to a TCP packet that arrived on vport 1::

    in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
    eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
    frag=no), tcp(src=49163, dst=80)

Often we ellipsize arguments not important to the discussion, e.g.::

    in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)


Wildcarded flow key format
--------------------------

A wildcarded flow is described with two sequences of Netlink attributes
passed over the Netlink socket: a flow key, exactly as described above, and an
optional corresponding flow mask.

A wildcarded flow can represent a group of exact match flows. Each '1' bit
in the mask specifies an exact match with the corresponding bit in the flow key.
A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit
of an incoming packet. Using wildcarded flows can improve the flow setup rate
by reducing the number of new flows that need to be processed by the user space
program.

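The mask semantics amount to a bitwise comparison restricted to the '1'
bits of the mask. The following minimal sketch illustrates this on a
single field, an IPv4 address treated as a 32-bit integer; the function
name ``masked_match`` and the example values are assumptions made for
illustration only.

```python
import ipaddress


def masked_match(packet_bits, key_bits, mask_bits):
    """True if the packet equals the key on every bit where the mask is 1."""
    return (packet_bits & mask_bits) == (key_bits & mask_bits)


def ip(text):
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(text))


key = ip("172.16.0.0")
mask = ip("255.255.0.0")  # upper 16 bits exact, lower 16 bits "don't care"

inside = masked_match(ip("172.16.0.20"), key, mask)
outside = masked_match(ip("172.18.0.52"), key, mask)
```

A single wildcarded flow of this kind stands in for 65536 exact-match
flows on that field, which is where the reduction in flow setups comes
from.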

Support for the mask Netlink attribute is optional for both the kernel and user
space program. The kernel can ignore the mask attribute, installing an exact
match flow, or reduce the number of don't care bits in the kernel to less than
what was specified by the user space program. In this case, variations in bits
that the kernel does not implement will simply result in additional flow setups.
The kernel module will also work with user space programs that neither support
nor supply flow mask attributes.

Since the kernel may ignore or modify wildcard bits, it can be difficult for
the userspace program to know exactly what matches are installed. There are
two possible approaches: reactively install flows as they miss the kernel
flow table (and therefore not attempt to determine wildcard changes at all)
or use the kernel's response messages to determine the installed wildcards.

When interacting with userspace, the kernel should maintain the match portion
of the key exactly as originally installed. This provides a handle to
identify the flow for all future operations. However, when reporting the
mask of an installed flow, the mask should include any restrictions imposed
by the kernel.

The behavior when using overlapping wildcarded flows is undefined. It is the
responsibility of the user space program to ensure that any incoming packet
can match at most one flow, wildcarded or not. The current implementation
performs best-effort detection of overlapping wildcarded flows and may reject
some but not all of them. However, this behavior may change in future versions.

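A user space program can check for overlap itself before installing a
flow. Two masked flows overlap exactly when their keys agree on every bit
that both masks mark as significant. The sketch below shows that logical
condition on integer-encoded keys; it is not a description of the
kernel's best-effort detection, and ``flows_overlap`` is a hypothetical
helper name.

```python
def flows_overlap(key_a, mask_a, key_b, mask_b):
    """True if some packet could match both masked flows.

    A packet matching both flows exists if and only if the two keys are
    equal on the bits where both masks are 1.
    """
    common = mask_a & mask_b
    return (key_a & common) == (key_b & common)


# The masks cover disjoint bits, so one packet can satisfy both flows.
disjoint_masks = flows_overlap(0b1010, 0b1100, 0b1001, 0b0011)

# Same mask, different key bits under it: no packet can match both.
conflicting_keys = flows_overlap(0b1000, 0b1100, 0b0100, 0b1100)
```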

Unique flow identifiers
-----------------------

An alternative to using the original match portion of a key as the handle for
flow identification is a unique flow identifier, or "UFID". UFIDs are optional
for both the kernel and user space program.

User space programs that support UFID are expected to provide it during flow
setup in addition to the flow, then refer to the flow using the UFID for all
future operations. The kernel is not required to index flows by the original
flow key if a UFID is specified.


Basic rule for evolving flow keys
---------------------------------

Some care is needed to really maintain forward and backward
compatibility for applications that follow the rules listed under
"Flow key compatibility" above.

The basic rule is obvious::

    ==================================================================
    New network protocol support must only supplement existing flow
    key attributes.  It must not change the meaning of already defined
    flow key attributes.
    ==================================================================

This rule does have less-obvious consequences so it is worth working
through a few examples.  Suppose, for example, that the kernel module
did not already implement VLAN parsing.  Instead, it just interpreted
the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
packet.  The flow key for any packet with an 802.1Q header would look
essentially like this, ignoring metadata::

    eth(...), eth_type(0x8100)

Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
definitions.  With this change, a TCP packet in VLAN 10 would have a
flow key much like this::

    eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)

But this change would negatively affect a userspace application that
has not been updated to understand the new "vlan" flow key attribute.
The application could, following the flow compatibility rules above,
ignore the "vlan" attribute that it does not understand and therefore
assume that the flow contained IP packets.  This is a bad assumption
(the flow only contains IP packets if one parses and skips over the
802.1Q header) and it could cause the application's behavior to change
across kernel versions even though it follows the compatibility rules.

The solution is to use a set of nested attributes.  This is, for
example, why 802.1Q support uses nested attributes.  A TCP packet in
VLAN 10 is actually expressed as::

    eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
    ip(proto=6, ...), tcp(...)))

Notice how the "eth_type", "ip", and "tcp" flow key attributes are
nested inside the "encap" attribute.  Thus, an application that does
not understand the "vlan" key will not see any of those attributes
and therefore will not misinterpret them.  (Also, the outer eth_type
is still 0x8100, not changed to 0x0800.)

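The difference between the naive layout and the nested layout can be
seen by simulating an application that skips attributes it does not
know. In this sketch a flow key is modeled as a list of
``(name, value)`` pairs, with nesting expressed as a nested list; this
representation, the ``KNOWN_TO_OLD_APP`` set, and ``old_app_view`` are
assumptions for illustration, not the Netlink encoding.

```python
# Attributes understood by a hypothetical application that predates VLAN
# support.  It ignores anything else, as the compatibility rules allow.
KNOWN_TO_OLD_APP = {"eth", "eth_type", "ip", "tcp"}


def old_app_view(flow_key):
    """Return the top-level attributes the old application understands."""
    return [(name, value) for name, value in flow_key
            if name in KNOWN_TO_OLD_APP]


# Naive layout: "vlan" is added alongside the inner headers.
flat = [("eth", "..."), ("vlan", {"vid": 10, "pcp": 0}),
        ("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")]

# Nested layout: the inner headers are hidden inside "encap".
nested = [("eth", "..."), ("eth_type", 0x8100),
          ("vlan", {"vid": 10, "pcp": 0}),
          ("encap", [("eth_type", 0x0800), ("ip", {"proto": 6}),
                     ("tcp", "...")])]

flat_view = old_app_view(flat)      # looks like an untagged TCP/IP flow
nested_view = old_app_view(nested)  # stops at eth_type(0x8100)
```

With the naive layout the old application sees what appears to be a
plain IP flow, while with the nested layout it sees the same key it
would have seen before VLAN parsing existed.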

Handling malformed packets
--------------------------

Don't drop packets in the kernel for malformed protocol headers, bad
checksums, etc.  This would prevent userspace from implementing a
simple Ethernet switch that forwards every packet.

Instead, in such a case, include an attribute with "empty" content.
It doesn't matter if the empty content could be valid protocol values,
as long as those values are rarely seen in practice, because userspace
can always arrange for all packets with those values to be sent to
userspace and handle them individually.

For example, consider a packet that contains an IP header that
indicates protocol 6 for TCP, but which is truncated just after the IP
header, so that the TCP header is missing.  The flow key for this
packet would include a tcp attribute with all-zero src and dst, like
this::

    eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)

As another example, consider a packet with an Ethernet type of 0x8100,
indicating that a VLAN TCI should follow, but which is truncated just
after the Ethernet type.  The flow key for this packet would include
an all-zero-bits vlan and an empty encap attribute, like this::

    eth(...), eth_type(0x8100), vlan(0), encap()

Unlike a TCP packet with source and destination ports 0, an
all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
attribute expressly to allow this situation to be distinguished.
Thus, the flow key in this second example unambiguously indicates a
missing or malformed VLAN TCI.

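The first example, a truncated TCP header, can be sketched as a parser
that emits an all-zero attribute instead of rejecting the packet. This
is a simplified illustration, not the kernel's parser: the function name
``tcp_attribute`` is hypothetical, and the length check assumes the
20-byte minimum TCP header.

```python
import struct

TCP_MIN_HEADER_LEN = 20  # bytes in a TCP header without options


def tcp_attribute(l4_payload):
    """Return the toy "tcp" flow key attribute for the bytes after IP.

    A missing or truncated TCP header yields all-zero ports rather than
    an error, so the packet still gets a flow key.
    """
    if len(l4_payload) < TCP_MIN_HEADER_LEN:
        return {"src": 0, "dst": 0}
    # Source and destination ports are the first two 16-bit big-endian fields.
    src, dst = struct.unpack_from("!HH", l4_payload, 0)
    return {"src": src, "dst": dst}


truncated = tcp_attribute(b"")
complete = tcp_attribute(struct.pack("!HH", 49163, 80) + bytes(16))
```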

Other rules
-----------

The other rules for flow keys are much less subtle:

    - Duplicate attributes are not allowed at a given nesting level.

    - Ordering of attributes is not significant.

    - When the kernel sends a given flow key to userspace, it always
      composes it the same way.  This allows userspace to hash and
      compare entire flow keys that it may not be able to fully
      interpret.

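The first rule applies per nesting level, so the same attribute may
appear once outside and once inside a nested attribute. A minimal check,
assuming flow keys are modeled as lists of ``(name, value)`` pairs with
nested lists for nested attributes (a hypothetical representation chosen
for illustration), could look like this:

```python
def has_duplicates(flow_key):
    """True if any nesting level repeats an attribute name."""
    names = [name for name, _ in flow_key]
    if len(names) != len(set(names)):
        return True
    # Recurse into nested attributes, modeled here as nested lists.
    return any(isinstance(value, list) and has_duplicates(value)
               for _, value in flow_key)


# "eth_type" appears twice, but at different nesting levels: allowed.
vlan_key = [("eth", "..."), ("eth_type", 0x8100), ("vlan", 10),
            ("encap", [("eth_type", 0x0800), ("ip", "..."), ("tcp", "...")])]

# "eth_type" appears twice at the same level: not allowed.
bad_key = [("eth", "..."), ("eth_type", 0x8100), ("eth_type", 0x0800)]

vlan_ok = not has_duplicates(vlan_key)
bad_rejected = has_duplicates(bad_key)
```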
                                                      
