.. SPDX-License-Identifier: GPL-2.0

=============================================
Open vSwitch datapath developer documentation
=============================================

The Open vSwitch kernel module allows flexible
flow-level packet processing on selected netwo
used to implement a plain Ethernet switch, net
VLAN processing, network access control, flow-
and so on.

The kernel module implements multiple "datapat
bridges), each of which can have multiple "vpo
within a bridge). Each datapath also has asso
table" that userspace populates with "flows" t
on packet headers and metadata to sets of acti
action forwards the packet to another vport; o
implemented.

When a packet arrives on a vport, the kernel m
extracting its flow key and looking it up in t
is a matching flow, it executes the associated
no match, it queues the packet to userspace fo
its processing, userspace will likely set up a
packets of the same type entirely in-kernel).


Flow key compatibility
----------------------

Network protocols evolve over time. New proto
and existing protocols lose their prominence.
kernel module to remain relevant, it must be p
versions to parse additional protocols as part
might even be desirable, someday, to drop supp
protocols that have become obsolete. Therefor
to Open vSwitch is designed to allow carefully
applications to work with any version of the f

To support this forward and backward compatibi
kernel module passes a packet to userspace, it
flow key that it parsed from the packet. User
own notion of a flow key from the packet and c
kernel-provided version:

    - If userspace's notion of the flow key fo
      kernel's, then nothing special is necess

    - If the kernel's flow key includes more f
      version of the flow key, for example if
      headers but userspace stopped at the Eth
      does not understand IPv6), then again no
      necessary.
      Userspace can still set up a
      as long as it uses the kernel-provided f

    - If the userspace flow key includes more
      kernel's, for example if userspace decod
      the kernel stopped at the Ethernet type,
      forward the packet manually, without set
      kernel. This case is bad for performanc
      that the kernel considers part of the fl
      but the forwarding behavior is correct.
      determine that the values of the extra f
      forwarding behavior, then it could set u

How flow keys evolve over time is important to
the following sections go into detail.


Flow key format
---------------

A flow key is passed over a Netlink socket as
attributes. Some attributes represent packet
information about a packet that cannot be extr
itself, e.g. the vport on which the packet was
attributes, however, are extracted from header
e.g. source and destination addresses from Eth
headers.

The <linux/openvswitch.h> header file defines
flow key attributes. For informal explanatory
them as comma-separated strings, with parenthe
and nesting. For example, the following could
corresponding to a TCP packet that arrived on

    in_port(1), eth(src=e0:91:f5:21:d0:b2, dst
    eth_type(0x0800), ipv4(src=172.16.0.20, ds
    frag=no), tcp(src=49163, dst=80)

Often we ellipsize arguments not important to

    in_port(1), eth(...), eth_type(0x0800), ip


Wildcarded flow key format
--------------------------

A wildcarded flow is described with two sequen
passed over the Netlink socket. A flow key, ex
optional corresponding flow mask.

A wildcarded flow can represent a group of exa
in the mask specifies an exact match with the c
A '0' bit specifies a don't care bit, which wi
of an incoming packet. Using wildcarded flow ca
by reducing the number of new flows that need to be p

Support for the mask Netlink attribute is opti
space program.
The kernel can ignore the mask
match flow, or reduce the number of don't care
what was specified by the user space program.
that the kernel does not implement will simply
The kernel module will also work with user spa
nor supply flow mask attributes.

Since the kernel may ignore or modify wildcard
the userspace program to know exactly what mat
two possible approaches: reactively install fl
flow table (and therefore not attempt to deter
or use the kernel's response messages to deter

When interacting with userspace, the kernel sh
of the key exactly as originally installed. Th
identify the flow for all future operations. H
mask of an installed flow, the mask should inc
by the kernel.

The behavior when using overlapping wildcarded
responsibility of the user space program to en
can match at most one flow, wildcarded or not.
performs best-effort detection of overlapping
some but not all of them. However, this behavi


Unique flow identifiers
-----------------------

An alternative to using the original match por
flow identification is a unique flow identifie
for both the kernel and user space program.

User space programs that support UFID are expe
setup in addition to the flow, then refer to t
future operations. The kernel is not required
flow key if a UFID is specified.


Basic rule for evolving flow keys
---------------------------------

Some care is needed to really maintain forward
compatibility for applications that follow the
"Flow key compatibility" above.

The basic rule is obvious::

    ==========================================
    New network protocol support must only sup
    key attributes. It must not change the me
    flow key attributes.
    ==========================================

This rule does have less-obvious consequences
through a few examples.
Suppose, for example,
did not already implement VLAN parsing. Inste
the 802.1Q TPID (0x8100) as the Ethertype then
packet. The flow key for any packet with an 8
essentially like this, ignoring metadata::

    eth(...), eth_type(0x8100)

Naively, to add VLAN support, it makes sense t
key attribute to contain the VLAN tag, then co
encapsulated headers beyond the VLAN tag using
definitions. With this change, a TCP packet i
flow key much like this::

    eth(...), vlan(vid=10, pcp=0), eth_type(0x

But this change would negatively affect a user
has not been updated to understand the new "vl
The application could, following the flow comp
ignore the "vlan" attribute that it does not u
assume that the flow contained IP packets. Th
(the flow only contains IP packets if one pars
802.1Q header) and it could cause the applicat
across kernel versions even though it follows

The solution is to use a set of nested attribu
example, why 802.1Q support uses nested attrib
VLAN 10 is actually expressed as::

    eth(...), eth_type(0x8100), vlan(vid=10, p
    ip(proto=6, ...), tcp(...)))

Notice how the "eth_type", "ip", and "tcp" flo
nested inside the "encap" attribute. Thus, an
not understand the "vlan" key will not see eit
and therefore will not misinterpret them. (Al
is still 0x8100, not changed to 0x0800.)

Handling malformed packets
--------------------------

Don't drop packets in the kernel for malformed
checksums, etc. This would prevent userspace
simple Ethernet switch that forwards every pac

Instead, in such a case, include an attribute
It doesn't matter if the empty content could b
as long as those values are rarely seen in pra
can always forward all packets with those valu
handle them individually.
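
The rule above can be illustrated with a minimal Python sketch. It is not
the kernel's parser or its Netlink encoding: ``extract_flow_key`` and the
dictionary-based attributes are hypothetical names chosen for this example,
and zero is assumed to be the "empty" content. When the TCP header is too
short for the ports to be read, the sketch still emits a tcp attribute
rather than dropping the packet.

```python
import struct
from typing import Dict, List, Tuple

ETH_TYPE_IPV4 = 0x0800
IPPROTO_TCP = 6

# Hypothetical representation: (attribute name, field dictionary).
Attr = Tuple[str, Dict[str, int]]


def extract_flow_key(ip_proto: int, l4_bytes: bytes) -> List[Attr]:
    """Build the eth_type/ip/tcp part of a flow key for an IPv4 packet.

    l4_bytes holds whatever follows the IP header; it may be truncated.
    """
    key: List[Attr] = [
        ("eth_type", {"value": ETH_TYPE_IPV4}),
        ("ip", {"proto": ip_proto}),
    ]
    if ip_proto == IPPROTO_TCP:
        if len(l4_bytes) >= 4:
            # Source and destination ports are the first two 16-bit fields.
            src, dst = struct.unpack("!HH", l4_bytes[:4])
        else:
            # Malformed packet: do not drop it, report empty content
            # (assumed here to be all-zero ports).
            src, dst = 0, 0
        key.append(("tcp", {"src": src, "dst": dst}))
    return key


complete = extract_flow_key(IPPROTO_TCP, struct.pack("!HH", 49163, 80))
truncated = extract_flow_key(IPPROTO_TCP, b"")
```

Both calls return a key that ends in a tcp attribute, so a userspace
program can still install a flow for the truncated packet.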

For example, consider a packet that contains a
indicates protocol 6 for TCP, but which is tru
header, so that the TCP header is missing. Th
packet would include a tcp attribute with all-
this::

    eth(...), eth_type(0x0800), ip(proto=6, ..

As another example, consider a packet with an
indicating that a VLAN TCI should follow, but
after the Ethernet type. The flow key for thi
an all-zero-bits vlan and an empty encap attri

    eth(...), eth_type(0x8100), vlan(0), encap

Unlike a TCP packet with source and destinatio
all-zero-bits VLAN TCI is not that rare, so th
VLAN_TAG_PRESENT inside the kernel) is ordinar
attribute expressly to allow this situation to
Thus, the flow key in this second example unam
missing or malformed VLAN TCI.

Other rules
-----------

The other rules for flow keys are much less su

    - Duplicate attributes are not allowed at

    - Ordering of attributes is not significan

    - When the kernel sends a given flow key t
      composes it the same way. This allows u
      compare entire flow keys that it may not
      interpret.
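
As a rough illustration of the first two rules, the following Python sketch
compares two flow keys while ignoring attribute order and rejecting
duplicates. The list-of-pairs representation and the names ``normalize``
and ``same_flow_key`` are assumptions made for this example; they are not
part of the Open vSwitch Netlink interface, and nesting is not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of one nesting level of a flow key:
# a list of (attribute name, raw value bytes) pairs.
Attr = Tuple[str, bytes]


def normalize(attrs: List[Attr]) -> Dict[str, bytes]:
    """Return the attributes as a mapping, rejecting duplicates."""
    seen: Dict[str, bytes] = {}
    for name, value in attrs:
        if name in seen:
            # Duplicate attributes are not allowed.
            raise ValueError(f"duplicate attribute: {name}")
        seen[name] = value
    return seen


def same_flow_key(a: List[Attr], b: List[Attr]) -> bool:
    """Compare two flow keys without regard to attribute order."""
    return normalize(a) == normalize(b)


key_a = [("in_port", b"\x00\x00\x00\x01"), ("eth_type", b"\x08\x00")]
key_b = [("eth_type", b"\x08\x00"), ("in_port", b"\x00\x00\x00\x01")]
```

Here ``same_flow_key(key_a, key_b)`` is true even though the attributes
appear in a different order. Because the kernel composes a given flow key
the same way every time, a userspace program can also compare the raw
bytes of two kernel-provided keys without parsing them.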