.. SPDX-License-Identifier: GPL-2.0

====================================
Netfilter's flowtable infrastructure
====================================

This documentation describes the Netfilter flowtable infrastructure, which
allows you to define a fastpath through the flowtable datapath. This
infrastructure also provides hardware offload support. The flowtable supports
the layer 3 protocols IPv4 and IPv6 and the layer 4 protocols TCP and UDP.
 
Overview
--------

Once the first packet of a flow has successfully gone through the IP forwarding
path, you can offload the flow to the flowtable through your ruleset from the
second packet on. The flowtable infrastructure provides a rule action that
allows you to specify when to add a flow to the flowtable.
 
A packet that finds a matching entry in the flowtable (i.e. a flowtable hit) is
transmitted to the output netdevice via neigh_xmit(). Hence, such packets
bypass the classic IP forwarding path (the visible effect is that you do not
see these packets from any of the Netfilter hooks coming after ingress). If
there is no matching entry in the flowtable (i.e. a flowtable miss), the packet
follows the classic IP forwarding path.
 
The flowtable uses a resizable hashtable. Lookups are based on the following
n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3
source and destination addresses, layer 4 source and destination ports and the
input interface (useful in case there are several conntrack zones in place).
 
The 'flow add' action allows you to populate the flowtable: the user selectively
specifies which flows are placed into the flowtable. Hence, packets follow the
classic IP forwarding path unless the user explicitly instructs flows to use
this new alternative forwarding path via policy.
 
The flowtable datapath is represented in Fig.1, which describes the classic IP
forwarding path including the Netfilter hooks and the flowtable fastpath bypass.

::

                                         userspace process
                                          ^              |
                                          |              |
                                     _____|____     ____\/___
                                    /          \   /         \
                                    |   input   |  |  output  |
                                    \__________/   \_________/
                                         ^               |
                                         |               |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^                          |               ^                |
   flowtable  |                     ____\/___            |                |
       |      |                    /         \           |                |
    __\/___   |                    | forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                   'flow add' rule                         |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

               Fig.1 Netfilter hooks and flowtable interactions
 
The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that is specified from the classic IP
forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented
traffic is passed up to the classic IP forwarding path: since the transport
header is missing, flowtable lookups are not possible. TCP RST and FIN packets
are also passed up to the classic IP forwarding path to release the flow
gracefully. Packets that exceed the MTU are also passed up to the classic
forwarding path to report packet-too-big ICMP errors to the sender.
 
Example configuration
---------------------

Enabling the flowtable bypass is relatively easy: you only need to create a
flowtable and add one rule to your forward chain::

        table inet x {
                flowtable f {
                        hook ingress priority 0; devices = { eth0, eth1 };
                }
                chain y {
                        type filter hook forward priority 0; policy accept;
                        ip protocol tcp flow add @f
                        counter packets 0 bytes 0
                }
        }
 
This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning. The flowtable priority defines the order in which
hooks are run in the pipeline. This is convenient in case you already have an
nftables ingress chain: make sure the flowtable priority is smaller than that
of the nftables ingress chain, so that the flowtable runs first in the pipeline.
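 
As a sketch of this ordering, assume a hypothetical netdev table 'n' with an
ingress chain 'c' on eth0 at priority 10 (both names and the priority value are
made up for illustration). The flowtable below uses priority 0, so it should run
before chain 'c' in the ingress pipeline::

        table netdev n {
                chain c {
                        # hypothetical ingress chain, priority 10 > 0
                        type filter hook ingress device eth0 priority 10; policy accept;
                        counter
                }
        }
        table inet x {
                flowtable f {
                        # smaller priority value, so this hook runs first
                        hook ingress priority 0; devices = { eth0, eth1 };
                }
        }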
 
The 'flow add' action from the forward chain 'y' adds an entry to the
flowtable for the TCP syn-ack packet coming in the reply direction. Once the
flow is offloaded, you will observe that the counter rule in the example above
does not get updated for the packets that are being forwarded through the
forwarding bypass.
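 
Since the flowtable supports both TCP and UDP, a variant of the forward chain
that offloads both protocols could look as follows. This is only a sketch that
reuses the table, flowtable and chain names from the example above; matching on
'meta l4proto' instead of 'ip protocol' is a choice made here so that the rule
covers IPv4 and IPv6 in the inet family::

        table inet x {
                flowtable f {
                        hook ingress priority 0; devices = { eth0, eth1 };
                }
                chain y {
                        type filter hook forward priority 0; policy accept;
                        # offload TCP and UDP flows, for both IPv4 and IPv6
                        meta l4proto { tcp, udp } flow add @f
                        counter packets 0 bytes 0
                }
        }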
 
You can identify offloaded flows through the [OFFLOAD] tag when listing your
connection tracking table.

::

        # conntrack -L
        tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2
 
Layer 2 encapsulation
---------------------

Since Linux kernel 5.13, the flowtable infrastructure discovers the real
netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath
parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the
VLAN ID / PPPoE session ID, which are used for the flowtable lookups. The
flowtable datapath also deals with layer 2 decapsulation.

You do not need to add the PPPoE and the VLAN devices to your flowtable; the
real device is sufficient for the flowtable to track your flows.
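 
For instance, assuming a hypothetical VLAN device eth0.100 stacked on top of
eth0 (the device name is made up for illustration), a sketch of the flowtable
definition lists only the real devices::

        table inet x {
                flowtable f {
                        # eth0.100 (hypothetical VLAN device on eth0) is not listed
                        hook ingress priority 0; devices = { eth0, eth1 };
                }
        }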
 
Bridge and IP forwarding
------------------------

Since Linux kernel 5.13, you can add bridge ports to the flowtable. The
flowtable infrastructure discovers the topology behind the bridge device. This
allows the flowtable to define a fastpath bypass between the bridge ports
(represented as eth1 and eth2 in the example figure below) and the gateway
device (represented as eth0) in your switch/router.

::

                      fastpath bypass
               .-------------------------.
              /                           \
              |           IP forwarding   |
              |          /             \ \/
              |       br0               eth0 ..... eth0
              .       / \                          *host B*
               -> eth1  eth2
                   .           *switch/router*
                   .
                   .
                 eth0
               *host A*
 
The flowtable infrastructure also supports bridge VLAN filtering actions such
as PVID and untagged. You can also stack a classic VLAN device on top of your
bridge port.

If you want your flowtable to define a fastpath between your bridge ports and
your IP forwarding path, you have to add your bridge ports (as represented by
the real netdevice) to your flowtable definition.
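 
Taking the device names from the figure above, a sketch of such a flowtable
definition lists the gateway device eth0 together with the bridge ports eth1
and eth2. The table and flowtable names are reused from the earlier examples::

        table inet x {
                flowtable f {
                        # eth1 and eth2 are ports of br0, eth0 is the gateway device
                        hook ingress priority 0; devices = { eth0, eth1, eth2 };
                }
        }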
 
Counters
--------

The flowtable can synchronize packet and byte counters with the existing
connection tracking entry by specifying the counter statement in your flowtable
definition, e.g.

::

        table inet x {
                flowtable f {
                        hook ingress priority 0; devices = { eth0, eth1 };
                        counter
                }
        }

Counter support is available since Linux kernel 5.7.
 
Hardware offload
----------------

If your network device provides hardware offload support, you can turn it on by
means of the 'offload' flag in your flowtable definition, e.g.

::

        table inet x {
                flowtable f {
                        hook ingress priority 0; devices = { eth0, eth1 };
                        flags offload;
                }
        }
 
There is a workqueue that adds the flows to the hardware. Note that a few
packets might still run over the flowtable software path until the workqueue has
a chance to offload the flow to the network device.

You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when
listing your connection tracking table. Note the distinction between the two
tags: [OFFLOAD] refers to the software flowtable fastpath, whereas [HW_OFFLOAD]
refers to the hardware offload datapath being used by the flow.

The flowtable hardware offload infrastructure also supports DSA (Distributed
Switch Architecture).
 
Limitations
-----------

The flowtable behaves like a cache. The flowtable entries might get stale if
either the destination MAC address or the egress netdevice that is used for
transmission changes.

This might be a problem if:

- You run the flowtable in software mode and you combine bridge and IP
  forwarding in your setup.
- Hardware offload is enabled.
 
More reading
------------

This documentation is based on the LWN.net articles [1]_\ [2]_. Rafal Milecki
also wrote a comprehensive summary called "A state of network acceleration"
that describes how things were before this infrastructure was mainlined [3]_
and gives a rough summary of this work [4]_.

.. [1] https://lwn.net/Articles/738214/
.. [2] https://lwn.net/Articles/742164/
.. [3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
.. [4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html