.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=========
Switchdev
=========

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

.. _mlx5_bridge_offload:

Bridge offload
==============

The mlx5 driver implements support for offloading bridge rules when in switchdev
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
representor is attached to a bridge.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::

    $ ip link set enp8s0f0 master bridge1
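
The attach step assumes that 'bridge1' already exists. The following is a
minimal sketch of creating the bridge beforehand and of checking the result
afterwards, assuming the same PCI address, representor and bridge names as in
the example. FDB entries that were pushed to the device are expected to carry
the ``offload`` flag in the bridge tool output::

    # Before attaching the representor: create the bridge and bring it up
    $ ip link add name bridge1 type bridge
    $ ip link set bridge1 up

    # After the steps above: confirm the eswitch mode
    $ devlink dev eswitch show pci/0000:06:00.0

    # List the FDB entries of the bridge
    $ bridge fdb show br bridge1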

VLANs
-----

The following bridge VLAN functions are supported by mlx5:

- VLAN filtering (including multiple VLANs per port)::

    $ ip link set bridge1 type bridge vlan_filtering 1
    $ bridge vlan add dev enp8s0f0 vid 2-3

- VLAN push on bridge ingress::

    $ bridge vlan add dev enp8s0f0 vid 3 pvid

- VLAN pop on bridge egress::

    $ bridge vlan add dev enp8s0f0 vid 3 untagged
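
The resulting VLAN membership can be inspected with the bridge tool. This is a
minimal sketch, assuming the representor name from the examples above. The
``pvid`` and ``untagged`` flags may also be combined in a single command when
both push and pop are wanted for the same VLAN::

    # Push on ingress and pop on egress for VLAN 3 in one step
    $ bridge vlan add dev enp8s0f0 vid 3 pvid untagged

    # Show the VLANs configured on the representor
    $ bridge vlan show dev enp8s0f0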

Subfunction
===========

Subfunctions that are spawned over the E-switch are created with only a devlink
device, and by default all the SF auxiliary devices are disabled.
This allows the user to configure the SF before it is fully probed,
which saves time.

Usage example:

- Create SF::

    $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
    $ devlink port function set pci/0000:08:00.0/32768 hw_addr 00:00:00:00:00:11 state active

- Enable ETH auxiliary device::

    $ devlink dev param set auxiliary/mlx5_core.sf.1 name enable_eth value true cmode driverinit

- Now, in order to fully probe the SF, use devlink reload::

    $ devlink dev reload auxiliary/mlx5_core.sf.1

mlx5 supports ETH, RDMA and vDPA (vnet) auxiliary device devlink params (see :ref:`Documentation/networking/devlink/devlink-params.rst <devlink_params_generic>`).
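
The other auxiliary devices are controlled in the same way. This is a minimal
sketch, assuming the SF auxiliary device 'mlx5_core.sf.1' from the usage
example and the generic ``enable_rdma`` and ``enable_vnet`` parameters. Because
the parameters use the ``driverinit`` cmode, they are expected to take effect
only after the reload::

    # Inspect the current setting of one parameter
    $ devlink dev param show auxiliary/mlx5_core.sf.1 name enable_eth

    # Enable the RDMA auxiliary device and keep the vDPA (vnet) one disabled
    $ devlink dev param set auxiliary/mlx5_core.sf.1 name enable_rdma value true cmode driverinit
    $ devlink dev param set auxiliary/mlx5_core.sf.1 name enable_vnet value false cmode driverinit

    # Apply the driverinit parameters
    $ devlink dev reload auxiliary/mlx5_core.sf.1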

mlx5 supports subfunction management using the devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.

A subfunction has its own function capabilities and its own resources. This
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
queues are neither shared nor stolen from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
resources neither shared nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that all
devices (netdev, rdma, vdpa, etc.) of the subfunction access only the assigned
PCI BAR space.

A subfunction supports eswitch representation through which it supports tc
offloads. The user configures eswitch to send/receive packets from/to
the subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Example mlx5 software, system, and device view::

       _______
      | admin |
      | user  |----------
      |_______|         |
          |             |
      ____|____       __|______            _________________
     |         |     |         |          |                 |
     | devlink |     | tc tool |          |    user         |
     | tool    |     |_________|          | applications    |
     |_________|         |                |_________________|
           |             |                   |          |
           |             |                   |          |         Userspace
 +---------|-------------|-------------------|----------|--------------------+
           |             |           +----------+   +----------+   Kernel
           |             |           |  netdev  |   | rdma dev |
           |             |           +----------+   +----------+
   (devlink port add/del |              ^               ^
    port function set)   |              |               |
           |             |              +---------------|
      _____|___          |              |        _______|_______
     |         |         |              |       | mlx5 class    |
     | devlink |   +------------+       |       |   drivers     |
     | kernel  |   | rep netdev |       |       |(mlx5_core,ib) |
     |_________|   +------------+       |       |_______________|
           |             |              |               ^
   (devlink ops)         |              |          (probe/remove)
  _________|________     |              |           ____|________
 | subfunction      |    |     +---------------+   | subfunction |
 | management driver|-----     | subfunction   |---|  driver     |
 | (mlx5_core)      |          | auxiliary dev |   | (mlx5_core) |
 |__________________|          +---------------+   |_____________|
           |                                            ^
  (sf add/del, vhca events)                             |
           |                                      (device add/del)
      _____|____                                    ____|________
     |          |                                  | subfunction |
     |  PCI NIC |--- activate/deactivate events--->| host driver |
     |__________|                                  | (mlx5_core) |
                                                   |_____________|

A subfunction is created using the devlink port interface.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Add a devlink port of subfunction flavour::

    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
    pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Show a devlink port of the subfunction::

    $ devlink port show pci/0000:06:00.0/32768
    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Delete a devlink port of subfunction after use::

    $ devlink port del pci/0000:06:00.0/32768

Function attributes
===================

The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
a unified way for SmartNIC and non-SmartNIC.

This is supported only when the eswitch mode is set to switchdev. Port function
configuration of the PCI VF/SF is supported through devlink eswitch port.

Port function attributes should be set before PCI VF/SF is enumerated by the
driver.

MAC address setup
-----------------

The mlx5 driver supports the devlink port function attr mechanism to set up the
MAC address. (refer to Documentation/networking/devlink/devlink-port.rst)
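
As an illustration, this is a minimal sketch of setting the MAC address and
reading it back. It assumes the SF port 'pci/0000:06:00.0/32768' used
elsewhere in this document; the address is an arbitrary example value::

    # Set the MAC address of the function behind the port
    $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88

    # The hw_addr field of the function should reflect the new value
    $ devlink port show pci/0000:06:00.0/32768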

RoCE capability setup
~~~~~~~~~~~~~~~~~~~~~
Not all mlx5 PCI devices/SFs require RoCE capability.

Disabling RoCE capability saves 1 Mbyte of system memory per PCI device/SF.

The mlx5 driver supports the devlink port function attr mechanism to set up
RoCE capability. (refer to Documentation/networking/devlink/devlink-port.rst)
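
A minimal sketch of turning RoCE off for one function follows. The port index
'pci/0000:06:00.0/2' is a hypothetical VF port chosen for illustration, and the
attribute should be set before the function is enumerated by the driver::

    # Disable RoCE for the function behind the port
    $ devlink port function set pci/0000:06:00.0/2 roce disable

    # Read the function attributes back
    $ devlink port show pci/0000:06:00.0/2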

migratable capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform live migration need to
explicitly enable the VF migratable capability.

The mlx5 driver supports the devlink port function attr mechanism to set up
migratable capability. (refer to Documentation/networking/devlink/devlink-port.rst)
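
A minimal sketch of enabling the capability follows. The port index
'pci/0000:06:00.0/2' is a hypothetical VF port chosen for illustration, and the
attribute should be set before the VF is enumerated by the driver::

    # Allow live migration for the VF behind the port
    $ devlink port function set pci/0000:06:00.0/2 migratable enable

    # Read the function attributes back
    $ devlink port show pci/0000:06:00.0/2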

IPsec crypto capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform IPsec crypto offloading need
to explicitly enable the VF ipsec_crypto capability. Enabling IPsec capability
for VFs is supported starting with ConnectX-6 Dx devices and above. When a VF
has IPsec capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up
ipsec_crypto capability. (refer to Documentation/networking/devlink/devlink-port.rst)

IPsec packet capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform IPsec packet offloading need
to explicitly enable the VF ipsec_packet capability. Enabling IPsec capability
for VFs is supported starting with ConnectX-6 Dx devices and above. When a VF
has IPsec capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up
ipsec_packet capability. (refer to Documentation/networking/devlink/devlink-port.rst)
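
Both IPsec capabilities are set with the same command form. This is a minimal
sketch: the port index 'pci/0000:06:00.0/2' is a hypothetical VF port chosen
for illustration, and the devlink tool is assumed to be recent enough to know
the ``ipsec_crypto`` and ``ipsec_packet`` keywords. The attributes should be
set before the VF is enumerated by the driver::

    # Enable IPsec crypto offload for the VF
    $ devlink port function set pci/0000:06:00.0/2 ipsec_crypto enable

    # Or enable IPsec packet offload for the VF
    $ devlink port function set pci/0000:06:00.0/2 ipsec_packet enable

    # Read the function attributes back
    $ devlink port show pci/0000:06:00.0/2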

SF state setup
--------------

To use the SF, the user must activate the SF using the SF function state
attribute.

- Get the state of the SF identified by its unique devlink port index::

   $ devlink port show ens2f0npf0sf88
   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
     function:
       hw_addr 00:00:00:00:88:88 state inactive opstate detached

- Activate the function and verify its state is active::

   $ devlink port function set ens2f0npf0sf88 state active

   $ devlink port show ens2f0npf0sf88
   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
     function:
       hw_addr 00:00:00:00:88:88 state active opstate detached

Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the
bus, probe it, and instantiate the devlink instance and class-specific
auxiliary devices for it.

- Show the auxiliary device and port of the subfunction::

    $ devlink dev show
    devlink dev show auxiliary/mlx5_core.sf.4

    $ devlink port show auxiliary/mlx5_core.sf.4/1
    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

    $ rdma link show mlx5_0/1
    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

    $ rdma dev show
    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

- Subfunction auxiliary device and class device hierarchy::

                 mlx5_core.sf.4
          (subfunction auxiliary device)
                       /\
                      /  \
                     /    \
                    /      \
                   /        \
      mlx5_core.eth.4     mlx5_core.rdma.4
     (sf eth aux dev)     (sf rdma aux dev)
         |                      |
         |                      |
      p0sf88                  mlx5_0
     (sf netdev)          (sf rdma device)

Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This results in changing the operational
state of the function. This lets the user decide when it is safe to delete
the SF port for graceful termination of the subfunction.

- Show the SF port operational state::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate attached
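
A graceful teardown could then look as follows. This is a minimal sketch,
assuming the SF port names from the examples above. Waiting for the
``detached`` opstate before deleting the port is the intent described in the
text, not a sequence enforced by the tools::

    # Deactivate the function; the SF driver is expected to detach
    $ devlink port function set ens2f0npf0sf88 state inactive

    # Check that opstate has returned to detached
    $ devlink port show ens2f0npf0sf88

    # Delete the SF port
    $ devlink port del pci/0000:06:00.0/32768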
