.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

====================================
Marvell OcteonTx2 RVU Kernel Drivers
====================================

Copyright (c) 2020 Marvell International Ltd.

Contents
========

- `Overview`_
- `Drivers`_
- `Basic packet flow`_
- `Devlink health reporters`_
- `Quality of service`_

Overview
========

The resource virtualization unit (RVU) on Marvell's OcteonTX2 SoC maps HW
resources from the network, crypto and other functional blocks into
PCI-compatible physical and virtual functions. Each functional block in
turn has multiple local functions (LFs) for provisioning to PCI devices.
RVU supports multiple PCIe SR-IOV physical functions (PFs) and virtual
functions (VFs). PF0 is called the administrative/admin function (AF)
and has the privileges to provision the LFs of RVU functional blocks to
each PF/VF.

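To make PF/VF addressing concrete, the sketch below packs a PF number and a
function number into one 16-bit identifier, similar in spirit to the
``PF_FUNC`` named in the NIX error dump later in this document. The bit
layout (PF in bits 15:10, function in bits 9:0, function 0 meaning the PF
itself and function N meaning VF N-1) and every name in it are assumptions
for illustration only, not the hardware or driver definition::

        #include <assert.h>
        #include <stdbool.h>
        #include <stdint.h>

        /* Hypothetical layout: PF number in bits 15:10, function in bits 9:0. */
        #define PF_SHIFT   10
        #define PF_MASK    0x3Fu
        #define FUNC_MASK  0x3FFu

        /* Build an identifier for a PF (vf < 0) or for one of its VFs. */
        static uint16_t pf_func(unsigned int pf, int vf)
        {
            unsigned int func = (vf < 0) ? 0u : (unsigned int)vf + 1u;

            assert(pf <= PF_MASK && func <= FUNC_MASK);
            return (uint16_t)((pf << PF_SHIFT) | func);
        }

        static unsigned int pf_of(uint16_t id)
        {
            return (id >> PF_SHIFT) & PF_MASK;
        }

        /* Function 0 is the PF itself; any other value is a VF. */
        static bool is_vf(uint16_t id)
        {
            return (id & FUNC_MASK) != 0;
        }

        /* VF index, or -1 when the identifier names a PF. */
        static int vf_of(uint16_t id)
        {
            return is_vf(id) ? (int)(id & FUNC_MASK) - 1 : -1;
        }

        /* PF0 itself (function 0) is the admin function. */
        static bool is_af(uint16_t id)
        {
            return id == pf_func(0, -1);
        }

Under these assumptions, VF 3 of PF 2 encodes to 2052 (0x804) and the AF
itself encodes to 0.
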
RVU-managed networking functional blocks:
 - Network pool or buffer allocator (NPA)
 - Network interface controller (NIX)
 - Network parser CAM (NPC)
 - Schedule/Synchronize/Order unit (SSO)
 - Loopback interface (LBK)

RVU-managed non-networking functional blocks:
 - Crypto accelerator (CPT)
 - Scheduled timers unit (TIM)
 - Schedule/Synchronize/Order unit (SSO), which is used for both
   networking and non-networking use cases

Resource provisioning examples:
 - A PF/VF with NIX-LF & NPA-LF resources works as a pure network device.
 - A PF/VF with CPT-LF resource works as a pure crypto offload device.

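The provisioning examples above can be expressed as a check on which LF
types are attached to a PF/VF. This is a minimal sketch; the bitmask, the
enum and the function name are hypothetical, and the classification only
covers the two cases listed::

        #include <assert.h>

        /* Hypothetical bitmask of LF types attached to one PF/VF. */
        enum lf_type {
            LF_NPA = 1 << 0,
            LF_NIX = 1 << 1,
            LF_SSO = 1 << 2,
            LF_TIM = 1 << 3,
            LF_CPT = 1 << 4,
        };

        #define LF_ALL (LF_NPA | LF_NIX | LF_SSO | LF_TIM | LF_CPT)

        enum pfvf_role { ROLE_NETWORK, ROLE_CRYPTO, ROLE_OTHER };

        /* Classify a PF/VF by its attached LFs, following the examples above. */
        static enum pfvf_role classify_pfvf(unsigned int lfs)
        {
            assert((lfs & ~(unsigned int)LF_ALL) == 0);   /* no unknown LF types */

            if (lfs == (unsigned int)(LF_NIX | LF_NPA))
                return ROLE_NETWORK;   /* NIX-LF + NPA-LF: pure network device */
            if (lfs == (unsigned int)LF_CPT)
                return ROLE_CRYPTO;    /* CPT-LF only: pure crypto offload device */
            return ROLE_OTHER;         /* any other mix is left unclassified here */
        }
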
RVU functional blocks are highly configurable as per software requirements.

Firmware sets up the following before the kernel boots:
 - Enables the required number of RVU PFs based on the number of physical links.
 - The number of VFs per PF is either static or configurable at compile time.
   Based on the config, firmware assigns VFs to each of the PFs.
 - Assigns MSI-X vectors to each PF and VF.
 - These assignments are not changed after the kernel boots.

Drivers
=======

The Linux kernel has multiple drivers registering to different PFs and VFs
of RVU. With respect to networking, there are three flavours of drivers.

Admin Function driver
---------------------

As mentioned above, RVU PF0 is called the admin function (AF). This driver
supports resource provisioning and configuration of functional blocks; it
doesn't handle any I/O. It sets up a few basic things, but most of its
functionality is exercised through configuration requests from PFs and VFs.

PFs/VFs communicate with the AF via a shared memory region (mailbox). Upon
receiving requests, the AF does resource provisioning and other HW
configuration. The AF is always attached to the host kernel, but PFs and
their VFs may be used by the host kernel itself, attached to VMs, or used
by userspace applications such as DPDK. So the AF has to handle
provisioning/configuration requests sent by any device from any domain.

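The request/response exchange over the mailbox can be sketched as a shared
structure with one request slot and one response slot. This is a
single-threaded toy model: the message IDs, field names and status codes are
hypothetical, and a real implementation would also need notification
(interrupts or polling) and memory barriers to hand messages across
domains::

        #include <assert.h>
        #include <stdbool.h>
        #include <stdint.h>

        /* Hypothetical message IDs; not the driver's real mailbox ABI. */
        enum msg_id { MSG_READY = 1, MSG_ATTACH_NIX_NPA = 2 };

        struct msg {
            uint16_t id;        /* request type */
            uint16_t pf_func;   /* identifies the requesting PF/VF */
            int16_t  rc;        /* filled in by the AF: 0 on success */
        };

        /* Shared memory region for one PF/VF: a request and a response slot. */
        struct mbox {
            struct msg req;
            struct msg rsp;
            bool req_pending;
            bool rsp_pending;
        };

        /* PF/VF side: place a request in shared memory. */
        static void pfvf_send(struct mbox *mb, uint16_t id, uint16_t pf_func)
        {
            assert(!mb->req_pending);
            mb->req = (struct msg){ .id = id, .pf_func = pf_func, .rc = 0 };
            mb->req_pending = true;   /* a real driver would also notify the AF */
        }

        /* AF side: consume the request, "provision", and write a response. */
        static void af_process(struct mbox *mb)
        {
            if (!mb->req_pending)
                return;
            mb->rsp = mb->req;
            switch (mb->req.id) {
            case MSG_READY:
            case MSG_ATTACH_NIX_NPA:
                mb->rsp.rc = 0;    /* provisioning would happen here */
                break;
            default:
                mb->rsp.rc = -1;   /* unsupported request */
                break;
            }
            mb->req_pending = false;
            mb->rsp_pending = true;
        }

        /* PF/VF side: collect the response and return its status code. */
        static int pfvf_recv(struct mbox *mb)
        {
            assert(mb->rsp_pending);
            mb->rsp_pending = false;
            return mb->rsp.rc;
        }
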
The AF driver also interacts with the underlying firmware to:
 - Manage physical ethernet links, i.e. CGX LMACs.
 - Retrieve information like speed, duplex, autoneg etc.
 - Retrieve PHY EEPROM and stats.
 - Configure FEC and PAM modes.
 - etc.

On the pure networking side, the AF driver supports the following functionality:
 - Map a physical link to an RVU PF to which a netdev is registered.
 - Attach NIX and NPA block LFs to an RVU PF/VF; these provide buffer pools,
   RQs and SQs for regular networking functionality.
 - Flow control (pause frames) enable/disable/config.
 - HW PTP timestamping related config.
 - NPC parser profile config, basically how to parse a pkt and what info to extract.
 - NPC extract profile config, i.e. what to extract from the pkt to match data in MCAM entries.
 - Manage NPC MCAM entries; upon request, frame and install the requested packet forwarding rules.
 - Define receive side scaling (RSS) algorithms.
 - Define segmentation offload algorithms (e.g. TSO).
 - VLAN stripping, capture and insertion config.
 - SSO and TIM block config, which provides packet scheduling support.
 - Debugfs support to check the current resource provisioning and the current status of
   NPA pools, NIX RQs, SQs and CQs, various stats etc., which helps in debugging issues.
 - And many more.

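MCAM entries match packets the way a ternary CAM does: each entry has a key
and a mask, and only the masked bits of the fields pulled out by the extract
profile must agree. The sketch below illustrates that matching with a linear
search; the 64-bit key width, the entry layout and the rule that the
lowest-index match wins are assumptions for illustration, not the NPC
hardware format::

        #include <assert.h>
        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        /* Hypothetical MCAM entry: matches when (pkt & mask) == (key & mask). */
        struct mcam_entry {
            uint64_t key;
            uint64_t mask;      /* 1 bits are compared, 0 bits are "don't care" */
            int      action;    /* e.g. a destination or RQ index */
            bool     enabled;
        };

        /* Return the action of the first enabled matching entry, or dflt. */
        static int mcam_lookup(const struct mcam_entry *tbl, size_t n,
                               uint64_t pkt_key, int dflt)
        {
            assert(tbl != NULL || n == 0);
            for (size_t i = 0; i < n; i++) {
                if (tbl[i].enabled && ((pkt_key ^ tbl[i].key) & tbl[i].mask) == 0)
                    return tbl[i].action;
            }
            return dflt;
        }

For example, an entry with key 0x0800 and mask 0xFFFF matches only an
EtherType of 0x0800 (IPv4), while an entry with an all-zero mask acts as a
catch-all.
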
Physical Function driver
------------------------

This RVU PF handles I/O and is mapped to a physical ethernet link, and this
driver registers a netdev for it. The driver supports SR-IOV. As said above,
it communicates with the AF via a mailbox. It cannot talk to firmware
directly: to retrieve information about physical links, it asks the AF,
which gets that info from firmware and responds back.

It supports ethtool for configuring links, RSS, queue count, queue size,
flow control and ntuple filters, dumping the PHY EEPROM, configuring FEC, etc.

Virtual Function driver
-----------------------

There are two types of VFs: VFs that share the physical link with their parent
SR-IOV PF, and VFs that work in pairs using internal HW loopback channels (LBK).

Type1:
 - These VFs and their parent PF share a physical link and are used for outside communication.
 - VFs cannot communicate with the AF directly; they send mbox messages to the PF and the PF
   forwards them to the AF. After processing, the AF responds back to the PF and the PF
   forwards the reply to the VF.
 - From a functionality point of view there is no difference between PF and VF, as the same
   type of HW resources are attached to both. But the user is able to configure some things
   only from the PF, as the PF is treated as the owner/admin of the link.

Type2:
 - RVU PF0, i.e. the admin function, creates these VFs and maps them to the loopback block's channels.
 - A set of two VFs (VF0 & VF1, VF2 & VF3 and so on) works as a pair, i.e. pkts sent out of
   VF0 will be received by VF1 and vice versa.
 - These VFs can be used by applications or virtual machines to communicate with each other
   without sending traffic outside. There is no switch present in HW, hence the support
   for loopback VFs.
 - These communicate directly with the AF (PF0) via mbox.

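Since loopback VFs are paired as VF0 & VF1, VF2 & VF3 and so on, the peer of
a VF differs from it only in the lowest bit of its index. A minimal sketch of
that rule (the function names are hypothetical)::

        #include <assert.h>
        #include <stdbool.h>

        /* Peer of a loopback VF: VF0 <-> VF1, VF2 <-> VF3, ... */
        static int lbk_peer_vf(int vf)
        {
            assert(vf >= 0);
            return vf ^ 1;   /* flip the lowest bit of the VF index */
        }

        /* True when packets sent by VF a are received by VF b. */
        static bool lbk_is_pair(int a, int b)
        {
            return a >= 0 && b >= 0 && lbk_peer_vf(a) == b;
        }
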
Except for the IO channels or links used for packet reception and transmission, there is
no other difference between these VF types. The AF driver takes care of IO channel mapping,
hence the same VF driver works for both types of devices.

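The mailbox routes of the two VF types differ only in whether a parent PF
relays the message. As a hypothetical sketch (the enum and function names
are illustrative), counting one-way mailbox transfers per request/reply
round trip::

        #include <assert.h>
        #include <stdbool.h>

        /* Hypothetical classification of mailbox senders. */
        enum sender { SENDER_PF, SENDER_VF_TYPE1, SENDER_VF_LBK };

        /* Type1 VFs hand requests to their parent PF; PFs and LBK VFs go to the AF. */
        static bool relayed_by_parent_pf(enum sender s)
        {
            return s == SENDER_VF_TYPE1;
        }

        /*
         * One-way mailbox transfers for a request and its reply:
         * VF -> PF -> AF -> PF -> VF is 4, a direct exchange with the AF is 2.
         */
        static int mbox_transfers(enum sender s)
        {
            assert(s == SENDER_PF || s == SENDER_VF_TYPE1 || s == SENDER_VF_LBK);
            return relayed_by_parent_pf(s) ? 4 : 2;
        }
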
Basic packet flow
=================

Ingress
-------

1. The CGX LMAC receives the packet.
2. It forwards the packet to the NIX block.
3. The packet is then submitted to the NPC block for parsing, followed by an MCAM lookup to get the destination RVU device.
4. The NIX LF attached to the destination RVU device allocates a buffer from the RQ-mapped buffer pool of the NPA block LF.
5. The RQ may be selected by RSS or by configuring an MCAM rule with an RQ number.
6. The packet is DMA'ed and the driver is notified.

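Step 5 can be pictured as follows: an MCAM rule that names an RQ decides
directly, otherwise a flow hash indexes an RSS indirection table. This is a
hedged sketch: the table size, the even fill, the modulo indexing and the
precedence of the MCAM rule over RSS are assumptions for illustration, and
the hash itself is taken as an input::

        #include <assert.h>
        #include <stddef.h>
        #include <stdint.h>

        #define RSS_TABLE_SIZE 64u   /* hypothetical indirection table size */

        struct rss_ctx {
            uint16_t table[RSS_TABLE_SIZE];   /* slot -> RQ number */
        };

        /* Spread nr_rqs receive queues evenly across the indirection table. */
        static void rss_init(struct rss_ctx *ctx, uint16_t nr_rqs)
        {
            assert(nr_rqs > 0);
            for (size_t i = 0; i < RSS_TABLE_SIZE; i++)
                ctx->table[i] = (uint16_t)(i % nr_rqs);
        }

        /*
         * Pick the RQ for a packet: an MCAM rule that names an RQ (rule_rq >= 0)
         * decides directly; otherwise the flow hash indexes the RSS table.
         */
        static uint16_t select_rq(const struct rss_ctx *ctx, uint32_t flow_hash,
                                  int rule_rq)
        {
            if (rule_rq >= 0)
                return (uint16_t)rule_rq;
            return ctx->table[flow_hash % RSS_TABLE_SIZE];
        }

Because the table maps a hash to an RQ, all packets of one flow land on the
same RQ, and rebalancing only requires rewriting table entries.
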
Egress
------

1. The driver prepares a send descriptor and submits it to the SQ for transmission.
2. The SQ is already configured (by the AF) to transmit on a specific link/channel.
3. The SQ descriptor ring is maintained in buffers allocated from the SQ-mapped pool of the NPA block LF.
4. The NIX block transmits the pkt on the designated channel.
5. NPC MCAM entries can be installed to divert the pkt onto a different channel.

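The send queue in steps 1 and 3 behaves like a producer/consumer descriptor
ring: the driver advances a head index as it submits descriptors, and
completions advance a tail index. The sketch below shows that general
technique only; the descriptor fields, ring depth and names are hypothetical,
and it models neither the real send descriptor format nor NPA-backed
memory::

        #include <assert.h>
        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        #define SQ_DEPTH 8u   /* hypothetical ring depth */
        _Static_assert((SQ_DEPTH & (SQ_DEPTH - 1u)) == 0, "depth must be a power of two");

        struct send_desc {
            uint64_t buf_addr;   /* DMA address of the packet data */
            uint32_t len;        /* packet length in bytes */
        };

        struct sq_ring {
            struct send_desc desc[SQ_DEPTH];
            uint32_t head;   /* next slot the driver fills */
            uint32_t tail;   /* next slot to be completed */
        };

        /* Descriptors submitted but not yet completed (wrap-around safe). */
        static uint32_t sq_in_flight(const struct sq_ring *sq)
        {
            return sq->head - sq->tail;
        }

        /* Driver side: queue one descriptor; returns false when the ring is full. */
        static bool sq_submit(struct sq_ring *sq, uint64_t addr, uint32_t len)
        {
            if (sq_in_flight(sq) == SQ_DEPTH)
                return false;
            sq->desc[sq->head & (SQ_DEPTH - 1u)] = (struct send_desc){ addr, len };
            sq->head++;
            return true;
        }

        /* Completion side: retire the oldest descriptor into *out. */
        static bool sq_complete(struct sq_ring *sq, struct send_desc *out)
        {
            assert(out != NULL);
            if (sq_in_flight(sq) == 0)
                return false;
            *out = sq->desc[sq->tail & (SQ_DEPTH - 1u)];
            sq->tail++;
            return true;
        }
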
Devlink health reporters
========================

NPA Reporters
-------------
The NPA reporters are responsible for reporting and recovering the following groups of errors:

1. GENERAL events

   - Error due to operation of an unmapped PF.
   - Error due to disabled alloc/free for other HW blocks (NIX, SSO, TIM, DPI and AURA).

2. ERROR events

   - Fault due to NPA_AQ_INST_S read or NPA_AQ_RES_S write.
   - AQ Doorbell Error.

3. RAS events

   - RAS Error Reporting for NPA_AQ_INST_S/NPA_AQ_RES_S.

4. RVU events

   - Error due to unmapped slot.

Sample output::

        ~# devlink health
        pci/0002:01:00.0:
          reporter hw_npa_intr
              state healthy error 2872 recover 2872 last_dump_date 2020-12-10 last_dump_time 09:39:09 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_gen
              state healthy error 2872 recover 2872 last_dump_date 2020-12-11 last_dump_time 04:43:04 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_err
              state healthy error 2871 recover 2871 last_dump_date 2020-12-10 last_dump_time 09:39:17 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_ras
              state healthy error 0 recover 0 last_dump_date 2020-12-10 last_dump_time 09:32:40 grace_period 0 auto_recover true auto_dump true

Each reporter dumps the following:

 - Error Type
 - Error Register value
 - Reason in words

For example::

        ~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_gen
         NPA_AF_GENERAL:
                 NPA General Interrupt Reg : 1
                 NIX0: free disabled RX
        ~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_intr
         NPA_AF_RVU:
                 NPA RVU Interrupt Reg : 1
                 Unmap Slot Error
        ~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_err
         NPA_AF_ERR:
                 NPA Error Interrupt Reg : 4096
                 AQ Doorbell Error

214          NPA_AF_RVU:
215                  NPA RVU Interrupt Reg : 1
216                  Unmap Slot Error
217         ~# devlink health dump show  pci/0002:01:00.0 reporter hw_npa_err
218          NPA_AF_ERR:
219                 NPA Error Interrupt Reg : 4096
220                 AQ Doorbell Error
221 
222 
223 NIX Reporters
224 -------------
225 The NIX reporters are responsible for reporting and recovering the following group of errors:
226 
227 1. GENERAL events
228 
229    - Receive mirror/multicast packet drop due to insufficient buffer.
230    - SMQ Flush operation.
231 
232 2. ERROR events
233 
234    - Memory Fault due to WQE read/write from multicast/mirror buffer.
235    - Receive multicast/mirror replication list error.
236    - Receive packet on an unmapped PF.
237    - Fault due to NIX_AQ_INST_S read or NIX_AQ_RES_S write.
238    - AQ Doorbell Error.
239 
240 3. RAS events
241 
242    - RAS Error Reporting for NIX Receive Multicast/Mirror Entry Structure.
243    - RAS Error Reporting for WQE/Packet Data read from Multicast/Mirror Buffer..
244    - RAS Error Reporting for NIX_AQ_INST_S/NIX_AQ_RES_S.
245 
246 4. RVU events
247 
248    - Error due to unmapped slot.
249 
250 Sample Output::
251 
252         ~# ./devlink health
253         pci/0002:01:00.0:
254           reporter hw_npa_intr
255             state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
256           reporter hw_npa_gen
257             state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
258           reporter hw_npa_err
259             state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
260           reporter hw_npa_ras
261             state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
262           reporter hw_nix_intr
263             state healthy error 1121 recover 1121 last_dump_date 2021-01-19 last_dump_time 05:42:26 grace_period 0 auto_recover true auto_dump true
264           reporter hw_nix_gen
265             state healthy error 949 recover 949 last_dump_date 2021-01-19 last_dump_time 05:42:43 grace_period 0 auto_recover true auto_dump true
266           reporter hw_nix_err
267             state healthy error 1147 recover 1147 last_dump_date 2021-01-19 last_dump_time 05:42:59 grace_period 0 auto_recover true auto_dump true
268           reporter hw_nix_ras
269             state healthy error 409 recover 409 last_dump_date 2021-01-19 last_dump_time 05:43:16 grace_period 0 auto_recover true auto_dump true
270 
271 Each reporter dumps the
272 
273  - Error Type
274  - Error Register value
275  - Reason in words
276 
277 For example::
278 
279         ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_intr
280          NIX_AF_RVU:
281                 NIX RVU Interrupt Reg : 1
282                 Unmap Slot Error
283         ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_gen
284          NIX_AF_GENERAL:
285                 NIX General Interrupt Reg : 1
286                 Rx multicast pkt drop
287         ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_err
288          NIX_AF_ERR:
289                 NIX Error Interrupt Reg : 64
290                 Rx on unmapped PF_FUNC
291 
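The register values in these dumps are bitmasks; for instance 4096 is bit 12
and 64 is bit 6. The sketch below decodes such a value one set bit at a time.
Its tables contain only the two bit-to-reason pairs that can be read off the
sample dumps in this document (NPA error bit 12, NIX error bit 6); the table
layout and names are hypothetical, and the full bit assignments would have to
come from the hardware documentation::

        #include <assert.h>
        #include <stddef.h>
        #include <stdint.h>
        #include <stdio.h>

        struct bit_name {
            unsigned int bit;
            const char *reason;
        };

        /* Only the pairs visible in the sample dumps; not a full register map. */
        static const struct bit_name npa_af_err_bits[] = { { 12, "AQ Doorbell Error" } };
        static const struct bit_name nix_af_err_bits[] = { { 6, "Rx on unmapped PF_FUNC" } };

        #define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

        /* Print one line per set bit; return how many set bits had no known reason. */
        static int decode_reg(uint64_t val, const struct bit_name *tbl, size_t n)
        {
            int unknown = 0;

            assert(tbl != NULL || n == 0);
            for (unsigned int bit = 0; bit < 64; bit++) {
                size_t i = 0;

                if (!(val & (UINT64_C(1) << bit)))
                    continue;
                while (i < n && tbl[i].bit != bit)
                    i++;
                if (i < n) {
                    printf("bit %u: %s\n", bit, tbl[i].reason);
                } else {
                    printf("bit %u: unknown\n", bit);
                    unknown++;
                }
            }
            return unknown;
        }

For the NPA error dump above,
``decode_reg(4096, npa_af_err_bits, ARRAY_LEN(npa_af_err_bits))`` prints
``bit 12: AQ Doorbell Error``.
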

Quality of service
==================

Hardware algorithms used in scheduling
--------------------------------------

The OcteonTX2 silicon and CN10K transmit interface consists of five transmit
levels: SMQ/MDQ, TL4, TL3, TL2 and TL1. Each packet traverses the MDQ, TL4,
TL3, TL2 and TL1 levels in turn. Each level contains an array of queues to
support scheduling and shaping. The hardware uses the algorithms below,
depending on the priority of the scheduler queues. Once the user creates tc
classes with different priorities, the driver configures the schedulers
allocated to each class with the specified priority, along with the
rate-limiting configuration.

1. Strict Priority

   - Once packets are submitted to the MDQs, the hardware selects among active
     MDQs of different priorities using strict priority.

2. Round Robin

   - Active MDQs having the same priority level are chosen using round robin.

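The two rules combine as follows: strict priority decides which priority
level is served, and round robin rotates among the active queues at that
level. The sketch below models one scheduling level with a handful of queues;
the queue count and names are hypothetical, and it assumes, as with the tc
HTB ``prio`` parameter, that a numerically lower priority is served first::

        #include <assert.h>
        #include <stdbool.h>

        #define NR_MDQ 4   /* hypothetical number of queues at this level */

        struct sched {
            int  prio[NR_MDQ];     /* lower value is served first (assumption) */
            bool active[NR_MDQ];   /* queue has packets waiting */
            int  last;             /* queue served most recently */
        };

        /* Pick the next queue to serve, or -1 if no queue is active. */
        static int sched_pick(struct sched *s)
        {
            bool found = false;
            int best = 0;
            int pick = -1;

            assert(s->last >= 0 && s->last < NR_MDQ);

            /* Strict priority: find the best priority among active queues. */
            for (int i = 0; i < NR_MDQ; i++) {
                if (s->active[i] && (!found || s->prio[i] < best)) {
                    best = s->prio[i];
                    found = true;
                }
            }
            if (!found)
                return -1;

            /* Round robin: first active queue at that priority after 'last'. */
            for (int k = 1; k <= NR_MDQ; k++) {
                int i = (s->last + k) % NR_MDQ;

                if (s->active[i] && s->prio[i] == best) {
                    pick = i;
                    break;
                }
            }
            assert(pick >= 0);
            s->last = pick;
            return pick;
        }

With queues 0 and 2 at priority 1 and queue 1 at priority 7, this model
alternates between queues 0 and 2 and serves queue 1 only once both
priority-1 queues have gone idle.
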

Setup HTB offload
-----------------

1. Enable HW TC offload on the interface::

        # ethtool -K <interface> hw-tc-offload on

2. Create the HTB root::

        # tc qdisc add dev <interface> clsact
        # tc qdisc replace dev <interface> root handle 1: htb offload

3. Create tc classes with different priorities::

        # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 1

        # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 7

4. Alternatively, create tc classes with the same priority and different quantum values
   (these reuse the class IDs from step 3, so use one scheme or the other)::

        # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 2 quantum 409600

        # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 2 quantum 188416

        # tc class add dev <interface> parent 1: classid 1:3 htb rate 10Gbit prio 2 quantum 32768
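
The classes in step 4 share priority 2 and differ only in ``quantum``. One
common way to turn per-class quanta into a bandwidth split is deficit round
robin, where each class earns its quantum in bytes per round and sends
packets while its credit lasts. The sketch below shows that technique with
the three quanta from step 4; it illustrates how quantum can act as a weight
and is not a description of the hardware scheduler, and all names are
hypothetical::

        #include <assert.h>
        #include <stdint.h>

        #define NR_CLASSES 3

        struct dwrr {
            uint32_t quantum[NR_CLASSES];   /* bytes of credit earned per round */
            int64_t  deficit[NR_CLASSES];   /* unused credit carried to next round */
            uint64_t sent[NR_CLASSES];      /* bytes transmitted so far */
        };

        /*
         * One deficit round robin round with every class backlogged by packets
         * of pkt_len bytes: each class earns its quantum, then sends packets
         * while its credit covers the next one.
         */
        static void dwrr_round(struct dwrr *d, uint32_t pkt_len)
        {
            assert(pkt_len > 0);
            for (int c = 0; c < NR_CLASSES; c++) {
                d->deficit[c] += d->quantum[c];
                while (d->deficit[c] >= (int64_t)pkt_len) {
                    d->deficit[c] -= pkt_len;
                    d->sent[c] += pkt_len;
                }
            }
        }

With 4096-byte packets, each round in this model sends exactly one quantum
per class (100, 46 and 8 packets), so the classes share the link in the ratio
409600 : 188416 : 32768. With other packet sizes the leftover credit carries
over to the next round.
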
