.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============
Multi-PF Netdev
===============

Contents
========

- `Background`_
- `Overview`_
- `mlx5 implementation`_
- `Channels distribution`_
- `Observability`_
- `Steering`_
- `Mutually exclusive features`_

Background
==========

The Multi-PF NIC technology enables several CP
the network, each through its own dedicated PC
splits the PCIe lanes between two cards or by
results in eliminating the network traffic tra
significantly reducing overhead and latency, i
network throughput.

Overview
========

The feature adds support for combining multipl
one netdev instance. It is implemented in the
sysfs entry, and devlink are kept separate.
Passing traffic through different devices belo
traffic and allows apps running on the same ne
proximity to the device and achieve improved p

mlx5 implementation
===================

Multi-PF or Socket-direct in mlx5 is achieved
NIC and has the socket-direct property enabled
to represent all of them, symmetrically, we de

The netdev network channels are distributed be
the correct close NUMA node when working on a

We pick one PF to be a primary (leader), and i
(secondaries) are disconnected from the networ
mode, no south <-> north traffic flowing direc
the leader PF (east <-> west traffic) to funct
to/from the secondaries.

Currently, we limit the support to PFs only, a

Channels distribution
=====================

We distribute the channels between the differe
on multiple NUMA nodes.

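
As a minimal sketch of the round-robin distribution described in this section,
the channel-to-PF mapping can be expressed as a modulo operation. The helper
names ``pf_index_for_channel`` and ``distribute_channels`` are hypothetical and
exist only for illustration; they are not part of the mlx5 driver, and the
driver's actual code may compute the mapping differently.

```python
from typing import List


def pf_index_for_channel(ch_idx: int, num_pfs: int) -> int:
    """Return the PF index for a channel under round-robin distribution.

    Hypothetical helper for illustration only; not an mlx5 driver function.
    """
    if num_pfs <= 0:
        raise ValueError("num_pfs must be positive")
    if ch_idx < 0:
        raise ValueError("ch_idx must be non-negative")
    # Round-robin: consecutive channel indices cycle through the PFs.
    return ch_idx % num_pfs


def distribute_channels(num_channels: int, num_pfs: int) -> List[int]:
    """Return a list where entry i is the PF index assigned to channel i."""
    return [pf_index_for_channel(ch, num_pfs) for ch in range(num_channels)]


if __name__ == "__main__":
    # Same setup as the example in this section: 2 PFs and 5 channels.
    for ch, pf in enumerate(distribute_channels(5, 2)):
        print(f"ch idx {ch} -> PF idx {pf}")
```

Because the result depends only on the channel index and the number of PFs,
changing the number of channels does not alter the PF assigned to any channel
index that still exists.
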

Each combined channel works against one specif
distribute channels to PFs in a round-robin po

::

        Example for 2 PFs and 5 channels:
        +--------+--------+
        | ch idx | PF idx |
        +--------+--------+
        |    0   |    0   |
        |    1   |    1   |
        |    2   |    0   |
        |    3   |    1   |
        |    4   |    0   |
        +--------+--------+


The reason we prefer round-robin is, it is les
mapping between a channel index and a PF is fi
As the channel stats are persistent across cha
would turn the accumulative stats less represe

This is achieved by using the correct core dev
all using the same instance under "priv->mdev"

Observability
=============
The relation between PF, irq, napi, and queue

::

  $ ./tools/net/ynl/cli.py --spec Documentatio
  [{'id': 0, 'ifindex': 13, 'napi-id': 539, 't
   {'id': 1, 'ifindex': 13, 'napi-id': 540, 't
   {'id': 2, 'ifindex': 13, 'napi-id': 541, 't
   {'id': 3, 'ifindex': 13, 'napi-id': 542, 't
   {'id': 4, 'ifindex': 13, 'napi-id': 543, 't
   {'id': 0, 'ifindex': 13, 'napi-id': 539, 't
   {'id': 1, 'ifindex': 13, 'napi-id': 540, 't
   {'id': 2, 'ifindex': 13, 'napi-id': 541, 't
   {'id': 3, 'ifindex': 13, 'napi-id': 542, 't
   {'id': 4, 'ifindex': 13, 'napi-id': 543, 't

  $ ./tools/net/ynl/cli.py --spec Documentatio
  [{'id': 543, 'ifindex': 13, 'irq': 42},
   {'id': 542, 'ifindex': 13, 'irq': 41},
   {'id': 541, 'ifindex': 13, 'irq': 40},
   {'id': 540, 'ifindex': 13, 'irq': 39},
   {'id': 539, 'ifindex': 13, 'irq': 36}]

Here you can clearly observe our channels dist

::

  $ ls /proc/irq/{36,39,40,41,42}/mlx5* -d -1
  /proc/irq/36/mlx5_comp0@pci:0000:08:00.0
  /proc/irq/39/mlx5_comp0@pci:0000:09:00.0
  /proc/irq/40/mlx5_comp1@pci:0000:08:00.0
  /proc/irq/41/mlx5_comp1@pci:0000:09:00.0
  /proc/irq/42/mlx5_comp2@pci:0000:08:00.0

Steering
========
Secondary PFs are set to "silent" mode, meanin

In Rx, the steering tables belong to the prima
traffic to other PFs, via
cross-vhca steering
that is capable of pointing to the receive que

In Tx, the primary PF creates a new Tx flow ta
go out to the network through it.

In addition, we set default XPS configuration
PF on the same node as the CPU.

XPS default config example:

NUMA node(s): 2
NUMA node0 CPU(s): 0-11
NUMA node1 CPU(s): 12-23

PF0 on node0, PF1 on node1.

- /sys/class/net/eth2/queues/tx-0/xps_cpus:000
- /sys/class/net/eth2/queues/tx-1/xps_cpus:001
- /sys/class/net/eth2/queues/tx-2/xps_cpus:000
- /sys/class/net/eth2/queues/tx-3/xps_cpus:002
- /sys/class/net/eth2/queues/tx-4/xps_cpus:000
- /sys/class/net/eth2/queues/tx-5/xps_cpus:004
- /sys/class/net/eth2/queues/tx-6/xps_cpus:000
- /sys/class/net/eth2/queues/tx-7/xps_cpus:008
- /sys/class/net/eth2/queues/tx-8/xps_cpus:000
- /sys/class/net/eth2/queues/tx-9/xps_cpus:010
- /sys/class/net/eth2/queues/tx-10/xps_cpus:00
- /sys/class/net/eth2/queues/tx-11/xps_cpus:02
- /sys/class/net/eth2/queues/tx-12/xps_cpus:00
- /sys/class/net/eth2/queues/tx-13/xps_cpus:04
- /sys/class/net/eth2/queues/tx-14/xps_cpus:00
- /sys/class/net/eth2/queues/tx-15/xps_cpus:08
- /sys/class/net/eth2/queues/tx-16/xps_cpus:00
- /sys/class/net/eth2/queues/tx-17/xps_cpus:10
- /sys/class/net/eth2/queues/tx-18/xps_cpus:00
- /sys/class/net/eth2/queues/tx-19/xps_cpus:20
- /sys/class/net/eth2/queues/tx-20/xps_cpus:00
- /sys/class/net/eth2/queues/tx-21/xps_cpus:40
- /sys/class/net/eth2/queues/tx-22/xps_cpus:00
- /sys/class/net/eth2/queues/tx-23/xps_cpus:80

Mutually exclusive features
===========================

The nature of Multi-PF, where different channe
stateful features where the state is maintaine
For example, in the TLS device-offload feature
and maintained in the PF. Transitioning betwe
we disable this combination for now.