.. SPDX-License-Identifier: GPL-2.0

===================
ice devlink support
===================

This document describes the devlink features implemented by the ``ice``
device driver.

Parameters
==========

.. list-table:: Generic parameters implemented
    :widths: 5 5 90

    * - Name
      - Mode
      - Notes
    * - ``enable_roce``
      - runtime
      - mutually exclusive with ``enable_iwarp``
    * - ``enable_iwarp``
      - runtime
      - mutually exclusive with ``enable_roce``
    * - ``tx_scheduling_layers``
      - permanent
      - The ice hardware uses hierarchical scheduling for Tx with a fixed
        number of layers in the scheduling tree. Each layer is a decision
        point. The root node represents a port, while all the leaves represent
        the queues. This way of configuring the Tx scheduler allows features
        like DCB or devlink-rate (documented below) to configure how much
        bandwidth is given to any given queue or group of queues, enabling
        fine-grained control because scheduling parameters can be configured
        at any given layer of the tree.

        The default 9-layer tree topology was deemed best for most workloads,
        as it gives an optimal ratio of performance to configurability. However,
        for some specific cases, this 9-layer topology might not be desired.
        One example would be sending traffic to a number of queues that is not
        a multiple of 8. Because the maximum radix is limited to 8 in the
        9-layer topology, the 9th queue has a different parent than the rest,
        and it's given more bandwidth credits. This causes a problem when the
        system is sending traffic to 9 queues:

        | tx_queue_0_packets: 24163396
        | tx_queue_1_packets: 24164623
        | tx_queue_2_packets: 24163188
        | tx_queue_3_packets: 24163701
        | tx_queue_4_packets: 24163683
        | tx_queue_5_packets: 24164668
        | tx_queue_6_packets: 23327200
        | tx_queue_7_packets: 24163853
        | tx_queue_8_packets: 91101417 < Too much traffic is sent from 9th

        To address this need, you can switch to a 5-layer topology, which
        changes the maximum topology radix to 512. With this enhancement,
        the performance characteristics are equal, as all queues can be
        assigned to the same parent in the tree. The obvious drawback of this
        solution is a lower configuration depth of the tree.

        Use the ``tx_scheduling_layers`` parameter with the devlink command
        to change the transmit scheduler topology. To use the 5-layer topology,
        use a value of 5. For example:

        | $ devlink dev param set pci/0000:16:00.0 name tx_scheduling_layers
          value 5 cmode permanent

        Use a value of 9 to set it back to the default value.

        You must power cycle the PCI slot for the selected topology to take
        effect.

        To verify that the value has been set:

        | $ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers

.. list-table:: Driver specific parameters implemented
    :widths: 5 5 90

    * - Name
      - Mode
      - Description
    * - ``local_forwarding``
      - runtime
      - Controls loopback behavior by tuning scheduler bandwidth.
        It impacts all kinds of functions: physical, virtual and
        subfunctions.
        Supported values are:

        ``enabled`` - loopback traffic is allowed on this port

        ``disabled`` - loopback traffic is not allowed on this port

        ``prioritized`` - loopback traffic is prioritized on this port

        The default value of the ``local_forwarding`` parameter is ``enabled``.
        ``prioritized`` provides the ability to adjust the loopback traffic
        rate to increase one port's capacity at the cost of another. The user
        needs to disable local forwarding on one of the ports in order to have
        increased capacity on the ``prioritized`` port.

Info versions
=============

The ``ice`` driver reports the following versions:

.. list-table:: devlink info versions implemented
    :widths: 5 5 5 90

    * - Name
      - Type
      - Example
      - Description
    * - ``board.id``
      - fixed
      - K65390-000
      - The Product Board Assembly (PBA) identifier of the board.
    * - ``cgu.id``
      - fixed
      - 36
      - The Clock Generation Unit (CGU) hardware revision identifier.
    * - ``fw.mgmt``
      - running
      - 2.1.7
      - 3-digit version number of the management firmware running on the
        Embedded Management Processor of the device. It controls the PHY,
        link, access to device resources, etc. Intel documentation refers to
        this as the EMP firmware.
    * - ``fw.mgmt.api``
      - running
      - 1.5.1
      - 3-digit version number (major.minor.patch) of the API exported over
        the AdminQ by the management firmware. Used by the driver to
        identify what commands are supported. Historical versions of the
        kernel only displayed a 2-digit version number (major.minor).
    * - ``fw.mgmt.build``
      - running
      - 0x305d955f
      - Unique identifier of the source for the management firmware.
    * - ``fw.undi``
      - running
      - 1.2581.0
      - Version of the Option ROM containing the UEFI driver. The version is
        reported in ``major.minor.patch`` format. The major version is
        incremented whenever a major breaking change occurs, or when the
        minor version would overflow. The minor version is incremented for
        non-breaking changes and reset to 1 when the major version is
        incremented. The patch version is normally 0 but is incremented when
        a fix is delivered as a patch against an older base Option ROM.
    * - ``fw.psid.api``
      - running
      - 0.80
      - Version defining the format of the flash contents.
    * - ``fw.bundle_id``
      - running
      - 0x80002ec0
      - Unique identifier of the firmware image file that was loaded onto
        the device. Also referred to as the EETRACK identifier of the NVM.
    * - ``fw.app.name``
      - running
      - ICE OS Default Package
      - The name of the DDP package that is active in the device. The DDP
        package is loaded by the driver during initialization. Each
        variation of the DDP package has a unique name.
    * - ``fw.app``
      - running
      - 1.3.1.0
      - The version of the DDP package that is active in the device. Note
        that both the name (as reported by ``fw.app.name``) and version are
        required to uniquely identify the package.
    * - ``fw.app.bundle_id``
      - running
      - 0xc0000001
      - Unique identifier for the DDP package loaded in the device. Also
        referred to as the DDP Track ID. Can be used to uniquely identify
        the specific DDP package.
    * - ``fw.netlist``
      - running
      - 1.1.2000-6.7.0
      - The version of the netlist module. This module defines the device's
        Ethernet capabilities and default settings, and is used by the
        management firmware as part of managing link and device
        connectivity.
    * - ``fw.netlist.build``
      - running
      - 0xee16ced7
      - The first 4 bytes of the hash of the netlist module contents.
    * - ``fw.cgu``
      - running
      - 8032.16973825.6021
      - The version of the Clock Generation Unit (CGU). Format:
        <CGU type>.<configuration version>.<firmware version>.

Flash Update
============

The ``ice`` driver implements support for flash update using the
``devlink-flash`` interface. It supports updating the device flash using a
combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and
``fw.netlist`` components.

.. list-table:: List of supported overwrite modes
    :widths: 5 95

    * - Bits
      - Behavior
    * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS``
      - Do not preserve settings stored in the flash components being
        updated. This includes overwriting the port configuration that
        determines the number of physical functions the device will
        initialize with.
    * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS``
      - Do not preserve either settings or identifiers. Overwrite everything
        in the flash with the contents from the provided image, without
        performing any preservation. This includes overwriting device
        identifying fields such as the MAC address, VPD area, and device
        serial number. It is expected that this combination be used with an
        image customized for the specific device.

The ice hardware does not support overwriting only identifiers while
preserving settings, and thus ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` on its
own will be rejected. If no overwrite mask is provided, the firmware will be
instructed to preserve all settings and identifying fields when updating.

Reload
======

The ``ice`` driver supports activating new firmware after a flash update
using ``DEVLINK_CMD_RELOAD`` with the ``DEVLINK_RELOAD_ACTION_FW_ACTIVATE``
action.

.. code:: shell

    $ devlink dev reload pci/0000:01:00.0 reload action fw_activate

The new firmware is activated by issuing a device specific Embedded
Management Processor reset which requests the device to reset and reload the
EMP firmware image.

The driver does not currently support reloading the driver via
``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``.

Port split
==========

The ``ice`` driver supports port splitting only for port 0, as the FW has
a predefined set of available port split options for the whole device.

A system reboot is required for port split to be applied.

The following command will select the port split option with 4 ports:

.. code:: shell

    $ devlink port split pci/0000:16:00.0/0 count 4

The list of all available port options will be printed to dynamic debug after
each ``split`` and ``unsplit`` command. The first option is the default.

.. code:: shell

    ice 0000:16:00.0: Available port split options and max port speeds (Gbps):
    ice 0000:16:00.0: Status  Split      Quad 0          Quad 1
    ice 0000:16:00.0:         count  L0  L1  L2  L3  L4  L5  L6  L7
    ice 0000:16:00.0: Active  2     100   -   -   - 100   -   -   -
    ice 0000:16:00.0:         2      50   -  50   -   -   -   -   -
    ice 0000:16:00.0: Pending 4      25  25  25  25   -   -   -   -
    ice 0000:16:00.0:         4      25  25   -   -  25  25   -   -
    ice 0000:16:00.0:         8      10  10  10  10  10  10  10  10
    ice 0000:16:00.0:         1     100   -   -   -   -   -   -   -

There could be multiple FW port options with the same port split count. When
the same port split count request is issued again, the next FW port option with
the same port split count will be selected.

``devlink port unsplit`` will select the option with a split count of 1. If
there is no FW option available with split count 1, you will receive an error.

Regions
=======

The ``ice`` driver implements the following regions for accessing internal
device data.

.. list-table:: regions implemented
    :widths: 15 85

    * - Name
      - Description
    * - ``nvm-flash``
      - The contents of the entire flash chip, sometimes referred to as
        the device's Non Volatile Memory.
    * - ``shadow-ram``
      - The contents of the Shadow RAM, which is loaded from the beginning
        of the flash. Although the contents are primarily from the flash,
        this area also contains data generated during device boot which is
        not stored in flash.
    * - ``device-caps``
      - The contents of the device firmware's capabilities buffer. Useful to
        determine the current state and configuration of the device.

Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
snapshot. The ``device-caps`` region requires a snapshot as the contents are
sent by firmware and can't be split into separate reads.

Users can request an immediate capture of a snapshot for all three regions
via the ``DEVLINK_CMD_REGION_NEW`` command.

.. code:: shell

    $ devlink region show
    pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1
    pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10

    $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1
    $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
    0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
    0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5

    $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30

    $ devlink region delete pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region new pci/0000:01:00.0/device-caps snapshot 1
    $ devlink region dump pci/0000:01:00.0/device-caps snapshot 1
    0000000000000000 01 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00
    0000000000000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000020 02 00 02 01 32 03 00 00 0a 00 00 00 25 00 00 00
    0000000000000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000040 04 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000060 05 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00
    0000000000000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000080 06 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000a0 08 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000c0 12 00 01 00 01 00 00 00 01 00 01 00 00 00 00 00
    00000000000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000e0 13 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00
    00000000000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000100 14 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000120 15 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000140 16 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000160 17 00 01 00 06 00 00 00 00 00 00 00 00 00 00 00
    0000000000000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000180 18 00 01 00 01 00 00 00 01 00 00 00 08 00 00 00
    0000000000000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001a0 22 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    00000000000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001c0 40 00 01 00 00 08 00 00 08 00 00 00 00 00 00 00
    00000000000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001e0 41 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    00000000000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000200 42 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1

Devlink Rate
============

The ``ice`` driver implements the devlink-rate API. It allows for offload of
the Hierarchical QoS to the hardware. It enables the user to group Virtual
Functions in a tree structure and assign supported parameters: tx_share,
tx_max, tx_priority and tx_weight to each node in a tree. So effectively
the user gains the ability to control how much bandwidth is allocated for each
VF group. This is later enforced by the HW.

It is assumed that this feature is mutually exclusive with DCB performed
in FW and ADQ, or any driver feature that would trigger changes in QoS,
for example creation of a new traffic class. The driver will prevent DCB
or ADQ configuration if the user has started making any changes to the nodes
using the devlink-rate API. To configure those features a driver reload is
necessary. Correspondingly, if ADQ or DCB is configured, the driver won't
export the hierarchy at all, or will remove the untouched hierarchy if those
features are enabled after the hierarchy is exported, but before any
changes are made.

This feature is also dependent on switchdev being enabled in the system.
It's required because devlink-rate requires devlink-port objects to be
present, and those objects are only created in switchdev mode.

If the driver is set to the switchdev mode, it will export the internal
hierarchy the moment VFs are created. The root of the tree is always
represented by node_0. This node can't be deleted by the user. Leaf
nodes and nodes with children also can't be deleted.

.. list-table:: Attributes supported
    :widths: 15 85

    * - Name
      - Description
    * - ``tx_max``
      - maximum bandwidth to be consumed by the tree Node. Rate Limit is
        an absolute number specifying a maximum amount of bytes a Node may
        consume during the course of one second. Rate limit guarantees
        that a link will not oversaturate the receiver on the remote end
        and also enforces an SLA between the subscriber and network
        provider.
    * - ``tx_share``
      - minimum bandwidth allocated to a tree node when it is not blocked.
        It specifies an absolute BW. While tx_max defines the maximum
        bandwidth the node may consume, the tx_share marks committed BW
        for the Node.
    * - ``tx_priority``
      - allows for usage of strict priority arbiter among siblings. This
        arbitration scheme attempts to schedule nodes based on their
        priority as long as the nodes remain within their bandwidth limit.
        Range 0-7. Nodes with priority 7 have the highest priority and are
        selected first, while nodes with priority 0 have the lowest
        priority. Nodes that have the same priority are treated equally.
    * - ``tx_weight``
      - allows for usage of Weighted Fair Queuing arbitration scheme among
        siblings. This arbitration scheme can be used simultaneously with
        the strict priority. Range 1-200. Only relative values matter for
        arbitration.

``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
nodes with the same priority form a WFQ subgroup in the sibling group
and arbitration among them is based on assigned weights.

.. code:: shell

    # enable switchdev
    $ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev

    # at this point driver should export internal hierarchy
    $ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs

    $ devlink port function rate show
    pci/0000:4b:00.0/node_25: type node parent node_24
    pci/0000:4b:00.0/node_24: type node parent node_0
    pci/0000:4b:00.0/node_32: type node parent node_31
    pci/0000:4b:00.0/node_31: type node parent node_30
    pci/0000:4b:00.0/node_30: type node parent node_16
    pci/0000:4b:00.0/node_19: type node parent node_18
    pci/0000:4b:00.0/node_18: type node parent node_17
    pci/0000:4b:00.0/node_17: type node parent node_16
    pci/0000:4b:00.0/node_14: type node parent node_5
    pci/0000:4b:00.0/node_5: type node parent node_3
    pci/0000:4b:00.0/node_13: type node parent node_4
    pci/0000:4b:00.0/node_12: type node parent node_4
    pci/0000:4b:00.0/node_11: type node parent node_4
    pci/0000:4b:00.0/node_10: type node parent node_4
    pci/0000:4b:00.0/node_9: type node parent node_4
    pci/0000:4b:00.0/node_8: type node parent node_4
    pci/0000:4b:00.0/node_7: type node parent node_4
    pci/0000:4b:00.0/node_6: type node parent node_4
    pci/0000:4b:00.0/node_4: type node parent node_3
    pci/0000:4b:00.0/node_3: type node parent node_16
    pci/0000:4b:00.0/node_16: type node parent node_15
    pci/0000:4b:00.0/node_15: type node parent node_0
    pci/0000:4b:00.0/node_2: type node parent node_1
    pci/0000:4b:00.0/node_1: type node parent node_0
    pci/0000:4b:00.0/node_0: type node
    pci/0000:4b:00.0/1: type leaf parent node_25
    pci/0000:4b:00.0/2: type leaf parent node_25

    # let's create some custom node
    $ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0

    # second custom node
    $ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom

    # reassign second VF to newly created branch
    $ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1

    # assign tx_weight to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5

    # assign tx_share to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps
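
The ranges listed for the devlink-rate attributes (``tx_priority`` 0-7,
``tx_weight`` 1-200) can be checked before a command is issued. The following
is a minimal sketch, assuming a POSIX shell; ``rate_set_cmd`` is a hypothetical
helper name, not part of devlink or the ``ice`` driver, and it only prints the
``devlink port function rate set`` command rather than running it. Bandwidth
values for ``tx_share`` and ``tx_max`` are passed through unchecked.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of devlink or the ice driver):
# checks tx_priority / tx_weight against the documented ranges and prints
# the "devlink port function rate set" command instead of executing it.
rate_set_cmd() {
    rs_obj=$1
    rs_attr=$2
    rs_val=$3

    case "$rs_attr" in
    tx_priority) rs_min=0; rs_max=7 ;;
    tx_weight)   rs_min=1; rs_max=200 ;;
    tx_share|tx_max)
        # Bandwidth values carry a unit (e.g. 500Mbps); not validated here.
        echo "devlink port function rate set $rs_obj $rs_attr $rs_val"
        return 0 ;;
    *)
        echo "unsupported attribute: $rs_attr" >&2
        return 1 ;;
    esac

    # Reject empty or non-numeric values before the range comparison.
    case "$rs_val" in
    ''|*[!0-9]*)
        echo "$rs_attr must be an integer" >&2
        return 1 ;;
    esac

    if [ "$rs_val" -lt "$rs_min" ] || [ "$rs_val" -gt "$rs_max" ]; then
        echo "$rs_attr out of range $rs_min-$rs_max: $rs_val" >&2
        return 1
    fi

    echo "devlink port function rate set $rs_obj $rs_attr $rs_val"
}

# Example: print (not execute) a command giving VF 2 the highest priority.
rate_set_cmd pci/0000:4b:00.0/2 tx_priority 7
```

The printed line can be reviewed and then run by hand on a system in switchdev
mode; the helper itself never touches the device.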