=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================

The Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
feature supports Ethernet functionality over the Omni-Path fabric by
encapsulating Ethernet packets exchanged between HFI nodes.

Architecture
=============
The exchange of Omni-Path encapsulated Ethernet packets involves one or
more virtual Ethernet switches overlaid on the Omni-Path fabric topology.
A subset of HFI nodes on the Omni-Path fabric is permitted to exchange
encapsulated Ethernet packets across a particular virtual Ethernet
switch. The virtual Ethernet switches are logical abstractions achieved
by configuring the HFI nodes on the fabric for header generation and
processing. In the simplest configuration, all HFI nodes across the
fabric exchange encapsulated Ethernet packets over a single virtual
Ethernet switch. A virtual Ethernet switch is effectively an independent
Ethernet network. The configuration is performed by an Ethernet Manager
(EM), which is part of the trusted Fabric Manager (FM) application. HFI
nodes can have multiple VNICs, each connected to a different virtual
Ethernet switch. The diagram below shows two virtual Ethernet switches
with two HFI nodes::

                               +-------------------+
                               |      Subnet/      |
                               |     Ethernet      |
                               |      Manager      |
                               +-------------------+
                                  /          /
                                /           /
                              /            /
                            /             /
  +-----------------------------+  +------------------------------+
  |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
  |  +---------+    +---------+ |  | +---------+    +---------+   |
  |  |  VPORT  |    |  VPORT  | |  | |  VPORT  |    |  VPORT  |   |
  +--+---------+----+---------+-+  +-+---------+----+---------+---+
           |                 \        /                 |
           |                   \    /                   |
           |                     \/                     |
           |                    /  \                    |
           |                  /      \                  |
       +-----------+------------+  +-----------+------------+
       |   VNIC    |    VNIC    |  |   VNIC    |    VNIC    |
       +-----------+------------+  +-----------+------------+
       |          HFI           |  |          HFI           |
       +------------------------+  +------------------------+


The Omni-Path encapsulated Ethernet packet format is described below.

==================== ================================
Bits                 Field
==================== ================================
Quad Word 0:
0-19                 SLID (lower 20 bits)
20-30                Length (in Quad Words)
31                   BECN bit
32-51                DLID (lower 20 bits)
52-56                SC (Service Class)
57-59                RC (Routing Control)
60                   FECN bit
61-62                L2 (=10, 16B format)
63                   LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7                  L4 type (=0x78 ETHERNET)
8-11                 SLID[23:20]
12-15                DLID[23:20]
16-31                PKEY
32-47                Entropy
48-63                Reserved

Quad Word 2:
0-15                 Reserved
16-31                L4 header
32-63                Ethernet packet

Quad Words 3 to N-1:
0-63                 Ethernet packet (pad extended)

Quad Word N (last):
0-23                 Ethernet packet (pad extended)
24-55                ICRC
56-61                Tail
62-63                LT (=01, Link Transfer Tail Flit)
==================== ================================

The Ethernet packet is padded on the transmit side so that the VNIC OPA
packet is quad word aligned. The 'Tail' field contains the number of
padding bytes. On the receive side, the 'Tail' field is read and the
padding is removed (along with the ICRC, Tail and OPA header) before the
packet is passed up the network stack.

The L4 header field contains the ID of the virtual Ethernet switch that
the VNIC port belongs to. On the receive side, this field is used to
demultiplex received VNIC packets to the appropriate VNIC ports.

Driver Design
==============
The Intel OPA VNIC software design is shown in the diagram below.
OPA VNIC functionality has a HW dependent component and a HW
independent component.

Support has been added for IB devices to allocate and free RDMA netdev
devices. The RDMA netdev supports interfacing with the network stack,
thus creating standard network interfaces. OPA_VNIC is an RDMA netdev
device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It handles HW resource allocation and management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from received packets before passing them up the
network stack. It also implements the RDMA netdev control operations.

The OPA VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with the IB core as an IB client and interfaces with the
IB MAD stack. It exchanges management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by the HW dependent VNIC driver where required to accommodate any
control operation. It also handles the encapsulation of Ethernet packets
with an Omni-Path header in the transmit path. For each VNIC interface,
the information required for encapsulation is configured by the EM via
the VEMA MAD interface. It also passes any control information to the HW
dependent driver by invoking the RDMA netdev control operations::

        +-------------------+ +----------------------+
        |                   | |       Linux          |
        |     IB MAD        | |      Network         |
        |                   | |       Stack          |
        +-------------------+ +----------------------+
                 |               |          |
                 |               |          |
        +----------------------------+      |
        |                            |      |
        |      OPA VNIC Module       |      |
        |  (OPA VNIC RDMA Netdev     |      |
        |     & EMA functions)       |      |
        |                            |      |
        +----------------------------+      |
                    |                       |
                    |                       |
           +------------------+             |
           |     IB core      |             |
           +------------------+             |
                    |                       |
                    |                       |
        +--------------------------------------------+
        |                                            |
        |      HFI1 Driver with VNIC support         |
        |                                            |
        +--------------------------------------------+
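
The bit layout in the packet format table, together with the padding rule,
can be illustrated with a small host-side C sketch. It is not taken from the
HFI1 driver or the OPA VNIC module: the ``struct vnic_hdr_fields`` type and
the ``vnic_pack()``, ``vnic_unpack()``, ``vnic_pad_bytes()`` and
``vnic_len_qw()`` helpers are hypothetical. It assumes that bit n in the table
is bit n of a 64-bit integer (bit 0 = least significant), that the Length
field counts the whole packet including header, ICRC and Tail, and that the
last quad word ends with a 4-byte ICRC followed by one byte holding Tail and
LT (as the bit ranges 24-55 and 56-63 suggest). Wire byte order and ICRC
computation are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side view of the header fields in the table.
 * Assumption: bit n of a quad word is bit n of a uint64_t (bit 0 = LSB). */
struct vnic_hdr_fields {
    uint32_t slid;    /* 24-bit source LID */
    uint32_t dlid;    /* 24-bit destination LID */
    uint16_t len_qw;  /* 11-bit packet length in quad words */
    uint8_t  sc;      /* 5-bit Service Class */
    uint8_t  rc;      /* 3-bit Routing Control */
    uint8_t  becn;    /* 1 bit */
    uint8_t  fecn;    /* 1 bit */
    uint16_t pkey;
    uint16_t entropy;
    uint16_t l4_hdr;  /* virtual Ethernet switch ID, used for RX demux */
};

#define VNIC_L4_TYPE_ETH     0x78u /* QW1 bits 0-7 */
#define VNIC_L2_16B          0x2u  /* QW0 bits 61-62 = 10b */
#define VNIC_HDR_BYTES       20u   /* QW0 + QW1 + first 4 bytes of QW2 */
#define VNIC_ICRC_TAIL_BYTES 5u    /* 4-byte ICRC + 1 byte with Tail and LT */

/* Return 'width' bits of qw starting at bit 'lsb'. */
static uint64_t get_bits(uint64_t qw, unsigned lsb, unsigned width)
{
    return (qw >> lsb) & ((1ULL << width) - 1);
}

/* Fill QW0, QW1 and the low half of QW2. Bits 32-63 of QW2 carry the
 * first Ethernet bytes in a real packet and are left zero here. */
static void vnic_pack(const struct vnic_hdr_fields *f, uint64_t qw[3])
{
    assert(f->slid <= 0xFFFFFFu && f->dlid <= 0xFFFFFFu);
    assert(f->len_qw <= 0x7FFu);

    qw[0] = ((uint64_t)(f->slid & 0xFFFFFu))
          | ((uint64_t)f->len_qw << 20)
          | ((uint64_t)(f->becn & 1u) << 31)
          | ((uint64_t)(f->dlid & 0xFFFFFu) << 32)
          | ((uint64_t)(f->sc & 0x1Fu) << 52)
          | ((uint64_t)(f->rc & 0x7u) << 57)
          | ((uint64_t)(f->fecn & 1u) << 60)
          | ((uint64_t)VNIC_L2_16B << 61)
          | (1ULL << 63);                                 /* LT = 1, head flit */
    qw[1] = ((uint64_t)VNIC_L4_TYPE_ETH)
          | ((uint64_t)((f->slid >> 20) & 0xFu) << 8)     /* SLID[23:20] */
          | ((uint64_t)((f->dlid >> 20) & 0xFu) << 12)    /* DLID[23:20] */
          | ((uint64_t)f->pkey << 16)
          | ((uint64_t)f->entropy << 32);                 /* 48-63 reserved */
    qw[2] = ((uint64_t)f->l4_hdr << 16);                  /* 0-15 reserved */
}

/* Recover the fields from QW0-QW2, rejoining the split 24-bit LIDs. */
static void vnic_unpack(const uint64_t qw[3], struct vnic_hdr_fields *f)
{
    f->slid    = (uint32_t)(get_bits(qw[0], 0, 20) | (get_bits(qw[1], 8, 4) << 20));
    f->dlid    = (uint32_t)(get_bits(qw[0], 32, 20) | (get_bits(qw[1], 12, 4) << 20));
    f->len_qw  = (uint16_t)get_bits(qw[0], 20, 11);
    f->becn    = (uint8_t)get_bits(qw[0], 31, 1);
    f->sc      = (uint8_t)get_bits(qw[0], 52, 5);
    f->rc      = (uint8_t)get_bits(qw[0], 57, 3);
    f->fecn    = (uint8_t)get_bits(qw[0], 60, 1);
    f->pkey    = (uint16_t)get_bits(qw[1], 16, 16);
    f->entropy = (uint16_t)get_bits(qw[1], 32, 16);
    f->l4_hdr  = (uint16_t)get_bits(qw[2], 16, 16);
}

/* Pad bytes needed so that header + frame + pad + ICRC/Tail is a multiple
 * of 8 bytes; this count is the value carried in the Tail field. */
static unsigned vnic_pad_bytes(unsigned eth_len)
{
    return (8u - (VNIC_HDR_BYTES + eth_len + VNIC_ICRC_TAIL_BYTES) % 8u) % 8u;
}

/* Total packet length in quad words (assumed meaning of Length). */
static unsigned vnic_len_qw(unsigned eth_len)
{
    return (VNIC_HDR_BYTES + eth_len + vnic_pad_bytes(eth_len) +
            VNIC_ICRC_TAIL_BYTES) / 8u;
}
```

For example, under these assumptions a 60-byte Ethernet frame needs 3 pad
bytes, giving 20 + 60 + 3 + 5 = 88 bytes, or 11 quad words. On the receive
side, a caller would read Tail to strip that padding and use the ``l4_hdr``
value returned by ``vnic_unpack()`` to look up the VNIC port for that virtual
Ethernet switch; that lookup is not shown.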