.. SPDX-License-Identifier: GPL-2.0+

======================================================
IBM Virtual Management Channel Kernel Driver (IBMVMC)
======================================================

:Authors:
    Dave Engebretsen <engebret@us.ibm.com>,
    Adam Reznechek <adreznec@linux.vnet.ibm.com>,
    Steven Royer <seroyer@linux.vnet.ibm.com>,
    Bryant G. Ly <bryantly@linux.vnet.ibm.com>

Introduction
============

Note: Knowledge of virtualization technology is required to understand
this document.

A good reference document would be:

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

The Virtual Management Channel (VMC) is a logical device which provides an
interface between the hypervisor and a management partition. This interface
is like a message passing interface. This management partition is intended
to provide an alternative to Hardware Management Console (HMC)-based
system management.

The primary hardware management solution developed by IBM relies
on an appliance server named the Hardware Management Console (HMC),
packaged as an external tower or rack-mounted personal computer. In a
Power Systems environment, a single HMC can manage multiple POWER
processor-based systems.

Management Application
----------------------

In the management partition, a management application exists which enables
a system administrator to configure the system’s partitioning
characteristics via a command line interface (CLI) or Representational
State Transfer (REST) APIs.

The management application runs on a Linux logical partition on a
POWER8 or newer processor-based server that is virtualized by PowerVM.
System configuration, maintenance, and control functions which
traditionally require an HMC can be implemented in the management
application using a combination of HMC to hypervisor interfaces and
existing operating system methods. This tool provides a subset of the
functions implemented by the HMC and enables basic partition configuration.
The set of HMC to hypervisor messages supported by the management
application component is passed to the hypervisor over a VMC interface,
which is defined below.

The VMC enables the management partition to provide basic partitioning
functions:

- Logical Partitioning Configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

Virtual Management Channel (VMC)
--------------------------------

A logical device, called the Virtual Management Channel (VMC), is defined
for communicating between the management application and the hypervisor. It
basically creates the pipes that enable virtualization management
software. This device is presented to a designated management partition as
a virtual device.

This communication device uses the Command/Response Queue (CRQ) and
Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is
defined that must take place to establish that both the hypervisor and
management partition sides of the channel are running prior to
sending/receiving any of the protocol messages.

This driver also utilizes Transport Event CRQs. CRQ messages are sent
when the hypervisor detects that one of the peer partitions has abnormally
terminated, or that one side has called H_FREE_CRQ to close its CRQ.
Two new classes of CRQ messages are introduced for the VMC device. VMC
Administrative messages are used by each partition using the VMC to
communicate capabilities to its partner.
HMC Interface messages are used
for the actual flow of HMC messages between the management partition and
the hypervisor. As most HMC messages far exceed the size of a CRQ buffer,
a virtual DMA (RDMA) of the HMC message data is done prior to each HMC
Interface CRQ message. Only the management partition drives RDMA
operations; hypervisors never directly cause the movement of message data.


Terminology
-----------

RDMA
        Remote Direct Memory Access is a DMA transfer from the server to its
        client or from the server to its partner partition. DMA refers
        both to physical I/O operations to and from memory and to
        memory-to-memory move operations.
CRQ
        Command/Response Queue, a facility which is used to communicate
        between partner partitions. Transport events which are signaled
        from the hypervisor to the partition are also reported in this
        queue.

Example Management Partition VMC Driver Interface
=================================================

This section provides an example management application implementation
in which a device driver is used to interface to the VMC device. This
driver consists of a new device, for example /dev/ibmvmc, which provides
interfaces to open, close, read, write, and perform ioctl’s against the
VMC device.

VMC Interface Initialization
----------------------------

The device driver is responsible for initializing the VMC when the driver
is loaded. It first creates and initializes the CRQ. Next, an exchange of
VMC capabilities is performed to indicate the code version and number of
resources available in both the management partition and the hypervisor.
Finally, the hypervisor requests that the management partition create an
initial pool of VMC buffers, one buffer for each possible HMC connection,
which will be used for management application session initialization.
Prior to completion of this initialization sequence, the device returns
EBUSY to open() calls. EIO is returned for all other open() failures.

::

        Management Partition             Hypervisor
                     CRQ INIT
        ---------------------------------------->
                 CRQ INIT COMPLETE
        <----------------------------------------
                    CAPABILITIES
        ---------------------------------------->
               CAPABILITIES RESPONSE
        <----------------------------------------
          ADD BUFFER (HMC IDX=0,1,..)             _
        <----------------------------------------  |
                ADD BUFFER RESPONSE                 | - Perform # HMCs Iterations
        ---------------------------------------->  -
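
Because open() fails with EBUSY until this sequence completes, a
management application typically retries the open. The following is a
minimal userspace sketch of such a retry loop, assuming the /dev/ibmvmc
node described above; the retry count and delay are arbitrary
illustration values::

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /*
     * Open the VMC device, retrying while the driver is still completing
     * its CRQ and capabilities initialization (open() fails with EBUSY in
     * that window). Any other failure is treated as fatal, since the
     * driver reports the remaining open() errors as EIO.
     */
    static int open_vmc(const char *path)
    {
            int fd, tries;

            for (tries = 0; tries < 50; tries++) {
                    fd = open(path, O_RDWR);
                    if (fd >= 0)
                            return fd;
                    if (errno != EBUSY)
                            break;          /* EIO or other hard error */
                    usleep(100000);         /* initialization not done yet */
            }
            perror("open /dev/ibmvmc");
            return -1;
    }

A call such as open_vmc("/dev/ibmvmc") then yields the file descriptor
used for the ioctl(), write(), and read() operations described in the
following sections.
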
VMC Interface Open
------------------

After the basic VMC channel has been initialized, an HMC session-level
connection can be established. The application layer performs an open() to
the VMC device and executes an ioctl() against it, indicating the HMC ID
(32 bytes of data) for this session. If the VMC device is in an invalid
state, EIO will be returned for the ioctl(). The device driver creates a
new HMC session value (ranging from 1 to 255) and HMC index value (starting
at index 0 and ranging to 254) for this HMC ID. The driver then does an
RDMA of the HMC ID to the hypervisor, and then sends an Interface Open
message to the hypervisor to establish the session over the VMC. After the
hypervisor receives this information, it sends Add Buffer messages to the
management partition to seed an initial pool of buffers for the new HMC
connection. Finally, the hypervisor sends an Interface Open Response
message to indicate that it is ready for normal runtime messaging. The
following illustrates this VMC flow:

::

        Management Partition             Hypervisor
                   RDMA HMC ID
        ---------------------------------------->
                 Interface Open
        ---------------------------------------->
                   Add Buffer                     _
        <----------------------------------------  |
              Add Buffer Response                   | - Perform N Iterations
        ---------------------------------------->  -
            Interface Open Response
        <----------------------------------------
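
As a rough sketch of the session setup described above, the fragment
below issues the HMC ID ioctl() against a descriptor obtained as in the
previous example. The ioctl request code shown here is only an assumed
placeholder; the authoritative definition lives in the ibmvmc driver
sources and must be used instead in real code::

    #include <string.h>
    #include <sys/ioctl.h>

    /*
     * Assumed placeholder for the driver's "set HMC ID" ioctl request
     * code; take the real definition from the ibmvmc driver sources.
     */
    #ifndef VMC_IOCTL_SETHMCID
    #define VMC_IOCTL_SETHMCID _IOW(0xCC, 0x00, unsigned char *)
    #endif

    #define HMC_ID_LEN 32   /* the HMC ID is 32 bytes of data */

    /* Bind an HMC session to an already opened VMC file descriptor. */
    static int vmc_set_hmc_id(int fd, const char *hmc_id)
    {
            unsigned char id[HMC_ID_LEN] = { 0 };

            /* Fixed 32-byte field; NUL termination is not required. */
            strncpy((char *)id, hmc_id, sizeof(id));

            /* EIO indicates the VMC device is in an invalid state. */
            return ioctl(fd, VMC_IOCTL_SETHMCID, id);
    }
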
VMC Interface Runtime
---------------------

During normal runtime, the management application and the hypervisor
exchange HMC messages via the Signal VMC message and RDMA operations. When
sending data to the hypervisor, the management application performs a
write() to the VMC device, and the driver RDMA’s the data to the hypervisor
and then sends a Signal Message. If a write() is attempted before VMC
device buffers have been made available by the hypervisor, or no buffers
are currently available, EBUSY is returned in response to the write(). A
write() will return EIO for all other errors, such as an invalid device
state. When the hypervisor sends a message to the management partition,
the data is put into a VMC buffer and a Signal Message is sent to the VMC
driver in the management partition. The driver RDMA’s the buffer into the
partition and passes the data up to the appropriate management application
via a read() of the VMC device. The read() request blocks if there is no
buffer available to read. The management application may use select() to
wait for the VMC device to become ready with data to read.

::

        Management Partition             Hypervisor
                    MSG RDMA
        ---------------------------------------->
                   SIGNAL MSG
        ---------------------------------------->
                   SIGNAL MSG
        <----------------------------------------
                    MSG RDMA
        <----------------------------------------
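
To make the runtime flow concrete, the helpers below sketch one way an
application might wrap write(), select(), and read() on the VMC device.
Buffer sizes and the retry policy for EBUSY are application choices, not
requirements of the driver::

    #include <errno.h>
    #include <sys/select.h>
    #include <unistd.h>

    /*
     * Send one HMC message to the hypervisor. EBUSY means no VMC buffer
     * is currently available, so the caller may retry later; other errors
     * (reported as EIO) are returned unchanged.
     */
    static ssize_t vmc_send(int fd, const void *msg, size_t len)
    {
            ssize_t n = write(fd, msg, len);

            if (n < 0 && errno == EBUSY)
                    return 0;       /* no buffer yet, try again later */
            return n;
    }

    /*
     * Wait until the VMC device has data, then fetch one inbound HMC
     * message. read() would block anyway; select() is shown because an
     * application will often multiplex the VMC device with other
     * descriptors.
     */
    static ssize_t vmc_receive(int fd, void *buf, size_t len)
    {
            fd_set rfds;

            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
                    return -1;
            return read(fd, buf, len);
    }
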
VMC Interface Close
-------------------

HMC session-level connections are closed by the management partition when
the application layer performs a close() against the device. This action
results in an Interface Close message flowing to the hypervisor, which
causes the session to be terminated. The device driver must free any
storage allocated for buffers for this HMC connection.

::

        Management Partition             Hypervisor
                INTERFACE CLOSE
        ---------------------------------------->
           INTERFACE CLOSE RESPONSE
        <----------------------------------------

Additional Information
======================

For more information on the documentation for CRQ Messages, VMC Messages,
HMC interface Buffers, and signal messages, please refer to the Linux on
Power Architecture Platform Reference, Section F.