.. SPDX-License-Identifier: GPL-2.0

VMBus
=====
VMBus is a software construct provided by Hyper-V to guest VMs. It
consists of a control path and common facilities used by synthetic
devices that Hyper-V presents to guest VMs. The control path is
used to offer synthetic devices to the guest VM and to rescind those
devices. The common facilities include software channels for
communicating between the device driver in the guest VM and the
synthetic device implementation that is part of Hyper-V, and
signaling primitives that allow Hyper-V and the guest to interrupt
each other.

VMBus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
entry in a running Linux guest. The VMBus driver establishes the
VMBus control path with the Hyper-V host, then registers itself as a
Linux bus driver. It implements the standard bus functions for
adding and removing devices to/from the bus.

Most synthetic devices offered by Hyper-V have a corresponding Linux
device driver. These devices include:

* SCSI controller
* NIC
* Graphics frame buffer
* Keyboard
* Mouse
* PCI device pass-thru
* Heartbeat
* Time Sync
* Shutdown
* Memory balloon
* Key/Value Pair (KVP) exchange with Hyper-V
* Hyper-V online backup (a.k.a. VSS)

Guest VMs may have multiple instances of the synthetic SCSI
controller, synthetic NIC, and PCI pass-thru devices. Other
synthetic devices are limited to a single instance per VM. Not
listed above are a small number of synthetic devices offered by
Hyper-V that are used only by Windows guests and for which Linux
does not have a driver.

Hyper-V uses the terms "VSP" and "VSC" in describing synthetic
devices. "VSP" refers to the Hyper-V code that implements a
particular synthetic device, while "VSC" refers to the driver for
the device in the guest VM. For example, the Linux driver for the
synthetic NIC is referred to as "netvsc" and the Linux driver for
the synthetic SCSI controller is "storvsc". These drivers contain
functions with names like "storvsc_connect_to_".

VMBus channels
--------------
An instance of a synthetic device uses VMBus channels to communicate
between the VSP and the VSC. Channels are bi-directional and used
for passing messages.
Most synthetic devices use a single channel, but the synthetic SCSI
controller and synthetic NIC may use multiple channels to achieve
higher performance and greater parallelism.

Each channel consists of two ring buffers. These are classic ring
buffers from a university data structures textbook. If the read
and write pointers are equal, the ring buffer is considered to be
empty, so a full ring buffer always has at least one byte unused.
The "in" ring buffer is for messages from the Hyper-V host to the
guest, and the "out" ring buffer is for messages from the guest to
the Hyper-V host. In Linux, the "in" and "out" designations are as
viewed by the guest side. The ring buffers are memory that is
shared between the guest and the host, and they follow the standard
paradigm where the memory is allocated by the guest, with the list
of guest physical addresses (GPAs) that make up the ring buffer
communicated to the host. Each ring buffer consists of a header
page (4 Kbytes) with the read and write indices and some control
flags, followed by the memory for the actual ring. The size of the
ring is determined by the VSC in the guest and is specific to each
synthetic device. The list of GPAs making up the ring is
communicated to the Hyper-V host over the VMBus control path as a
GPA Descriptor List (GPADL). See function vmbus_establish_gpadl().

Each ring buffer is mapped into contiguous Linux kernel virtual
space in three parts: 1) the 4 Kbyte header page, 2) the memory
that makes up the ring itself, and 3) a second mapping of the memory
that makes up the ring itself. Because (2) and (3) are contiguous
in kernel virtual space, the code that copies data to and from the
ring buffer need not be concerned with ring buffer wrap-around.
Once a copy operation has completed, the read or write index may
need to be reset to point back into the first mapping, but the
actual data copy does not need to be broken into two parts. This
approach also allows complex data structures to be accessed
directly in the ring without handling wrap-around.

On arm64 with page sizes > 4 Kbytes, the header page is still
passed to Hyper-V as a 4 Kbyte area. But the memory for the actual
ring must be aligned to PAGE_SIZE and have a size that is a multiple
of PAGE_SIZE so that the duplicate mapping trick can be done. As a
result, a portion of the header page is unused and not communicated
to Hyper-V. This case is handled by vmbus_establish_gpadl().

Hyper-V enforces a limit on the aggregate amount of guest memory
that can be shared with the host via GPADLs.
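
The index rules above can be modeled in a few lines of user-space C.
This is a minimal sketch, not the kernel's ring buffer code: struct
ring_model, RING_HEADER_SIZE, and the helper functions are
hypothetical names, the indices are plain fields rather than values in
a shared header page, and no data is actually copied. It shows why
equal indices always mean "empty" (one byte is kept unused), how the
write index folds back into the first mapping after a write, and how
much memory one channel's two ring buffers share with the host, which
is the kind of total the aggregate GPADL limit applies to. It also
reports whether a write took the ring from empty to non-empty, the
condition under which a guest signals the host about new data in an
"out" ring buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The header page is always a 4 Kbyte area (hypothetical name). */
#define RING_HEADER_SIZE 4096u

/*
 * Hypothetical model of one ring buffer. In the real layout the read
 * and write indices live in the shared header page.
 */
struct ring_model {
    uint32_t size;        /* bytes in the ring itself, excluding the header */
    uint32_t read_index;  /* offset of the next byte the reader consumes */
    uint32_t write_index; /* offset of the next byte the writer fills */
};

/* Equal indices always mean "empty". */
static uint32_t ring_bytes_to_read(const struct ring_model *r)
{
    assert(r->size > 0);
    return (r->write_index + r->size - r->read_index) % r->size;
}

/* One byte stays unused, so a full ring never looks empty. */
static uint32_t ring_bytes_to_write(const struct ring_model *r)
{
    return r->size - 1 - ring_bytes_to_read(r);
}

/*
 * Account for len bytes just copied into the ring. With the double
 * mapping, the copy itself never has to be split; only the write index
 * is folded back into the first mapping. Returns true if the ring went
 * from empty to non-empty.
 */
static bool ring_produce(struct ring_model *r, uint32_t len)
{
    bool was_empty = ring_bytes_to_read(r) == 0;

    assert(len <= ring_bytes_to_write(r));
    r->write_index = (r->write_index + len) % r->size;
    return was_empty && len > 0;
}

/* Memory one channel's "in" and "out" ring buffers share, headers included. */
static uint64_t channel_shared_bytes(uint32_t in_size, uint32_t out_size)
{
    return 2ull * RING_HEADER_SIZE + in_size + out_size;
}
```

On arm64 with pages larger than 4 Kbytes, the ring size would also
need to be a multiple of PAGE_SIZE; this sketch assumes 4 Kbyte pages
and does not check that.
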
The GPADL limit ensures that a rogue guest can't force the
consumption of excessive host resources. For Windows Server 2019 and
later, this limit is approximately 1280 Mbytes. For versions prior to
Windows Server 2019, the limit is approximately 384 Mbytes.

VMBus channel messages
----------------------
All messages sent in a VMBus channel have a standard header that
includes the message length, the offset of the message payload, some
flags, and a transactionID. The portion of the message after the
header is unique to each VSP/VSC pair.

Messages follow one of two patterns:

* Unidirectional: Either side sends a message and does not
  expect a response message
* Request/response: One side (usually the guest) sends a message
  and expects a response

The transactionID (a.k.a. "requestID") is for matching requests and
responses. Some synthetic devices allow multiple requests to be in
flight simultaneously, so the guest specifies a transactionID when
sending a request. Hyper-V sends back the same transactionID in the
matching response.

Messages passed between the VSP and VSC are control messages. For
example, a message sent from the storvsc driver might be "execute
this SCSI command". If a message also implies some data transfer
between the guest and the Hyper-V host, the actual data to be
transferred may be embedded with the control message, or it may be
specified as a separate data buffer that the Hyper-V host will
access as a DMA operation. The former case is used when the size of
the data is small and the cost of copying the data to and from the
ring buffer is minimal. For example, time sync messages from the
Hyper-V host to the guest contain the actual time value. When the
data is larger, a separate data buffer is used. In this case, the
control message contains a list of GPAs that describe the data
buffer. For example, the storvsc driver uses this approach to
specify the data buffers to/from which disk I/O is done.

Three functions exist to send VMBus channel messages:

1. vmbus_sendpacket(): Control-only messages and messages with
   embedded data, with no GPAs
2. vmbus_sendpacket_pagebuffer(): Message with a list of GPAs
   identifying data to transfer. An offset and length are
   associated with each GPA so that multiple discontiguous areas
   of guest memory can be targeted.
3. vmbus_sendpacket_mpb_desc(): Message with a list of GPAs
   identifying data to transfer.
   A single offset and length are associated with a list of GPAs.
   The GPAs must describe a single logical area of guest memory to
   be targeted.

Historically, Linux guests have trusted Hyper-V to send well-formed
and valid messages, and Linux drivers for synthetic devices did not
fully validate messages. With the introduction of processor
technologies that fully encrypt guest memory and that allow the
guest to not trust the hypervisor (such as AMD SEV-SNP), trusting
the Hyper-V host is no longer a valid assumption. The drivers for
VMBus synthetic devices are being updated to fully validate any
values read from memory that is shared with Hyper-V, which includes
messages from VMBus devices. To facilitate such validation,
messages read by the guest from the "in" ring buffer are copied to a
temporary buffer that is not shared with Hyper-V. Validation is
performed in this temporary buffer without the risk of Hyper-V
maliciously modifying the message after it is validated but before
it is used.

Synthetic Interrupt Controller (synic)
--------------------------------------
Hyper-V provides each guest CPU with a synthetic interrupt controller
that is used by VMBus for host-guest communication. While each synic
defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
(VMBUS_MESSAGE_SINT). All interrupts related to communication between
the Hyper-V host and a guest CPU use that SINT.

The SINT is mapped to a single per-CPU architectural interrupt (i.e.,
an 8-bit x86/x64 interrupt vector, or an arm64 interrupt ID). Because
each CPU in the guest has a synic and may receive VMBus interrupts,
they are best modeled in Linux as per-CPU interrupts. This model
works well on arm64, where a single per-CPU Linux IRQ is allocated
for VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an
IRQ labeled "Hyper-V VMbus". Since x86/x64 lacks support for per-CPU
IRQs, an x86 interrupt vector is statically allocated across all
CPUs and explicitly coded to call vmbus_isr(). In this case, there's
no Linux IRQ, and the interrupts are visible in /proc/interrupts on
the "HYP" line.

The synic provides the means to demultiplex the architectural
interrupt into one or more logical interrupts and route each logical
interrupt to the proper VMBus handler in Linux.
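
A minimal model of that demultiplexing step follows. One architectural
interrupt arrives, and the handler walks a bitmap of pending logical
interrupts, clearing each set bit and calling the handler registered
for that channel. The 64-bit bitmap, the handler table, record_call(),
and demux_pending() are hypothetical stand-ins, not the synic's actual
data structures or the vmbus_isr() code. Because the host may set bits
concurrently in a real shared bitmap, a real implementation would need
atomic test-and-clear operations; this sketch does not use them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one pending bit per channel, up to 64 channels. */
#define MAX_CHANNELS 64

typedef void (*channel_handler_t)(unsigned int chan);

/* Demonstration handler: counts how often each channel was serviced. */
static unsigned int calls[MAX_CHANNELS];

static void record_call(unsigned int chan)
{
    assert(chan < MAX_CHANNELS);
    calls[chan]++;
}

/*
 * Demultiplex one architectural interrupt: clear each pending bit and
 * call that channel's handler, lowest channel number first. Bits with
 * no registered handler are cleared but not counted. Returns how many
 * handlers ran.
 */
static unsigned int demux_pending(uint64_t *pending,
                                  const channel_handler_t handlers[MAX_CHANNELS])
{
    unsigned int handled = 0;

    for (unsigned int chan = 0; chan < MAX_CHANNELS; chan++) {
        uint64_t bit = UINT64_C(1) << chan;

        if (!(*pending & bit))
            continue;
        *pending &= ~bit;
        if (handlers[chan]) {
            handlers[chan](chan);
            handled++;
        }
    }
    return handled;
}
```
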
The demultiplexing is done by vmbus_isr() and related functions that
access synic data structures.

The synic is not modeled in Linux as an irq chip or irq domain,
and the demultiplexed logical interrupts are not Linux IRQs. As such,
they don't appear in /proc/interrupts or /proc/irq. The CPU
affinity for one of these logical interrupts is controlled via an
entry under /sys/bus/vmbus as described below.

VMBus interrupts
----------------
VMBus provides a mechanism for the guest to interrupt the host when
the guest has queued new messages in a ring buffer. The host
expects that the guest will send an interrupt only when an "out"
ring buffer transitions from empty to non-empty. If the guest sends
interrupts at other times, the host deems such interrupts to be
unnecessary. If a guest sends an excessive number of unnecessary
interrupts, the host may throttle that guest by suspending its
execution for a few seconds to prevent a denial-of-service attack.

Similarly, the host will interrupt the guest via the synic when
it sends a new message on the VMBus control path, or when a VMBus
channel "in" ring buffer transitions from empty to non-empty due to
the host inserting a new VMBus channel message. The control message
stream and each VMBus channel "in" ring buffer are separate logical
interrupts that are demultiplexed by vmbus_isr(). It demultiplexes
by first checking for channel interrupts, using a bitmap to
determine which channels have pending interrupts on this CPU.
If multiple channels have pending interrupts for this CPU, they are
processed sequentially. When all channel interrupts have been
processed, vmbus_isr() checks for and processes any messages
received on the VMBus control path.

The guest CPU that a VMBus channel will interrupt is selected by the
guest when the channel is created, and the host is informed of that
selection. VMBus devices are broadly grouped into two categories:

1. "Slow" devices that need only one VMBus channel. These devices
   (such as keyboard, mouse, heartbeat, and time sync) generate
   relatively few interrupts. Their VMBus channels are all
   assigned to interrupt the VMBUS_CONNECT_CPU, which is CPU 0.

2. "High speed" devices that may use multiple VMBus channels for
   higher parallelism and performance.
   These devices include the synthetic SCSI controller and synthetic
   NIC. Their VMBus channel interrupts are assigned to CPUs that are
   spread out among the available CPUs in the VM so that interrupts
   on multiple channels can be processed in parallel.

The assignment of VMBus channel interrupts to CPUs is done in the
function init_vp_index(). This assignment is done outside of the
normal Linux interrupt affinity mechanism, so the interrupts are
neither "unmanaged" nor "managed" interrupts.

The CPU that a VMBus channel will interrupt can be seen in a
per-channel entry under /sys/bus/vmbus/devices/<deviceGUID>/channels/.
When running on later versions of Hyper-V, the CPU can be changed
by writing a new value to this sysfs entry. Because VMBus channel
interrupts are not Linux IRQs, there are no entries in
/proc/interrupts or /proc/irq corresponding to individual VMBus
channel interrupts.

An online CPU in a Linux guest may not be taken offline if it has
VMBus channel interrupts assigned to it. Any such channel
interrupts must first be manually reassigned to another CPU as
described above. When no channel interrupts are assigned to the
CPU, it can be taken offline.

The VMBus channel interrupt handling code is designed to work
correctly even if an interrupt is received on a CPU other than the
CPU assigned to the channel. Specifically, the code does not use
CPU-based exclusion for correctness. In normal operation, Hyper-V
will interrupt the assigned CPU. But when the CPU assigned to a
channel is being changed via sysfs, the guest doesn't know exactly
when Hyper-V will make the transition. The code must work correctly
even if there is a time lag before Hyper-V starts interrupting the
new CPU. See comments in target_cpu_store().

VMBus device creation/deletion
------------------------------
Hyper-V and the Linux guest have a separate message-passing path
that is used for synthetic device creation and deletion. This
path does not use a VMBus channel. See vmbus_on_msg_dpc().

The first step is for the guest to connect to the Hyper-V VMBus
mechanism. As part of establishing this connection, the guest and
Hyper-V agree on a VMBus protocol version they will use. This
negotiation allows newer Linux kernels to run on older Hyper-V
versions, and vice versa. The guest then tells Hyper-V to
"send offers".
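
The version agreement can be pictured as the guest proposing each
protocol version it supports, newest first, until the host accepts
one. This is a hedged sketch rather than the VMBus driver's code: the
VER() encoding, the host's list of supported versions, and
negotiate_version() are hypothetical, and the message exchange with
Hyper-V is reduced to a lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: major version in the high 16 bits, minor in the low. */
#define VER(major, minor) ((((uint32_t)(major)) << 16) | (uint32_t)(minor))

/* Does the host's list of supported versions contain v? */
static bool host_supports(const uint32_t *host, size_t n_host, uint32_t v)
{
    for (size_t i = 0; i < n_host; i++)
        if (host[i] == v)
            return true;
    return false;
}

/*
 * Propose each guest-supported version, newest first, and use the first
 * one the host accepts. Returns 0 if the two sides share no version.
 */
static uint32_t negotiate_version(const uint32_t *guest, size_t n_guest,
                                  const uint32_t *host, size_t n_host)
{
    for (size_t i = 1; i < n_guest; i++)
        assert(guest[i - 1] > guest[i]); /* list must be newest-first */

    for (size_t i = 0; i < n_guest; i++)
        if (host_supports(host, n_host, guest[i]))
            return guest[i];
    return 0;
}
```

With a newer host, the guest's first proposal succeeds; with an older
host, the guest falls back to the newest version both sides
understand. An older guest on a newer host works the same way,
provided the host still accepts the older versions.
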

In response to "send offers", Hyper-V sends the guest an offer
message for each synthetic device that the VM is configured to have.
Each VMBus device type has a GUID known as the "class ID", and each
VMBus device instance is also identified by a GUID. The offer
message from Hyper-V contains both GUIDs to uniquely (within the VM)
identify the device. There is one offer message for each device
instance, so a VM with two synthetic NICs will get two offer
messages with the NIC class ID. The ordering of offer messages can
vary from boot to boot and must not be assumed to be consistent in
Linux code. Offer messages may also arrive long after Linux has
initially booted because Hyper-V supports adding devices, such as
synthetic NICs, to running VMs. A new offer message is processed by
vmbus_process_offer().

Upon receipt of an offer message, the guest identifies the device
type based on the class ID, and invokes the corresponding driver to
set up the device. Driver/device matching is performed using the
standard Linux mechanism.

The device driver probe function opens the primary VMBus channel to
the corresponding VSP. It allocates guest memory for the channel
ring buffers and shares the ring buffers with the Hyper-V host by
giving the host a list of GPAs for the ring buffer memory. See
vmbus_establish_gpadl().

Once the ring buffers are set up, the device driver and VSP exchange
setup messages via the primary channel. These messages may include
negotiating the device protocol version to be used between the Linux
VSC and the VSP on the Hyper-V host. The setup messages may also
include creating additional VMBus channels, which are somewhat
misnamed as "sub-channels" since they are functionally equivalent
to the primary channel once they are created.

Finally, the device driver may create entries as with any other
device driver.

The Hyper-V host can send a "rescind" message to the guest to
remove a device that was previously offered. Linux drivers must
handle such a rescind message at any time. Rescinding a device
invokes the device driver "remove" function to shut down the device
and remove it. Once a synthetic device is rescinded, neither Hyper-V
nor Linux retains any state about its previous existence. Such a
device might be re-added later, in which case it is treated as an
entirely new device. See vmbus_onoffer_rescind().
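
The offer and rescind rules above can be sketched as a small
registry. A driver is chosen by class ID, each device instance is
tracked by its own instance GUID, and a rescind discards everything
about that instance so that a later re-offer is treated as a new
device. struct driver_entry, struct device_slot, handle_offer(), and
handle_rescind() are hypothetical names, not the VMBus driver's data
structures. GUIDs are kept as strings for brevity, there is no
locking, and the driver "remove" call appears only as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver table entry: a driver claims one device class ID. */
struct driver_entry {
    const char *class_id; /* device-type GUID, kept as text for brevity */
    const char *name;
};

/* Hypothetical record of one offered device instance. */
struct device_slot {
    bool in_use;
    char instance_id[37];           /* instance GUID as text, plus NUL */
    const struct driver_entry *drv; /* driver chosen from the class ID */
};

#define MAX_DEVICES 8
static struct device_slot devices[MAX_DEVICES];

static const struct driver_entry *match_driver(const struct driver_entry *tbl,
                                               size_t n, const char *class_id)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].class_id, class_id) == 0)
            return &tbl[i];
    return NULL;
}

/* Offer: pick a driver by class ID and track the instance by its own GUID. */
static bool handle_offer(const struct driver_entry *tbl, size_t n,
                         const char *class_id, const char *instance_id)
{
    const struct driver_entry *drv = match_driver(tbl, n, class_id);

    assert(strlen(instance_id) < sizeof(devices[0].instance_id));
    if (!drv)
        return false; /* no Linux driver for this device type */
    for (size_t i = 0; i < MAX_DEVICES; i++) {
        if (!devices[i].in_use) {
            devices[i].in_use = true;
            strcpy(devices[i].instance_id, instance_id);
            devices[i].drv = drv;
            return true;
        }
    }
    return false; /* registry full */
}

/* Rescind: forget the instance entirely, so a re-offer looks brand new. */
static bool handle_rescind(const char *instance_id)
{
    for (size_t i = 0; i < MAX_DEVICES; i++) {
        if (devices[i].in_use &&
            strcmp(devices[i].instance_id, instance_id) == 0) {
            /* A real driver's "remove" function would shut the device down first. */
            devices[i] = (struct device_slot){ 0 };
            return true;
        }
    }
    return false;
}

static size_t count_devices(void)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_DEVICES; i++)
        n += devices[i].in_use;
    return n;
}
```

Because devices are looked up by instance GUID rather than by the
position of their offer, the sketch does not depend on the order in
which offers arrive.
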