TOMOYO Linux Cross Reference
Linux/Documentation/virt/hyperv/vmbus.rst

Differences between /Documentation/virt/hyperv/vmbus.rst (Architecture alpha) and /Documentation/virt/hyperv/vmbus.rst (Architecture mips): none; the document is identical on both architectures and is reproduced below.

.. SPDX-License-Identifier: GPL-2.0

VMBus
=====
VMBus is a software construct provided by Hyper-V to guest VMs.  It
consists of a control path and common facilities used by synthetic
devices that Hyper-V presents to guest VMs.  The control path is
used to offer synthetic devices to the guest VM and, in some cases,
to rescind those devices.  The common facilities include software
channels for communicating between the device driver in the guest VM
and the synthetic device implementation that is part of Hyper-V, and
signaling primitives to allow Hyper-V and the guest to interrupt
each other.

VMBus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
entry in a running Linux guest.  The VMBus driver (drivers/hv/vmbus_drv.c)
establishes the VMBus control path with the Hyper-V host, then
registers itself as a Linux bus driver.  It implements the standard
bus functions for adding and removing devices to/from the bus.

Most synthetic devices offered by Hyper-V have a corresponding Linux
device driver.  These devices include:

* SCSI controller
* NIC
* Graphics frame buffer
* Keyboard
* Mouse
* PCI device pass-thru
* Heartbeat
* Time Sync
* Shutdown
* Memory balloon
* Key/Value Pair (KVP) exchange with Hyper-V
* Hyper-V online backup (a.k.a. VSS)

Guest VMs may have multiple instances of the synthetic SCSI
controller, synthetic NIC, and PCI pass-thru devices.  Other
synthetic devices are limited to a single instance per VM.  Not
listed above are a small number of synthetic devices offered by
Hyper-V that are used only by Windows guests and for which Linux
does not have a driver.

Hyper-V uses the terms "VSP" and "VSC" in describing synthetic
devices.  "VSP" refers to the Hyper-V code that implements a
particular synthetic device, while "VSC" refers to the driver for
the device in the guest VM.  For example, the Linux driver for the
synthetic NIC is referred to as "netvsc" and the Linux driver for
the synthetic SCSI controller is "storvsc".  These drivers contain
functions with names like "storvsc_connect_to_vsp".

VMBus channels
--------------
An instance of a synthetic device uses VMBus channels to communicate
between the VSP and the VSC.  Channels are bi-directional and used
for passing messages.  Most synthetic devices use a single channel,
but the synthetic SCSI controller and synthetic NIC may use multiple
channels to achieve higher performance and greater parallelism.

Each channel consists of two ring buffers.  These are classic ring
buffers from a university data structures textbook.  If the read
and write pointers are equal, the ring buffer is considered to be
empty, so a full ring buffer always has at least one byte unused.
The "in" ring buffer is for messages from the Hyper-V host to the
guest, and the "out" ring buffer is for messages from the guest to
the Hyper-V host.  In Linux, the "in" and "out" designations are as
viewed by the guest side.  The ring buffers are memory that is
shared between the guest and the host, and they follow the standard
paradigm where the memory is allocated by the guest, with the list
of GPAs that make up the ring buffer communicated to the host.  Each
ring buffer consists of a header page (4 Kbytes) with the read and
write indices and some control flags, followed by the memory for the
actual ring.  The size of the ring is determined by the VSC in the
guest and is specific to each synthetic device.  The list of GPAs
making up the ring is communicated to the Hyper-V host over the
VMBus control path as a GPA Descriptor List (GPADL).  See function
vmbus_establish_gpadl().

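The empty/full convention described above can be sketched with simple
index arithmetic.  This is a minimal illustration; RING_SIZE and the
function names are hypothetical, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring size for illustration; real sizes are chosen by
 * each VSC.  Capacity is RING_SIZE - 1 because read == write means
 * "empty", so one byte is always left unused when the ring is full. */
#define RING_SIZE 4096u

/* Bytes available to read: how far write has advanced past read,
 * modulo the ring size. */
static uint32_t ring_avail_to_read(uint32_t read, uint32_t write)
{
        return (write >= read) ? (write - read)
                               : (RING_SIZE - read + write);
}

/* Bytes available to write: everything not readable, minus the one
 * reserved byte that distinguishes "full" from "empty". */
static uint32_t ring_avail_to_write(uint32_t read, uint32_t write)
{
        return RING_SIZE - ring_avail_to_read(read, write) - 1;
}
```

With this convention, read == write always means "empty", and a writer
that would advance write onto read must instead report the ring full.
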
Each ring buffer is mapped into contiguous Linux kernel virtual
space in three parts:  1) the 4 Kbyte header page, 2) the memory
that makes up the ring itself, and 3) a second mapping of the memory
that makes up the ring itself.  Because (2) and (3) are contiguous
in kernel virtual space, the code that copies data to and from the
ring buffer need not be concerned with ring buffer wrap-around.
Once a copy operation has completed, the read or write index may
need to be reset to point back into the first mapping, but the
actual data copy does not need to be broken into two parts.  This
approach also allows complex data structures to be easily accessed
directly in the ring without handling wrap-around.

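The effect of the double mapping can be simulated in user space.  In
the kernel the same physical pages are mapped twice back-to-back via
the page tables; the sketch below just mirrors the contents into a
second buffer half so the aliasing effect is visible, then shows a
boundary-crossing read done as one copy with the index folded back
afterward.  All names here are invented for the illustration:

```c
#include <string.h>
#include <stdint.h>

#define RING_BYTES 8u   /* tiny ring for illustration */

/* Simulated double mapping: the second half mirrors the first, the
 * way a second virtual mapping of the same pages would. */
static uint8_t mapping[2 * RING_BYTES];

static void ring_fill(const uint8_t *data)
{
        memcpy(mapping, data, RING_BYTES);
        memcpy(mapping + RING_BYTES, data, RING_BYTES);  /* alias */
}

/* Read `len` bytes starting at `idx` with a single memcpy, even when
 * the span crosses the end of the ring, then fold the index back
 * into the first mapping. */
static uint32_t ring_read(uint32_t idx, uint8_t *dst, uint32_t len)
{
        memcpy(dst, mapping + idx, len);        /* no wrap handling */
        idx += len;
        if (idx >= RING_BYTES)
                idx -= RING_BYTES;              /* reset the index */
        return idx;
}
```

A read of 4 bytes starting at index 6 of an 8-byte ring crosses the
boundary, yet is still a single memcpy; only the returned index is
adjusted.
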
On arm64 with page sizes > 4 Kbytes, the header page must still be
passed to Hyper-V as a 4 Kbyte area.  But the memory for the actual
ring must be aligned to PAGE_SIZE and have a size that is a multiple
of PAGE_SIZE so that the duplicate mapping trick can be done.  Hence
a portion of the header page is unused and not communicated to
Hyper-V.  This case is handled by vmbus_establish_gpadl().

Hyper-V enforces a limit on the aggregate amount of guest memory
that can be shared with the host via GPADLs.  This limit ensures
that a rogue guest can't force the consumption of excessive host
resources.  For Windows Server 2019 and later, this limit is
approximately 1280 Mbytes.  For versions prior to Windows Server
2019, the limit is approximately 384 Mbytes.

VMBus channel messages
----------------------
All messages sent in a VMBus channel have a standard header that includes
the message length, the offset of the message payload, some flags, and a
transactionID.  The portion of the message after the header is
unique to each VSP/VSC pair.

Messages follow one of two patterns:

* Unidirectional:  Either side sends a message and does not
  expect a response message
* Request/response:  One side (usually the guest) sends a message
  and expects a response

The transactionID (a.k.a. "requestID") is for matching requests &
responses.  Some synthetic devices allow multiple requests to be in-
flight simultaneously, so the guest specifies a transactionID when
sending a request.  Hyper-V sends back the same transactionID in the
matching response.

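The header fields and the matching rule can be sketched as follows.
This layout is illustrative only; the kernel's actual header is
struct vmpacket_descriptor in include/linux/hyperv.h, and the field
names and widths below are a simplified stand-in:

```c
#include <stdint.h>

/* Illustrative message header -- not the kernel's actual layout. */
struct msg_header {
        uint16_t type;          /* message type */
        uint16_t payload_off;   /* offset of payload from header start */
        uint16_t total_len;     /* total message length */
        uint16_t flags;         /* e.g. "completion requested" */
        uint64_t trans_id;      /* transactionID echoed in responses */
};

/* A response matches a request when the host echoes back the same
 * transactionID the guest placed in the request. */
static int response_matches(const struct msg_header *req,
                            const struct msg_header *resp)
{
        return req->trans_id == resp->trans_id;
}
```
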
Messages passed between the VSP and VSC are control messages.  For
example, a message sent from the storvsc driver might be "execute
this SCSI command".  If a message also implies some data transfer
between the guest and the Hyper-V host, the actual data to be
transferred may be embedded with the control message, or it may be
specified as a separate data buffer that the Hyper-V host will
access as a DMA operation.  The former case is used when the size of
the data is small and the cost of copying the data to and from the
ring buffer is minimal.  For example, time sync messages from the
Hyper-V host to the guest contain the actual time value.  When the
data is larger, a separate data buffer is used.  In this case, the
control message contains a list of GPAs that describe the data
buffer.  For example, the storvsc driver uses this approach to
specify the data buffers to/from which disk I/O is done.

Three functions exist to send VMBus channel messages:

1. vmbus_sendpacket():  Control-only messages and messages with
   embedded data -- no GPAs
2. vmbus_sendpacket_pagebuffer(): Message with list of GPAs
   identifying data to transfer.  An offset and length is
   associated with each GPA so that multiple discontinuous areas
   of guest memory can be targeted.
3. vmbus_sendpacket_mpb_desc(): Message with list of GPAs
   identifying data to transfer.  A single offset and length is
   associated with a list of GPAs.  The GPAs must describe a
   single logical area of guest memory to be targeted.

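The choice among the three can be summarized as a small decision
function.  The enum and the helper below are hypothetical, written
only to make the selection criteria from the list above explicit;
they are not kernel code:

```c
#include <stdint.h>

/* Hypothetical chooser for the three send paths listed above. */
enum send_path {
        SEND_INLINE,            /* vmbus_sendpacket() */
        SEND_PAGEBUFFER,        /* vmbus_sendpacket_pagebuffer() */
        SEND_MPB_DESC,          /* vmbus_sendpacket_mpb_desc() */
};

static enum send_path choose_send_path(uint32_t gpa_count,
                                       int single_logical_area)
{
        if (gpa_count == 0)
                return SEND_INLINE;     /* control-only or embedded data */
        if (single_logical_area)
                return SEND_MPB_DESC;   /* one offset/length, many GPAs */
        return SEND_PAGEBUFFER;         /* offset/length per GPA */
}
```
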
Historically, Linux guests have trusted Hyper-V to send well-formed
and valid messages, and Linux drivers for synthetic devices did not
fully validate messages.  With the introduction of processor
technologies that fully encrypt guest memory and that allow the
guest to not trust the hypervisor (AMD SEV-SNP, Intel TDX), trusting
the Hyper-V host is no longer a valid assumption.  The drivers for
VMBus synthetic devices are being updated to fully validate any
values read from memory that is shared with Hyper-V, which includes
messages from VMBus devices.  To facilitate such validation,
messages read by the guest from the "in" ring buffer are copied to a
temporary buffer that is not shared with Hyper-V.  Validation is
performed in this temporary buffer without the risk of Hyper-V
maliciously modifying the message after it is validated but before
it is used.

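The copy-then-validate pattern looks roughly like the sketch below.
The function and names are hypothetical; the point is that the bytes
are snapshotted out of host-shared memory first, and all checks and
uses then operate only on the private copy.  Validating in place
would let the host change the bytes between the check and the use
(a time-of-check/time-of-use race):

```c
#include <stdint.h>

#define MAX_MSG 64u   /* illustrative cap on message size */

/* Hypothetical sketch: snapshot a message out of host-shared memory,
 * rejecting an implausible length before copying. */
static int read_and_validate(const volatile uint8_t *shared,
                             uint32_t claimed_len,
                             uint8_t *priv)
{
        uint32_t i;

        if (claimed_len > MAX_MSG)
                return -1;                      /* reject before copying */
        for (i = 0; i < claimed_len; i++)
                priv[i] = shared[i];            /* snapshot shared memory */
        /* ...further field validation happens on priv[] only... */
        return 0;
}
```
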
Synthetic Interrupt Controller (synic)
--------------------------------------
Hyper-V provides each guest CPU with a synthetic interrupt controller
that is used by VMBus for host-guest communication. While each synic
defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
(VMBUS_MESSAGE_SINT). All interrupts related to communication between
the Hyper-V host and a guest CPU use that SINT.

The SINT is mapped to a single per-CPU architectural interrupt (i.e.,
an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
each CPU in the guest has a synic and may receive VMBus interrupts,
they are best modeled in Linux as per-CPU interrupts. This model works
well on arm64 where a single per-CPU Linux IRQ is allocated for
VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
"Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
across all CPUs and explicitly coded to call vmbus_isr(). In this case,
there's no Linux IRQ, and the interrupts are visible in aggregate in
/proc/interrupts on the "HYP" line.

The synic provides the means to demultiplex the architectural interrupt into
one or more logical interrupts and route the logical interrupt to the proper
VMBus handler in Linux. This demultiplexing is done by vmbus_isr() and
related functions that access synic data structures.

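The demultiplexing step can be pictured as scanning a per-CPU pending
bitmap, one bit per channel, and invoking the handler for each set
bit.  This is loosely modeled on what vmbus_chan_sched() does with
the synic event flags; the bitmap width, handler signature, and names
below are all invented for the illustration:

```c
#include <stdint.h>

#define MAX_CHANNELS 64u

/* Scan the pending bitmap and route each set bit (a logical channel
 * interrupt) to its handler.  Returns how many channels were handled. */
static unsigned int demux_pending(uint64_t pending,
                                  void (*handle)(unsigned int relid))
{
        unsigned int relid, count = 0;

        for (relid = 0; relid < MAX_CHANNELS; relid++) {
                if (pending & (1ULL << relid)) {
                        handle(relid);  /* route to channel's handler */
                        count++;
                }
        }
        return count;
}

/* Tiny recorder used by the usage example below. */
static unsigned int last_relid;
static void record(unsigned int relid) { last_relid = relid; }
```
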
The synic is not modeled in Linux as an irq chip or irq domain,
and the demultiplexed logical interrupts are not Linux IRQs. As such,
they don't appear in /proc/interrupts or /proc/irq. The CPU
affinity for one of these logical interrupts is controlled via an
entry under /sys/bus/vmbus as described below.

VMBus interrupts
----------------
VMBus provides a mechanism for the guest to interrupt the host when
the guest has queued new messages in a ring buffer.  The host
expects that the guest will send an interrupt only when an "out"
ring buffer transitions from empty to non-empty.  If the guest sends
interrupts at other times, the host deems such interrupts to be
unnecessary.  If a guest sends an excessive number of unnecessary
interrupts, the host may throttle that guest by suspending its
execution for a few seconds to prevent a denial-of-service attack.

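The "signal only on empty-to-non-empty" rule reduces to a check of
the ring state before the write.  A minimal sketch, with hypothetical
names:

```c
#include <stdint.h>

/* Return 1 if the host should be interrupted after a write that
 * advanced the write index from old_write to new_write. */
static int should_signal_host(uint32_t read, uint32_t old_write,
                              uint32_t new_write)
{
        int was_empty = (read == old_write);

        /* Interrupt only when the ring went from empty to non-empty;
         * if the host hasn't drained it yet, it will see the new data
         * on its next pass anyway, so no interrupt is needed. */
        return was_empty && (new_write != old_write);
}
```
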
Similarly, the host will interrupt the guest via the synic when
it sends a new message on the VMBus control path, or when a VMBus
channel "in" ring buffer transitions from empty to non-empty due to
the host inserting a new VMBus channel message. The control message stream
and each VMBus channel "in" ring buffer are separate logical interrupts
that are demultiplexed by vmbus_isr(). It demultiplexes by first checking
for channel interrupts by calling vmbus_chan_sched(), which looks at a synic
bitmap to determine which channels have pending interrupts on this CPU.
If multiple channels have pending interrupts for this CPU, they are
processed sequentially.  When all channel interrupts have been processed,
vmbus_isr() checks for and processes any messages received on the VMBus
control path.

The guest CPU that a VMBus channel will interrupt is selected by the
guest when the channel is created, and the host is informed of that
selection.  VMBus devices are broadly grouped into two categories:

1. "Slow" devices that need only one VMBus channel.  The devices
   (such as keyboard, mouse, heartbeat, and timesync) generate
   relatively few interrupts.  Their VMBus channels are all
   assigned to interrupt the VMBUS_CONNECT_CPU, which is always
   CPU 0.

2. "High speed" devices that may use multiple VMBus channels for
   higher parallelism and performance.  These devices include the
   synthetic SCSI controller and synthetic NIC.  Their VMBus
   channel interrupts are assigned to CPUs that are spread out
   among the available CPUs in the VM so that interrupts on
   multiple channels can be processed in parallel.

The assignment of VMBus channel interrupts to CPUs is done in the
function init_vp_index().  This assignment is done outside of the
normal Linux interrupt affinity mechanism, so the interrupts are
neither "unmanaged" nor "managed" interrupts.

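The two policies above can be caricatured as a tiny chooser: slow
channels always target VMBUS_CONNECT_CPU, and high-speed channels are
spread across the online CPUs.  This is a deliberately simplified
sketch; the real init_vp_index() logic also accounts for details such
as which CPUs are currently online:

```c
#include <stdint.h>

#define VMBUS_CONNECT_CPU 0u

/* Illustrative CPU selection: slow channels pin to the connect CPU,
 * high-speed channels round-robin across num_cpus. */
static uint32_t pick_target_cpu(int high_speed, uint32_t channel_idx,
                                uint32_t num_cpus)
{
        if (!high_speed)
                return VMBUS_CONNECT_CPU;
        return channel_idx % num_cpus;  /* simple round-robin spread */
}
```
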
The CPU that a VMBus channel will interrupt can be seen in
/sys/bus/vmbus/devices/<deviceGUID>/channels/<channelRelID>/cpu.
When running on later versions of Hyper-V, the CPU can be changed
by writing a new value to this sysfs entry. Because VMBus channel
interrupts are not Linux IRQs, there are no entries in /proc/interrupts
or /proc/irq corresponding to individual VMBus channel interrupts.

An online CPU in a Linux guest may not be taken offline if it has
VMBus channel interrupts assigned to it.  Any such channel
interrupts must first be manually reassigned to another CPU as
described above.  When no channel interrupts are assigned to the
CPU, it can be taken offline.

The VMBus channel interrupt handling code is designed to work
correctly even if an interrupt is received on a CPU other than the
CPU assigned to the channel.  Specifically, the code does not use
CPU-based exclusion for correctness.  In normal operation, Hyper-V
will interrupt the assigned CPU.  But when the CPU assigned to a
channel is being changed via sysfs, the guest doesn't know exactly
when Hyper-V will make the transition.  The code must work correctly
even if there is a time lag before Hyper-V starts interrupting the
new CPU.  See comments in target_cpu_store().

VMBus device creation/deletion
------------------------------
Hyper-V and the Linux guest have a separate message-passing path
that is used for synthetic device creation and deletion. This
path does not use a VMBus channel.  See vmbus_post_msg() and
vmbus_on_msg_dpc().

The first step is for the guest to connect to the generic
Hyper-V VMBus mechanism.  As part of establishing this connection,
the guest and Hyper-V agree on a VMBus protocol version they will
use.  This negotiation allows newer Linux kernels to run on older
Hyper-V versions, and vice versa.

The guest then tells Hyper-V to "send offers".  Hyper-V sends an
offer message to the guest for each synthetic device that the VM
is configured to have. Each VMBus device type has a fixed GUID
known as the "class ID", and each VMBus device instance is also
identified by a GUID. The offer message from Hyper-V contains
both GUIDs to uniquely (within the VM) identify the device.
There is one offer message for each device instance, so a VM with
two synthetic NICs will get two offer messages with the NIC
class ID. The ordering of offer messages can vary from boot to boot
and must not be assumed to be consistent in Linux code. Offer
messages may also arrive long after Linux has initially booted
because Hyper-V supports adding devices, such as synthetic NICs,
to running VMs. A new offer message is processed by
vmbus_process_offer(), which indirectly invokes vmbus_add_channel_work().

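The two-GUID scheme can be illustrated with a small model. The struct
and helper names below are hypothetical stand-ins, not the kernel's
types: driver matching looks only at the class ID, while telling two
device instances apart requires comparing both GUIDs.

```c
#include <stdint.h>
#include <string.h>

/* A 16-byte GUID, as used for VMBus class and instance IDs. */
struct guid {
	uint8_t b[16];
};

/* An offer carries both GUIDs; together they uniquely identify one
 * device instance within the VM. Two synthetic NICs share a class ID
 * but have distinct instance IDs.
 */
struct offer {
	struct guid class_id;		/* device type, e.g. "synthetic NIC" */
	struct guid instance_id;	/* this particular device */
};

static int guid_equal(const struct guid *a, const struct guid *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

/* Driver/device matching considers only the class ID... */
static int driver_matches(const struct offer *o, const struct guid *drv_class)
{
	return guid_equal(&o->class_id, drv_class);
}

/* ...while distinguishing device instances must compare both GUIDs. */
static int same_device(const struct offer *a, const struct offer *b)
{
	return guid_equal(&a->class_id, &b->class_id) &&
	       guid_equal(&a->instance_id, &b->instance_id);
}
```

Because offer ordering varies from boot to boot, the instance GUID
(not the arrival order) is the only stable way to recognize a
particular device.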
Upon receipt of an offer message, the guest identifies the device
type based on the class ID, and invokes the correct driver to set up
the device.  Driver/device matching is performed using the standard
Linux mechanism.

The device driver probe function opens the primary VMBus channel to
the corresponding VSP. It allocates guest memory for the channel
ring buffers and shares the ring buffer with the Hyper-V host by
giving the host a list of GPAs for the ring buffer memory.  See
vmbus_establish_gpadl().

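Describing a ring buffer to the host amounts to listing the guest
physical pages that back it. The helper below is a hypothetical
sketch of that step only, assuming a 4 KiB page size and a
page-aligned buffer; the real work, including the GPADL message
exchange with the host, is done inside vmbus_establish_gpadl().

```c
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumes 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Sketch: express a page-aligned guest buffer as a list of guest
 * page frame numbers (the GPAs, in page units) for the host.
 * Returns the number of entries written, or 0 if pfns[] is too small.
 */
static uint32_t fill_pfn_list(uint64_t gpa_base, uint32_t size,
			      uint64_t *pfns, uint32_t max)
{
	uint32_t npages = size >> PAGE_SHIFT;
	uint32_t i;

	if (npages > max)
		return 0;
	for (i = 0; i < npages; i++)
		pfns[i] = (gpa_base >> PAGE_SHIFT) + i;
	return npages;
}
```

A page list rather than a single base/length pair is needed because
the guest allocation need not be physically contiguous from the
host's point of view.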
Once the ring buffer is set up, the device driver and VSP exchange
setup messages via the primary channel.  These messages may include
negotiating the device protocol version to be used between the Linux
VSC and the VSP on the Hyper-V host.  The setup messages may also
include creating additional VMBus channels, which are somewhat
mis-named as "sub-channels" since they are functionally
equivalent to the primary channel once they are created.

Finally, the device driver may create entries in /dev as with
any device driver.

The Hyper-V host can send a "rescind" message to the guest to
remove a device that was previously offered. Linux drivers must
handle such a rescind message at any time. Rescinding a device
invokes the device driver "remove" function to cleanly shut
down the device and remove it. Once a synthetic device is
rescinded, neither Hyper-V nor Linux retains any state about
its previous existence. Such a device might be re-added later,
in which case it is treated as an entirely new device. See
vmbus_onoffer_rescind().
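The "at any time" requirement can be sketched as a flag that the
rescind handler sets and that channel users must check before doing
work. This is a simplified userspace model (no locking, hypothetical
names), loosely analogous to the rescind flag the kernel keeps in
struct vmbus_channel and tests throughout the VMBus code.

```c
#include <stdbool.h>

/* Minimal model: a rescind can race with normal channel use, so any
 * operation must check the flag and fail gracefully once it is set.
 */
struct channel {
	bool rescinded;
};

/* Called when the host's rescind offer message arrives; the driver's
 * "remove" function runs after this, and no state is kept afterward.
 */
static void on_rescind(struct channel *ch)
{
	ch->rescinded = true;
}

/* Any send path must tolerate a rescind having happened already. */
static int channel_send(struct channel *ch)
{
	if (ch->rescinded)
		return -1;	/* device is gone; fail the operation */
	return 0;		/* would write the ring buffer here */
}
```

In the real code the check and the rescind notification must be
ordered carefully (see vmbus_onoffer_rescind()); this sketch only
shows why every user of a channel has to expect the flag to flip
underneath it.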