.. SPDX-License-Identifier: GPL-2.0

PCI pass-thru devices
=========================
In a Hyper-V guest VM, PCI pass-thru devices (also called
virtual PCI devices, or vPCI devices) are physical PCI devices
that are mapped directly into the VM's physical address space.
Guest device drivers can interact directly with the hardware
without intermediation by the host hypervisor.  This approach
provides higher bandwidth access to the device with lower
latency, compared with devices that are virtualized by the
hypervisor.  The device should appear to the guest just as it
would when running on bare metal, so no changes are required
to the Linux device drivers for the device.

Hyper-V terminology for vPCI devices is "Discrete Device
Assignment" (DDA).  Public documentation for Hyper-V DDA is
available here: `DDA`_

.. _DDA: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment

DDA is typically used for storage controllers, such as NVMe,
and for GPUs.  A similar mechanism for NICs is called SR-IOV
and produces the same benefits by allowing a guest device
driver to interact directly with the hardware.  See Hyper-V
public documentation here: `SR-IOV`_

.. _SR-IOV: https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov-

This discussion of vPCI devices includes DDA and SR-IOV
devices.

Device Presentation
-------------------
Hyper-V provides full PCI functionality for a vPCI device when
it is operating, so the Linux device driver for the device can
be used unchanged, provided it uses the correct Linux kernel
APIs for accessing PCI config space and for other integration
with Linux.  But the initial detection of the PCI device and
its integration with the Linux PCI subsystem must use Hyper-V
specific mechanisms.  Consequently, vPCI devices on Hyper-V
have a dual identity.  They are initially presented to Linux
guests as VMBus devices via the standard VMBus "offer"
mechanism, so they have a VMBus identity and appear under
/sys/bus/vmbus/devices.  The VMBus vPCI driver in Linux at
drivers/pci/controller/pci-hyperv.c handles a newly introduced
vPCI device by fabricating a PCI bus topology and creating all
the normal PCI device data structures in Linux that would
exist if the PCI device were discovered via ACPI on a
bare-metal system.  Once those data structures are set up, the
device also has a normal PCI identity in Linux, and the normal
Linux device driver for the vPCI device can function as if it
were running in Linux on bare metal.  Because vPCI devices are
presented dynamically through the VMBus offer mechanism, they
do not appear in the Linux guest's ACPI tables.  vPCI devices
may be added to a VM or removed from a VM at any time during
the life of the VM, and not just during initial boot.

With this approach, the vPCI device is a VMBus device and a
PCI device at the same time.  In response to the VMBus offer
message, the hv_pci_probe() function runs and establishes a
VMBus connection to the vPCI VSP (Virtualization Service
Provider) on the Hyper-V host.  That connection has a single
VMBus channel.  The channel is used to exchange messages with
the vPCI VSP for the purpose of setting up and configuring the
vPCI device in Linux.  Once the device is fully configured in
Linux as a PCI device, the VMBus channel is used only if Linux
changes the vCPU to be interrupted in the guest, or if the vPCI
device is removed from the VM while the VM is running.  The
ongoing operation of the device happens directly between the
Linux device driver for the device and the hardware, with VMBus
and the VMBus channel playing no role.

PCI Device Setup
----------------
PCI device setup follows a sequence that Hyper-V originally
created for Windows guests, and that can be ill-suited for
Linux guests due to differences in the overall structure of
the Linux PCI subsystem compared with Windows.  Nonetheless,
with a bit of hackery in the Hyper-V virtual PCI driver for
Linux, the virtual PCI device is set up in Linux so that
generic Linux PCI subsystem code and the Linux driver for the
device "just work".

Each vPCI device is set up in Linux to be in its own PCI
domain with a host bridge.  The PCI domainID is derived from
bytes 4 and 5 of the instance GUID assigned to the VMBus vPCI
device.  The Hyper-V host does not guarantee that these bytes
are unique, so hv_pci_probe() has an algorithm to resolve
collisions.  The collision resolution is intended to be stable
across reboots of the same VM so that the PCI domainIDs don't
change, as the domainID appears in the user space
configuration of some devices.
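
The following userspace sketch illustrates one way such a scheme can work; it is not the code in pci-hyperv.c. It assumes that byte 5 of the instance GUID supplies the high-order byte of the requested domainID, that domainID 0 is reserved, and that a collision is resolved by taking the lowest free domainID from a bitmap. The names dom_map, requested_domain(), and claim_domain() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bitmap: one bit per possible 16-bit PCI domainID. */
#define DOM_MAP_SIZE 65536u
#define DOM_INVALID  0u	/* assumption: domainID 0 is never handed out */

static uint8_t dom_map[DOM_MAP_SIZE / 8];

/* Set the bit for 'dom' and report whether it was already set. */
static bool test_and_set_dom(uint32_t dom)
{
	uint8_t mask = (uint8_t)(1u << (dom % 8));
	bool was_set = (dom_map[dom / 8] & mask) != 0;

	dom_map[dom / 8] |= mask;
	return was_set;
}

/* Requested domainID: assume byte 5 is the high byte, byte 4 the low. */
static uint16_t requested_domain(const uint8_t guid[16])
{
	return (uint16_t)(guid[5] << 8 | guid[4]);
}

/*
 * Claim the requested domainID if it is free; otherwise fall back to the
 * lowest free domainID.  Returns DOM_INVALID if every domainID is in use.
 */
static uint32_t claim_domain(uint16_t req)
{
	if (req != DOM_INVALID && !test_and_set_dom(req))
		return req;

	for (uint32_t dom = 1; dom < DOM_MAP_SIZE; dom++)
		if (!test_and_set_dom(dom))
			return dom;

	return DOM_INVALID;
}
```

Because the fallback search is deterministic, a VM whose vPCI devices are offered in the same order on every boot gets the same domainIDs each time, which is the kind of stability described above.  A complete implementation would also release the domainID when the device is removed.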

hv_pci_probe() allocates a guest MMIO range to be used as PCI
config space for the device.  This MMIO range is communicated
to the Hyper-V host over the VMBus channel as part of telling
the host that the device is ready to enter D0.  See
hv_pci_enter_d0().  When the guest subsequently accesses this
MMIO range, the Hyper-V host intercepts the accesses and maps
them to the physical device PCI config space.
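
As a sketch of what the guest sees, the accessors below read standard header fields from a config space window.  In the guest, the window would be the MMIO range allocated by hv_pci_probe(), and each access would be intercepted by the host; here a byte array stands in for it.  The offsets (Vendor ID at 0x00, Device ID at 0x02) and the little-endian layout come from the PCI specification, while the function and macro names are hypothetical.  Linux drivers should continue to use the pci_read_config_*() APIs rather than touching the window directly.

```c
#include <stdint.h>

/* Standard configuration header offsets from the PCI specification. */
#define CFG_VENDOR_ID	0x00
#define CFG_DEVICE_ID	0x02

/*
 * Read a little-endian 16-bit field from a config space window.
 * 'volatile' reflects that, in a guest, each load is an MMIO access the
 * host intercepts.  Byte-wise loads keep the sketch portable; real config
 * accesses are normally made at the natural width of the field.
 */
static uint16_t cfg_read16(const volatile uint8_t *window, unsigned int off)
{
	return (uint16_t)(window[off] | window[off + 1] << 8);
}

/* Read a little-endian 32-bit field from a config space window. */
static uint32_t cfg_read32(const volatile uint8_t *window, unsigned int off)
{
	return (uint32_t)cfg_read16(window, off) |
	       (uint32_t)cfg_read16(window, off + 2) << 16;
}
```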

hv_pci_probe() also gets BAR information for the device from
the Hyper-V host, and uses this information to allocate MMIO
space for the BARs.  That MMIO space is then set up to be
associated with the host bridge so that it works when generic
PCI subsystem code in Linux processes the BARs.
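
BAR sizes are derived with the usual PCI rule: writing all ones to a BAR and reading it back yields a mask whose low flag bits are cleared, and the size is the two's complement of that mask.  The sketch below applies the rule to 32-bit and 64-bit memory BARs, assuming the host supplies these probed (read-back) values, and rounds the result up to whole pages as an allocator of guest MMIO space plausibly would.  The 4 KiB page size and the function names are assumptions, not details of pci-hyperv.c.

```c
#include <stdint.h>

#define PAGE_SZ		4096ull		/* assumption: 4 KiB guest pages */
#define BAR_MEM_MASK	(~0xFull)	/* low 4 bits of a memory BAR are flags */
#define BAR_MEM_TYPE_64	0x4u		/* type bits 2:1 == 10b: 64-bit BAR */

/* Size implied by a probed memory BAR (probed_hi used only if 64-bit). */
static uint64_t bar_size(uint32_t probed_lo, uint32_t probed_hi)
{
	uint64_t mask;

	if (probed_lo & BAR_MEM_TYPE_64)
		mask = (uint64_t)probed_hi << 32 | probed_lo;
	else	/* 32-bit BAR: extend with ones so the negation works */
		mask = 0xFFFFFFFF00000000ull | probed_lo;

	mask &= BAR_MEM_MASK;
	return ~mask + 1;
}

/* Round a BAR size up to whole pages for MMIO allocation. */
static uint64_t bar_alloc_size(uint64_t size)
{
	return (size + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
}
```

For example, a probed 32-bit value of 0xFFF00000 implies a 1 MiB BAR, and a BAR smaller than a page still consumes a full page of MMIO space under this rounding.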

Finally, hv_pci_probe() creates the root PCI bus.  At this
point the Hyper-V virtual PCI driver hackery is done, and the
normal Linux PCI machinery for scanning the root bus works to
detect the device, to perform driver matching, and to
initialize the driver and device.

PCI Device Removal
------------------
A Hyper-V host may initiate removal of a vPCI device from a
guest VM at any time during the life of the VM.  The removal
is instigated by an admin action taken on the Hyper-V host and
is not under the control of the guest OS.

A guest VM is notified of the removal by an unsolicited
"Eject" message sent from the host to the guest over the VMBus
channel associated with the vPCI device.  Upon receipt of such
a message, the Hyper-V virtual PCI driver in Linux
asynchronously invokes Linux kernel PCI subsystem calls to
shut down and remove the device.  When those calls are
complete, an "Ejection Complete" message is sent back to
Hyper-V over the VMBus channel indicating that the device has
been removed.  At this point, Hyper-V sends a VMBus rescind
message to the Linux guest, which the VMBus driver in Linux
processes by removing the VMBus identity for the device.  Once
that processing is complete, all vestiges of the device having
been present are gone from the Linux kernel.  The rescind
message also indicates to the guest that Hyper-V has stopped
providing support for the vPCI device in the guest.  If the
guest were to attempt to access that device's MMIO space, it
would be an invalid reference.  Hypercalls affecting the device
return errors, and any further messages sent in the VMBus
channel are ignored.

After sending the Eject message, Hyper-V allows the guest VM
60 seconds to cleanly shut down the device and respond with
Ejection Complete before sending the VMBus rescind
message.  If for any reason the Eject steps don't complete
within the allowed 60 seconds, the Hyper-V host forcibly
performs the rescind steps, which will likely result in
cascading errors in the guest because the device is no
longer present from the guest standpoint and accessing the
device MMIO space will fail.
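
The host-side timeline can be modeled as a small state machine with a deadline.  The sketch below captures only the ordering described above: Eject, then either Ejection Complete within 60 seconds or a forced rescind, after which further guest messages are ignored.  The state and event names and host_step() are hypothetical; they do not correspond to message definitions in pci-hyperv.c.

```c
#include <stdbool.h>

#define EJECT_TIMEOUT_SECS 60

enum vpci_state {
	VPCI_ACTIVE,		/* device in normal use */
	VPCI_EJECT_SENT,	/* host sent Eject, waiting for the guest */
	VPCI_RESCINDED,		/* VMBus rescind sent; device is gone */
};

enum vpci_event {
	EV_ADMIN_REMOVE,	/* admin removes the device on the host */
	EV_EJECTION_COMPLETE,	/* guest reports a clean shutdown */
	EV_TICK,		/* one second of wall-clock time elapses */
};

struct vpci_host {
	enum vpci_state state;
	unsigned int secs_waiting;
	bool forced;		/* true if the rescind was forced by timeout */
};

static void host_step(struct vpci_host *h, enum vpci_event ev)
{
	switch (h->state) {
	case VPCI_ACTIVE:
		if (ev == EV_ADMIN_REMOVE) {
			h->state = VPCI_EJECT_SENT;	/* send "Eject" */
			h->secs_waiting = 0;
		}
		break;
	case VPCI_EJECT_SENT:
		if (ev == EV_EJECTION_COMPLETE) {
			h->state = VPCI_RESCINDED;	/* clean rescind */
		} else if (ev == EV_TICK &&
			   ++h->secs_waiting >= EJECT_TIMEOUT_SECS) {
			h->state = VPCI_RESCINDED;	/* forced rescind */
			h->forced = true;
		}
		break;
	case VPCI_RESCINDED:
		break;		/* further guest messages are ignored */
	}
}
```

In the forced case, the guest may still be tearing the device down when the rescind arrives, which is where the cascading errors described above come from.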

Because ejection is asynchronous and can happen at any point
during the guest VM lifecycle, proper synchronization in the
Hyper-V virtual PCI driver is very tricky.  Ejection has been
observed even before a newly offered vPCI device has been
fully set up.  The Hyper-V virtual PCI driver has been updated
several times over the years to fix race conditions when
ejections happen at inopportune times.  Care must be taken when
modifying this code to prevent re-introducing such problems.
See comments in the code.

Interrupt Assignment
--------------------
The Hyper-V virtual PCI driver supports vPCI devices using
MSI, multi-MSI, or MSI-X.  Assigning the guest vCPU that will
receive the interrupt for a particular MSI or MSI-X message is
complex because of the way the Linux setup of IRQs maps onto
the Hyper-V interfaces.  For the single-MSI and MSI-X cases,
Linux calls hv_compose_msi_msg() twice, with the first call
containing a dummy vCPU and the second call containing the
real vCPU.  Furthermore, hv_irq_unmask() is finally called
(on x86) or the GICD registers are set (on arm64) to specify
the real vCPU again.  Each of these three calls interacts
with Hyper-V, which must decide which physical CPU should
receive the interrupt before it is forwarded to the guest VM.
Unfortunately, the Hyper-V decision-making process is a bit
limited, and can result in concentrating the physical
interrupts on a single CPU, causing a performance bottleneck.
See details about how this is resolved in the extensive
comment above the function hv_compose_msi_req_get_cpu().
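
One way to keep the dummy-vCPU requests from all landing on the same CPU is to hand out CPUs round-robin across the online set.  The sketch below shows only that idea, assuming an online-CPU bitmask of at most 64 CPUs and a single caller (kernel code would need a lock around the shared cursor).  The names next_online_cpu() and pick_dummy_cpu() are hypothetical; the actual rules are those documented in the comment above hv_compose_msi_req_get_cpu().

```c
#include <stdint.h>

/*
 * Return the next CPU after 'prev' whose bit is set in 'online',
 * wrapping around; -1 if no CPU is online.  Handles up to 64 CPUs.
 */
static int next_online_cpu(uint64_t online, int prev)
{
	for (int i = 1; i <= 64; i++) {
		int cpu = (prev + i) % 64;

		if (online & (1ull << cpu))
			return cpu;
	}
	return -1;
}

/* Hypothetical dummy-vCPU picker: round-robin across online CPUs. */
static int pick_dummy_cpu(uint64_t online)
{
	static int cpu_next = -1;	/* -1: start the search at CPU 0 */
	int cpu = next_online_cpu(online, cpu_next);

	if (cpu >= 0)
		cpu_next = cpu;
	return cpu;
}
```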

The Hyper-V virtual PCI driver implements the
irq_chip.irq_compose_msi_msg function as hv_compose_msi_msg().
Unfortunately, on Hyper-V the implementation requires sending
a VMBus message to the Hyper-V host and awaiting an interrupt
indicating receipt of a reply message.  Since
irq_chip.irq_compose_msi_msg can be called with IRQ locks
held, it doesn't work to do the normal sleep until awakened by
the interrupt.  Instead, hv_compose_msi_msg() must send the
VMBus message, and then poll for the completion message.  As a
further complication, the vPCI device could be ejected/rescinded
while the polling is in progress, so this scenario must be
detected as well.  See comments in the code regarding this
very tricky area.
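
The shape of such a polling loop can be sketched in userspace C11: after the request is sent, spin until the reply's completion flag is set or the device is observed to be gone, with an iteration bound so the loop cannot hang.  The struct compose_req type, its flags, the poll_channel callback, and the bound are hypothetical stand-ins for the channel processing and rescind checks in the real driver.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum poll_result { POLL_DONE, POLL_DEVICE_GONE, POLL_TIMEOUT };

/* Hypothetical request state, updated by the reply and eject paths. */
struct compose_req {
	atomic_bool completed;		/* host reply has been received */
	atomic_bool device_gone;	/* Eject or rescind observed */
};

/*
 * Busy-wait for the host's reply instead of sleeping, because the caller
 * may hold IRQ locks.  'poll_channel', if non-NULL, stands in for
 * processing pending channel messages and runs once per iteration.
 */
static enum poll_result
wait_for_compose_reply(struct compose_req *req,
		       void (*poll_channel)(struct compose_req *),
		       unsigned long max_iters)
{
	for (unsigned long i = 0; i < max_iters; i++) {
		if (poll_channel)
			poll_channel(req);
		if (atomic_load(&req->completed))
			return POLL_DONE;
		if (atomic_load(&req->device_gone))
			return POLL_DEVICE_GONE;
	}
	return POLL_TIMEOUT;
}
```

Checking the completion flag before the device-gone flag means a reply that raced with an ejection is still consumed; the real driver's ordering and its cleanup on the failure paths are described in the code comments.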

Most of the code in the Hyper-V virtual PCI driver (pci-hyperv.c)
applies to Hyper-V and Linux guests running on x86 and on arm64
architectures.  But there are differences in how
interrupt assignments are managed.  On x86, the Hyper-V
virtual PCI driver in the guest must make a hypercall to tell
Hyper-V which guest vCPU should be interrupted by each
MSI/MSI-X interrupt, and the x86 interrupt vector number that
the x86_vector IRQ domain has picked for the interrupt.  This
hypercall is made by hv_arch_irq_unmask().  On arm64, the
Hyper-V virtual PCI driver manages the allocation of an SPI
for each MSI/MSI-X interrupt.  The Hyper-V virtual PCI driver
stores the allocated SPI in the architectural GICD registers,
which Hyper-V emulates, so no hypercall is necessary as with
x86.  Hyper-V does not support using LPIs for vPCI devices in
arm64 guest VMs because it does not emulate a GICv3 ITS.
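
Allocating an SPI per interrupt amounts to managing a fixed range of interrupt IDs.  The sketch below is a simple allocator over a hypothetical window that starts at INTID 64 and ends at 1019, the top of the architectural SPI range (INTIDs 32 through 1019).  The window start and the names spi_alloc() and spi_free() are assumptions for illustration.

```c
#include <stdbool.h>

#define SPI_FIRST	64	/* assumption: start of the window for vPCI */
#define SPI_LAST	1019	/* highest INTID in the architectural SPI range */
#define SPI_COUNT	(SPI_LAST - SPI_FIRST + 1)

static bool spi_used[SPI_COUNT];

/* Allocate the lowest free SPI; returns its INTID, or -1 if exhausted. */
static int spi_alloc(void)
{
	for (int i = 0; i < SPI_COUNT; i++) {
		if (!spi_used[i]) {
			spi_used[i] = true;
			return SPI_FIRST + i;
		}
	}
	return -1;
}

/* Release an SPI previously returned by spi_alloc(). */
static void spi_free(int intid)
{
	if (intid >= SPI_FIRST && intid <= SPI_LAST)
		spi_used[intid - SPI_FIRST] = false;
}
```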

The Hyper-V virtual PCI driver in Linux supports vPCI devices
whose drivers create managed or unmanaged Linux IRQs.  If the
smp_affinity for an unmanaged IRQ is updated via the /proc/irq
interface, the Hyper-V virtual PCI driver is called to tell
the Hyper-V host to change the interrupt targeting and
everything works properly.  However, on x86 if the x86_vector
IRQ domain needs to reassign an interrupt vector due to
running out of vectors on a CPU, there's no path to inform the
Hyper-V host of the change, and things break.  Fortunately,
guest VMs operate in a constrained device environment where
using all the vectors on a CPU doesn't happen.  Since such a
problem is only a theoretical concern rather than a practical
concern, it has been left unaddressed.

DMA
---
By default, Hyper-V pins all guest VM memory in the host
when the VM is created, and programs the physical IOMMU to
allow the VM to have DMA access to all its memory.  Hence
it is safe to assign PCI devices to the VM, and allow the
guest operating system to program the DMA transfers.  The
physical IOMMU prevents a malicious guest from initiating
DMA to memory belonging to the host or to other VMs on the
host.  From the Linux guest standpoint, such DMA transfers
are in "direct" mode since Hyper-V does not provide a virtual
IOMMU in the guest.

Hyper-V assumes that physical PCI devices always perform
cache-coherent DMA.  When running on x86, this behavior is
required by the architecture.  When running on arm64, the
architecture allows for both cache-coherent and
non-cache-coherent devices, with the behavior of each device
specified in the ACPI DSDT.  But when a PCI device is assigned
to a guest VM, that device does not appear in the DSDT, so the
Hyper-V VMBus driver propagates cache-coherency information
from the VMBus node in the ACPI DSDT to all VMBus devices,
including vPCI devices (since they have a dual identity as a
VMBus device and as a PCI device).  See vmbus_dma_configure().
Current Hyper-V versions always indicate that the VMBus is
cache coherent, so vPCI devices on arm64 always get marked as
cache coherent and the CPU does not perform any sync
operations as part of dma_map/unmap_*() calls.
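
The effect of inheriting the VMBus node's coherency attribute can be modeled in a few lines.  In this sketch, which is not kernel code, a child device copies the dma_coherent flag from the VMBus node at configuration time, and a hypothetical streaming-map routine counts a cache maintenance operation only when the device is not coherent, mirroring the behavior described above.  The struct and function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: only the fields this sketch needs. */
struct dev_model {
	bool dma_coherent;
	unsigned int cache_syncs;	/* cache maintenance operations done */
};

/* Inherit the coherency attribute of the VMBus node. */
static void configure_from_vmbus(struct dev_model *child,
				 const struct dev_model *vmbus)
{
	child->dma_coherent = vmbus->dma_coherent;
}

/* Streaming map: cache maintenance is needed only if not coherent. */
static void map_for_device(struct dev_model *dev, size_t len)
{
	if (!dev->dma_coherent && len > 0)
		dev->cache_syncs++;
}
```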

vPCI protocol versions
----------------------
As previously described, during vPCI device setup and teardown,
messages are passed over a VMBus channel between the Hyper-V
host and the Hyper-V vPCI driver in the Linux guest.  Some
messages have been revised in newer versions of Hyper-V, so
the guest and host must agree on the vPCI protocol version to
be used.  The version is negotiated when communication over
the VMBus channel is first established.  See
hv_pci_protocol_negotiation().  Newer versions of the protocol
extend support to VMs with more than 64 vCPUs, and provide
additional information about the vPCI device, such as the
guest virtual NUMA node to which it is most closely affined in
the underlying hardware.
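
Version negotiation of this kind is usually a newest-first search: the guest proposes the highest version it supports and steps down until the host accepts one.  The sketch below assumes a version encoding with the major number in the upper 16 bits and the minor number in the lower 16 bits, a hypothetical list of guest versions, and a helper that stands in for one request/response exchange over the VMBus channel.  See hv_pci_protocol_negotiation() for the real message flow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: major in bits 31:16, minor in bits 15:0. */
#define MAKE_VERSION(major, minor) ((uint32_t)(major) << 16 | (uint32_t)(minor))

/* Hypothetical guest-supported versions, newest first. */
static const uint32_t guest_versions[] = {
	MAKE_VERSION(1, 4), MAKE_VERSION(1, 3),
	MAKE_VERSION(1, 2), MAKE_VERSION(1, 1),
};

/* Stand-in for one request/response exchange with the host. */
static bool host_accepts(uint32_t ver, const uint32_t *host, size_t nhost)
{
	for (size_t i = 0; i < nhost; i++)
		if (host[i] == ver)
			return true;
	return false;
}

/* Return the newest mutually supported version, or 0 if none. */
static uint32_t negotiate_version(const uint32_t *host, size_t nhost)
{
	size_t n = sizeof(guest_versions) / sizeof(guest_versions[0]);

	for (size_t i = 0; i < n; i++)
		if (host_accepts(guest_versions[i], host, nhost))
			return guest_versions[i];
	return 0;
}
```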

Guest NUMA node affinity
------------------------
When the vPCI protocol version provides it, the guest NUMA
node affinity of the vPCI device is stored as part of the Linux
device information for subsequent use by the Linux driver.  See
hv_pci_assign_numa_node().  If the negotiated protocol version
does not support the host providing NUMA affinity information,
the Linux guest defaults the device NUMA node to 0.  But even
when the negotiated protocol version includes NUMA affinity
information, the ability of the host to provide such
information depends on certain host configuration options.  If
the guest receives NUMA node value "0", it could mean NUMA
node 0, or it could mean "no information is available".
Unfortunately, it is not possible to distinguish the two cases
from the guest side.

PCI config space access in a CoCo VM
------------------------------------
Linux PCI device drivers access PCI config space using a
standard set of functions provided by the Linux PCI subsystem.
In Hyper-V guests these standard functions map to functions
hv_pcifront_read_config() and hv_pcifront_write_config()
in the Hyper-V virtual PCI driver.  In normal VMs,
these hv_pcifront_*() functions directly access the PCI config
space, and the accesses trap to Hyper-V to be handled.
But in CoCo VMs, memory encryption prevents Hyper-V
from reading the guest instruction stream to emulate the
access, so the hv_pcifront_*() functions must invoke
hypercalls with explicit arguments describing the access to be
made.
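
The difference between the two access paths can be modeled in user space. In the sketch below, fake_cfg_space stands in for the device's config space. struct mmio_read_req and fake_hypercall_mmio_read() are hypothetical placeholders for a hypercall whose arguments spell out the access; they are not the Hyper-V hypercall interface. The sketch shows only that in the CoCo path the offset and access width travel as explicit data, rather than being recovered by the host from a trapped instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated 4 KiB config space standing in for the real MMIO region. */
static uint8_t fake_cfg_space[4096];

/* Hypothetical hypercall input: everything the host needs, spelled out. */
struct mmio_read_req {
	uint32_t offset; /* where to read (a guest physical address in reality) */
	uint32_t size;   /* access width in bytes: 1, 2 or 4 */
};

/* PCI config space is little-endian; assemble the value byte by byte. */
static uint32_t load_le(const uint8_t *p, uint32_t size)
{
	uint32_t val = 0;

	for (uint32_t i = 0; i < size; i++)
		val |= (uint32_t)p[i] << (8 * i);
	return val;
}

/* Stand-in for the hypervisor servicing an explicitly described read. */
static uint32_t fake_hypercall_mmio_read(const struct mmio_read_req *req)
{
	return load_le(&fake_cfg_space[req->offset], req->size);
}

static uint32_t cfg_read(bool coco_vm, uint32_t offset, uint32_t size)
{
	assert(size == 1 || size == 2 || size == 4);
	assert(offset + size <= sizeof(fake_cfg_space));

	if (!coco_vm) {
		/*
		 * Normal VM: a plain load. In a real guest this traps, and
		 * Hyper-V decodes the faulting instruction to emulate it.
		 */
		return load_le(&fake_cfg_space[offset], size);
	}

	/*
	 * CoCo VM: the host cannot read the encrypted instruction stream,
	 * so the access is described explicitly in the request.
	 */
	struct mmio_read_req req = { .offset = offset, .size = size };
	return fake_hypercall_mmio_read(&req);
}
```

For example, after storing bytes 0x86 and 0x80 at offsets 0 and 1, a 2-byte read at offset 0 returns 0x8086 on either path.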

Config Block back-channel
-------------------------
The Hyper-V host and Hyper-V virtual PCI driver in Linux
together implement a non-standard back-channel communication
path between the host and guest.  The back-channel path uses
messages sent over the VMBus channel associated with the vPCI
device.  The functions hyperv_read_cfg_blk() and
hyperv_write_cfg_blk() are the primary interfaces provided to
other parts of the Linux kernel.  As of this writing, these
interfaces are used only by the Mellanox mlx5 driver to pass
diagnostic data to a Hyper-V host running in the Azure public
cloud.  The functions hyperv_read_cfg_blk() and
hyperv_write_cfg_blk() are implemented in a separate module
(pci-hyperv-intf.c, under CONFIG_PCI_HYPERV_INTERFACE) that
effectively stubs them out when running in non-Hyper-V
environments.
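
One plausible way to get this stubbing behavior is an always-present wrapper that forwards through a function pointer, which is set only when the Hyper-V vPCI driver is present. The user-space C sketch below illustrates that indirection under this assumption. struct cfg_blk_ops, blk_ops, read_cfg_blk() and fake_read_block() are hypothetical names, and the parameter list is modeled on, but not guaranteed to match, hyperv_read_cfg_blk().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct pci_dev; /* opaque in this sketch */

/* Hypothetical ops table; a provider fills it in when it is present. */
struct cfg_blk_ops {
	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
			  unsigned int block_id, unsigned int *bytes_returned);
};

static struct cfg_blk_ops blk_ops; /* all NULL until a provider registers */

/* Always-present wrapper: fails cleanly when no provider is registered. */
static int read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
			unsigned int block_id, unsigned int *bytes_returned)
{
	if (!blk_ops.read_block)
		return -EOPNOTSUPP;
	return blk_ops.read_block(dev, buf, buf_len, block_id, bytes_returned);
}

/* Example provider that pretends every block is filled with zeroes. */
static int fake_read_block(struct pci_dev *dev, void *buf, unsigned int buf_len,
			   unsigned int block_id, unsigned int *bytes_returned)
{
	(void)dev;
	(void)block_id;
	assert(buf != NULL && bytes_returned != NULL);
	memset(buf, 0, buf_len);
	*bytes_returned = buf_len;
	return 0;
}
```

Until a provider assigns blk_ops.read_block, read_cfg_blk() returns -EOPNOTSUPP. A caller can therefore be built and loaded on any platform and treat that error as "back-channel unavailable".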