.. SPDX-License-Identifier: GPL-2.0

Confidential Computing VMs
==========================
Hyper-V can create and run Linux guests that are Confidential Computing
(CoCo) VMs. Such VMs cooperate with the physical processor to better protect
the confidentiality and integrity of data in the VM's memory, even in the
face of a hypervisor/VMM that has been compromised and may behave maliciously.
CoCo VMs on Hyper-V share the generic CoCo VM threat model and security
objectives described in Documentation/security/snp-tdx-threat-model.rst. Note
that Hyper-V specific code in Linux refers to CoCo VMs as "isolated VMs" or
"isolation VMs".

A Linux CoCo VM on Hyper-V requires the cooperation and interaction of the
following:

* Physical hardware with a processor that supports CoCo VMs

* The hardware runs a version of Windows/Hyper-V with support for CoCo VMs

* The VM runs a version of Linux that supports being a CoCo VM

The physical hardware requirements are as follows:

* AMD processor with SEV-SNP.
  Hyper-V does not run guest VMs with AMD SME, SEV, or SEV-ES encryption, and
  such encryption is not sufficient for a CoCo VM on Hyper-V.

* Intel processor with TDX

To create a CoCo VM, the "Isolated VM" attribute must be specified to Hyper-V
when the VM is created. A VM cannot be changed from a CoCo VM to a normal VM,
or vice versa, after it is created.

Operational Modes
-----------------
Hyper-V CoCo VMs can run in two modes. The mode is selected when the VM is
created and cannot be changed during the life of the VM.

* Fully-enlightened mode. In this mode, the guest operating system is
  enlightened to understand and manage all aspects of running as a CoCo VM.

* Paravisor mode. In this mode, a paravisor layer between the guest and the
  host provides some operations needed to run as a CoCo VM. The guest operating
  system can have fewer CoCo enlightenments than are required in the
  fully-enlightened case.

Conceptually, fully-enlightened mode and paravisor mode may be treated as
points on a spectrum spanning the degree of guest enlightenment needed to run
as a CoCo VM.
Fully-enlightened mode is one end of the spectrum. A full
implementation of paravisor mode is the other end of the spectrum, where all
aspects of running as a CoCo VM are handled by the paravisor, and a normal
guest OS with no knowledge of memory encryption or other aspects of CoCo VMs
can run successfully. However, the Hyper-V implementation of paravisor mode
does not go this far, and is somewhere in the middle of the spectrum. Some
aspects of CoCo VMs are handled by the Hyper-V paravisor while the guest OS
must be enlightened for other aspects. Unfortunately, there is no
standardized enumeration of features/functions that might be provided in the
paravisor, and there is no standardized mechanism for a guest OS to query the
paravisor for the features/functions it provides. The understanding of what
the paravisor provides is hard-coded in the guest OS.

Paravisor mode has similarities to the `Coconut project`_, which aims to provide
a limited paravisor offering services to the guest such as a virtual TPM.
However, the Hyper-V paravisor generally handles more aspects of CoCo VMs
than is currently envisioned for Coconut, and so is further toward the "no
guest enlightenments required" end of the spectrum.

.. _Coconut project: https://github.com/coconut-svsm/svsm

In the CoCo VM threat model, the paravisor is in the guest security domain
and must be trusted by the guest OS. By implication, the hypervisor/VMM must
protect itself against a potentially malicious paravisor just like it
protects against a potentially malicious guest.

The hardware architectural approach to fully-enlightened vs. paravisor mode
varies depending on the underlying processor.

* With AMD SEV-SNP processors, in fully-enlightened mode the guest OS runs in
  VMPL 0 and has full control of the guest context. In paravisor mode, the
  guest OS runs in VMPL 2 and the paravisor runs in VMPL 0. The paravisor
  running in VMPL 0 has privileges that the guest OS in VMPL 2 does not have.
  Certain operations require the guest to invoke the paravisor.
  Furthermore, in paravisor mode the guest OS operates in "virtual Top Of
  Memory" (vTOM) mode as defined by the SEV-SNP architecture. This mode
  simplifies guest management of memory encryption when a paravisor is used.

* With Intel TDX processors, in fully-enlightened mode the guest OS runs in an
  L1 VM. In paravisor mode, TD partitioning is used. The paravisor runs in the
  L1 VM, and the guest OS runs in a nested L2 VM.

Hyper-V exposes a synthetic MSR to guests that describes the CoCo mode. This
MSR indicates whether the underlying processor uses AMD SEV-SNP or Intel TDX,
and whether a paravisor is being used. It is straightforward to build a single
kernel image that can boot and run properly on either architecture, and in
either mode.

Paravisor Effects
-----------------
Running in paravisor mode affects the following areas of generic Linux kernel
CoCo VM functionality:

* Initial guest memory setup. When a new VM is created in paravisor mode, the
  paravisor runs first and sets up the guest physical memory as encrypted.
  The guest Linux does normal memory initialization, except for explicitly
  marking appropriate ranges as decrypted (shared). In paravisor mode, Linux
  does not perform the early boot memory setup steps that are particularly
  tricky with AMD SEV-SNP in fully-enlightened mode.

* #VC/#VE exception handling. In paravisor mode, Hyper-V configures the guest
  CoCo VM to route #VC and #VE exceptions to VMPL 0 and the L1 VM,
  respectively, and not the guest Linux. Consequently, these exception handlers
  do not run in the guest Linux and are not a required enlightenment for a
  Linux guest in paravisor mode.

* CPUID flags. Both AMD SEV-SNP and Intel TDX provide a CPUID flag in the
  guest indicating that the VM is operating with the respective hardware
  support. While these CPUID flags are visible in fully-enlightened CoCo VMs,
  the paravisor filters out these flags and the guest Linux does not see them.
  Throughout the Linux kernel, explicitly testing these flags has mostly been
  eliminated in favor of the cc_platform_has() function, with the goal of
  abstracting the differences between SEV-SNP and TDX.
  But the cc_platform_has() abstraction also allows the Hyper-V paravisor
  configuration to selectively enable aspects of CoCo VM functionality even
  when the CPUID flags are not set. The exception is early boot memory setup on
  SEV-SNP, which tests the CPUID SEV-SNP flag. However, the absence of the flag
  in a Hyper-V paravisor mode VM achieves the desired effect of not running
  SEV-SNP specific early boot memory setup.

* Device emulation. In paravisor mode, the Hyper-V paravisor provides
  emulation of devices such as the IO-APIC and TPM. Because the emulation
  happens in the paravisor in the guest context (instead of the hypervisor/VMM
  context), MMIO accesses to these devices must be encrypted references instead
  of the decrypted references that would be used in a fully-enlightened CoCo
  VM. The __ioremap_caller() function has been enhanced to make a callback to
  check whether a particular address range should be treated as encrypted
  (private). See the "is_private_mmio" callback.

* Encrypt/decrypt memory transitions. In a CoCo VM, transitioning guest
  memory between encrypted and decrypted requires coordinating with the
  hypervisor/VMM.
  This is done via callbacks invoked from __set_memory_enc_pgtable(). In
  fully-enlightened mode, the normal SEV-SNP and TDX implementations of these
  callbacks are used. In paravisor mode, a Hyper-V specific set of callbacks is
  used. These callbacks invoke the paravisor so that the paravisor can
  coordinate the transitions and inform the hypervisor as necessary. See
  hv_vtom_init() where these callbacks are set up.

* Interrupt injection. In fully-enlightened mode, a malicious hypervisor
  could inject interrupts into the guest OS at times that violate x86/x64
  architectural rules. For full protection, the guest OS should include
  enlightenments that use the interrupt injection management features provided
  by CoCo-capable processors. In paravisor mode, the paravisor mediates
  interrupt injection into the guest OS, and ensures that the guest OS only
  sees interrupts that are "legal". The paravisor uses the interrupt injection
  management features provided by the CoCo-capable physical processor, thereby
  masking these complexities from the guest OS.

Hyper-V Hypercalls
------------------
When in fully-enlightened mode, hypercalls made by the Linux guest are routed
directly to the hypervisor, just as in a non-CoCo VM. But in paravisor mode,
normal hypercalls trap to the paravisor first, which may in turn invoke the
hypervisor. However, the paravisor is idiosyncratic in this regard, and a few
hypercalls made by the Linux guest must always be routed directly to the
hypervisor. These hypercall sites test for a paravisor being present, and use
a special invocation sequence. See hv_post_message(), for example.

Guest communication with Hyper-V
--------------------------------
Separate from the generic Linux kernel handling of memory encryption in Linux
CoCo VMs, Hyper-V has VMBus and VMBus devices that communicate using memory
shared between the Linux guest and the host. This shared memory must be
marked decrypted to enable communication. Furthermore, since the threat model
includes a compromised and potentially malicious host, the guest must guard
against leaking any unintended data to the host through this shared memory.

These Hyper-V and VMBus memory pages are marked as decrypted:

* VMBus monitor pages

* Synthetic interrupt controller (synic) related pages (unless supplied by
  the paravisor)

* Per-cpu hypercall input and output pages (unless running with a paravisor)

* VMBus ring buffers. The direct mapping is marked decrypted in
  __vmbus_establish_gpadl(). The secondary mapping created in
  hv_ringbuffer_init() must also include the "decrypted" attribute.

When the guest writes data to memory that is shared with the host, it must
ensure that only the intended data is written. Padding or unused fields must
be initialized to zeros before copying into the shared memory so that random
kernel data is not inadvertently given to the host.

Similarly, when the guest reads memory that is shared with the host, it must
validate the data before acting on it so that a malicious host cannot induce
the guest to expose unintended data. Doing such validation can be tricky
because the host can modify the shared memory areas even while or after
validation is performed.
For messages passed from the host to the guest in a
VMBus ring buffer, the length of the message is validated, and the message is
copied into a temporary (encrypted) buffer for further validation and
processing. The copying adds a small amount of overhead, but is the only way
to protect against a malicious host. See hv_pkt_iter_first().

Many drivers for VMBus devices have been "hardened" by adding code to fully
validate messages received over VMBus, instead of assuming that Hyper-V is
acting cooperatively. Such drivers are marked as "allowed_in_isolated" in the
vmbus_devs[] table. Other drivers for VMBus devices that are not needed in a
CoCo VM have not been hardened, and they are not allowed to load in a CoCo
VM. See vmbus_is_valid_offer() where such devices are excluded.

Two VMBus devices depend on the Hyper-V host to do DMA data transfers:
storvsc for disk I/O and netvsc for network I/O. storvsc uses the normal
Linux kernel DMA APIs, and so bounce buffering through decrypted swiotlb
memory is done implicitly. netvsc has two modes for data transfers.
The first
mode goes through send and receive buffer space that is explicitly allocated
by the netvsc driver, and is used for most smaller packets. These send and
receive buffers are marked decrypted by __vmbus_establish_gpadl(). Because
the netvsc driver explicitly copies packets to/from these buffers, the
equivalent of bounce buffering between encrypted and decrypted memory is
already part of the data path. The second mode uses the normal Linux kernel
DMA APIs, and is bounce buffered through swiotlb memory implicitly, as in
storvsc.

Finally, the VMBus virtual PCI driver needs special handling in a CoCo VM.
Linux PCI device drivers access PCI config space using standard APIs provided
by the Linux PCI subsystem. On Hyper-V, these functions directly access MMIO
space, and the access traps to Hyper-V for emulation. But in CoCo VMs, memory
encryption prevents Hyper-V from reading the guest instruction stream to
emulate the access. So in a CoCo VM, these functions must make a hypercall
with arguments explicitly describing the access.
See _hv_pcifront_read_config(), _hv_pcifront_write_config(), and the
"use_calls" flag, which indicates that hypercalls should be used.

load_unaligned_zeropad()
------------------------
When transitioning memory between encrypted and decrypted, the caller of
set_memory_encrypted() or set_memory_decrypted() is responsible for ensuring
the memory isn't in use and isn't referenced while the transition is in
progress. The transition has multiple steps, and includes interaction with
the Hyper-V host. The memory is in an inconsistent state until all steps are
complete. A reference while the state is inconsistent could result in an
exception that can't be cleanly fixed up.

However, the kernel load_unaligned_zeropad() mechanism may make stray
references that can't be prevented by the caller of set_memory_encrypted() or
set_memory_decrypted(), so there's specific code in the #VC or #VE exception
handler to fix up this case. But a CoCo VM running on Hyper-V may be
configured to run with a paravisor, with the #VC or #VE exception routed to
the paravisor.
There's no architectural way to forward the exceptions back to
the guest kernel, and in such a case, the load_unaligned_zeropad() fixup code
in the #VC/#VE handlers doesn't run.

To avoid this problem, the Hyper-V specific functions for notifying the
hypervisor of the transition mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based handlers for load_unaligned_zeropad() fix up the reference.
When the encrypted/decrypted transition is complete, the pages are marked as
"present" again. See hv_vtom_clear_present() and hv_vtom_set_host_visibility().