TOMOYO Linux Cross Reference
Linux/Documentation/virt/kvm/x86/nested-vmx.rst

Differences between /Documentation/virt/kvm/x86/nested-vmx.rst (Version linux-6.12-rc7) and /Documentation/virt/kvm/x86/nested-vmx.rst (Version linux-6.2.16)


.. SPDX-License-Identifier: GPL-2.0

==========
Nested VMX
==========

Overview
---------

On Intel processors, KVM uses Intel's VMX (Virtual-Machine eXtensions)
to easily and efficiently run guest operating systems. Normally, these guests
*cannot* themselves be hypervisors running their own guests, because in VMX,
guests cannot use VMX instructions.

The "Nested VMX" feature adds this missing capability - of running guest
hypervisors (which use VMX) with their own nested guests. It does so by
allowing a guest to use VMX instructions, and correctly and efficiently
emulating them using the single level of VMX available in the hardware.

We describe in much greater detail the theory behind the nested VMX feature,
its implementation and its performance characteristics, in the OSDI 2010 paper
"The Turtles Project: Design and Implementation of Nested Virtualization",
available at:

        https://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf


Terminology
-----------

Single-level virtualization has two levels - the host (KVM) and the guests.
In nested virtualization, we have three levels: the host (KVM), which we call
L0, the guest hypervisor, which we call L1, and its nested guest, which we
call L2.


Running nested VMX
------------------

The nested VMX feature is enabled by default since Linux kernel v4.20. For
older Linux kernels, it can be enabled by giving the "nested=1" option to the
kvm-intel module.


No modifications are required to user space (qemu). However, qemu's default
emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
explicitly enabled, by giving qemu one of the following options:

     - cpu host              (emulated CPU has all features of the real CPU)

     - cpu qemu64,+vmx       (add just the vmx feature to a named CPU type)


ABIs
----

Nested VMX aims to present a standard and (eventually) fully-functional VMX
implementation for a guest hypervisor to use. As such, the official
specification of the ABI that it provides is Intel's VMX specification,
namely volume 3B of their "Intel 64 and IA-32 Architectures Software
Developer's Manual". Not all of VMX's features are currently fully supported,
but the goal is to eventually support them all, starting with the VMX features
which are used in practice by popular hypervisors (KVM and others).

As a VMX implementation, nested VMX presents a VMCS structure to L1.
As mandated by the spec, other than the two fields revision_id and abort,
this structure is *opaque* to its user, who is not supposed to know or care
about its internal structure. Rather, the structure is accessed through the
VMREAD and VMWRITE instructions.
Still, for debugging purposes, KVM developers might be interested to know the
internals of this structure; this is struct vmcs12 from arch/x86/kvm/vmx.c.

The name "vmcs12" refers to the VMCS that L1 builds for L2. In the code we
also have "vmcs01", the VMCS that L0 built for L1, and "vmcs02", the VMCS
which L0 builds to actually run L2. How this is done is explained in the
aforementioned paper.

For convenience, we repeat the content of struct vmcs12 here. If the internals
of this structure change, this can break live migration across KVM versions.
VMCS12_REVISION (from vmx.c) should be changed if struct vmcs12 or its inner
struct shadow_vmcs is ever changed.

::

        typedef u64 natural_width;
        struct __packed vmcs12 {
                /* According to the Intel spec, a VMCS region must start with
                 * these two user-visible fields */
                u32 revision_id;
                u32 abort;

                u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
                u32 padding[7]; /* room for future expansion */

                u64 io_bitmap_a;
                u64 io_bitmap_b;
                u64 msr_bitmap;
                u64 vm_exit_msr_store_addr;
                u64 vm_exit_msr_load_addr;
                u64 vm_entry_msr_load_addr;
                u64 tsc_offset;
                u64 virtual_apic_page_addr;
                u64 apic_access_addr;
                u64 ept_pointer;
                u64 guest_physical_address;
                u64 vmcs_link_pointer;
                u64 guest_ia32_debugctl;
                u64 guest_ia32_pat;
                u64 guest_ia32_efer;
                u64 guest_pdptr0;
                u64 guest_pdptr1;
                u64 guest_pdptr2;
                u64 guest_pdptr3;
                u64 host_ia32_pat;
                u64 host_ia32_efer;
                u64 padding64[8]; /* room for future expansion */
                natural_width cr0_guest_host_mask;
                natural_width cr4_guest_host_mask;
                natural_width cr0_read_shadow;
                natural_width cr4_read_shadow;
                natural_width dead_space[4]; /* Last remnants of cr3_target_value[0-3]. */
                natural_width exit_qualification;
                natural_width guest_linear_address;
                natural_width guest_cr0;
                natural_width guest_cr3;
                natural_width guest_cr4;
                natural_width guest_es_base;
                natural_width guest_cs_base;
                natural_width guest_ss_base;
                natural_width guest_ds_base;
                natural_width guest_fs_base;
                natural_width guest_gs_base;
                natural_width guest_ldtr_base;
                natural_width guest_tr_base;
                natural_width guest_gdtr_base;
                natural_width guest_idtr_base;
                natural_width guest_dr7;
                natural_width guest_rsp;
                natural_width guest_rip;
                natural_width guest_rflags;
                natural_width guest_pending_dbg_exceptions;
                natural_width guest_sysenter_esp;
                natural_width guest_sysenter_eip;
                natural_width host_cr0;
                natural_width host_cr3;
                natural_width host_cr4;
                natural_width host_fs_base;
                natural_width host_gs_base;
                natural_width host_tr_base;
                natural_width host_gdtr_base;
                natural_width host_idtr_base;
                natural_width host_ia32_sysenter_esp;
                natural_width host_ia32_sysenter_eip;
                natural_width host_rsp;
                natural_width host_rip;
                natural_width paddingl[8]; /* room for future expansion */
                u32 pin_based_vm_exec_control;
                u32 cpu_based_vm_exec_control;
                u32 exception_bitmap;
                u32 page_fault_error_code_mask;
                u32 page_fault_error_code_match;
                u32 cr3_target_count;
                u32 vm_exit_controls;
                u32 vm_exit_msr_store_count;
                u32 vm_exit_msr_load_count;
                u32 vm_entry_controls;
                u32 vm_entry_msr_load_count;
                u32 vm_entry_intr_info_field;
                u32 vm_entry_exception_error_code;
                u32 vm_entry_instruction_len;
                u32 tpr_threshold;
                u32 secondary_vm_exec_control;
                u32 vm_instruction_error;
                u32 vm_exit_reason;
                u32 vm_exit_intr_info;
                u32 vm_exit_intr_error_code;
                u32 idt_vectoring_info_field;
                u32 idt_vectoring_error_code;
                u32 vm_exit_instruction_len;
                u32 vmx_instruction_info;
                u32 guest_es_limit;
                u32 guest_cs_limit;
                u32 guest_ss_limit;
                u32 guest_ds_limit;
                u32 guest_fs_limit;
                u32 guest_gs_limit;
                u32 guest_ldtr_limit;
                u32 guest_tr_limit;
                u32 guest_gdtr_limit;
                u32 guest_idtr_limit;
                u32 guest_es_ar_bytes;
                u32 guest_cs_ar_bytes;
                u32 guest_ss_ar_bytes;
                u32 guest_ds_ar_bytes;
                u32 guest_fs_ar_bytes;
                u32 guest_gs_ar_bytes;
                u32 guest_ldtr_ar_bytes;
                u32 guest_tr_ar_bytes;
                u32 guest_interruptibility_info;
                u32 guest_activity_state;
                u32 guest_sysenter_cs;
                u32 host_ia32_sysenter_cs;
                u32 padding32[8]; /* room for future expansion */
                u16 virtual_processor_id;
                u16 guest_es_selector;
                u16 guest_cs_selector;
                u16 guest_ss_selector;
                u16 guest_ds_selector;
                u16 guest_fs_selector;
                u16 guest_gs_selector;
                u16 guest_ldtr_selector;
                u16 guest_tr_selector;
                u16 host_es_selector;
                u16 host_cs_selector;
                u16 host_ss_selector;
                u16 host_ds_selector;
                u16 host_fs_selector;
                u16 host_gs_selector;
                u16 host_tr_selector;
        };


Authors
-------

These patches were written by:
    - Abel Gordon, abelg <at> il.ibm.com
    - Nadav Har'El, nyh <at> il.ibm.com
    - Orit Wasserman, oritw <at> il.ibm.com
    - Ben-Ami Yassor, benami <at> il.ibm.com
    - Muli Ben-Yehuda, muli <at> il.ibm.com

With contributions by:
    - Anthony Liguori, aliguori <at> us.ibm.com
    - Mike Day, mdday <at> us.ibm.com
    - Michael Factor, factor <at> il.ibm.com
    - Zvi Dubitzky, dubi <at> il.ibm.com

And valuable reviews by:
    - Avi Kivity, avi <at> redhat.com
    - Gleb Natapov, gleb <at> redhat.com
    - Marcelo Tosatti, mtosatti <at> redhat.com
    - Kevin Tian, kevin.tian <at> intel.com
    - and others.
                                                      
