L1TF - L1 Terminal Fault
========================

L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data which is available in the Level 1 Data Cache
when the page table entry controlling the virtual address, which is used
for the access, has the Present bit cleared or other reserved bits set.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non Intel vendors

   - Older processor models, where the CPU family is < 6

   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
     Penwell, Pineview, Silvermont, Airmont, Merrifield)

   - The Intel XEON PHI family

   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
     by the Meltdown vulnerability either. These CPUs should become
     available by end of 2018.

Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.

Related CVEs
------------

The following CVE entries are related to the L1TF vulnerability:

   =============  =================  ==============================
   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
   =============  =================  ==============================

Problem
-------

If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE was still present and accessible.
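
The condition described above can be illustrated with a small toy model. The
sketch below is not kernel code: it assumes the x86-64 4 KiB page layout
(Present in bit 0, physical address in bits 12-51), covers only the cleared
Present bit case and not the reserved bits case, and models the L1D as a
plain set of cached physical page addresses. All function names are
hypothetical.

```python
# Toy model of the terminal fault condition. Assumption: x86-64 PTE layout
# for 4 KiB pages, with Present in bit 0 and the address in bits 12-51.
PTE_PRESENT = 1 << 0
PTE_ADDR_MASK = 0x000FFFFFFFFFF000


def pte_present(pte):
    """Return True if the Present bit of the PTE is set."""
    return bool(pte & PTE_PRESENT)


def pte_address_bits(pte):
    """Return the physical address encoded in the PTE's address bits."""
    return pte & PTE_ADDR_MASK


def terminal_fault_exposes(pte, cached_pages):
    """Return True if a non-present PTE still names a page held in the L1D.

    cached_pages is a set of physical page addresses standing in for the
    L1D content. A present PTE takes the normal translation path, so it is
    not a terminal fault in this model.
    """
    if pte_present(pte):
        return False
    return pte_address_bits(pte) in cached_pages


if __name__ == "__main__":
    stale_pte = 0x12345066  # Present bit clear, address bits left in place
    print(hex(pte_address_bits(stale_pte)))
    print(terminal_fault_exposes(stale_pte, {0x12345000}))
```

The point of the model is only that a cleared Present bit does not erase the
address bits, so whatever those bits name remains reachable speculatively
while it sits in the L1D.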

While this is a purely speculative mechanism and the instruction will raise
a page fault when it is retired eventually, the pure act of loading the
data and making it available to other speculative instructions opens up the
opportunity for side channel attacks to unprivileged malicious code,
similar to the Meltdown attack.

While Meltdown breaks the user space to kernel space protection, L1TF
allows attacks on any physical memory address in the system, and the attack
works across all protection domains. It allows attacks on SGX and also
works from inside virtual machines because the speculation bypasses the
extended page table (EPT) protection mechanism.


Attack scenarios
----------------

1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^

   Operating Systems store arbitrary information in the address bits of a
   PTE which is marked non present. This allows a malicious user space
   application to attack the physical memory to which these PTEs resolve.
   In some cases user-space can maliciously influence the information
   encoded in the address bits of the PTE, thus making attacks more
   deterministic and more practical.

   The Linux kernel contains a mitigation for this attack vector, PTE
   inversion, which is permanently enabled and has no performance
   impact. The kernel ensures that the address bits of PTEs, which are not
   marked present, never point to cacheable physical memory space.

   A system with an up to date kernel is protected against attacks from
   malicious user space applications.

2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The fact that L1TF breaks all domain protections allows malicious guest
   OSes, which can control the PTEs directly, and malicious guest user
   space applications, which run on an unprotected guest kernel lacking the
   PTE inversion mitigation for L1TF, to attack physical host memory.

   A special aspect of L1TF in the context of virtualization is symmetric
   multi threading (SMT). The Intel implementation of SMT is called
   HyperThreading.
   The fact that Hyperthreads on the affected processors
   share the L1 Data Cache (L1D) is important for this. As the flaw only
   allows attacks on data which is present in the L1D, a malicious guest
   running on one Hyperthread can attack the data which is brought into the
   L1D by the context which runs on the sibling Hyperthread of the same
   physical core. This context can be host OS, host user space or a
   different guest.

   If the processor does not support Extended Page Tables, the attack is
   only possible when the hypervisor does not sanitize the content of the
   effective (shadow) page tables.

   While solutions exist to mitigate these attack vectors fully, these
   mitigations are not enabled by default in the Linux kernel because they
   can affect performance significantly. The kernel provides several
   mechanisms which can be utilized to address the problem depending on the
   deployment scenario. The mitigations, their protection scope and impact
   are described in the next sections.

   The default mitigations and the rationale for choosing them are explained
   at the end of this document. See :ref:`default_mitigations`.

.. _l1tf_sys_info:

L1TF system information
-----------------------

The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

  /sys/devices/system/cpu/vulnerabilities/l1tf

The possible values in this file are:

  ===========================   ===============================
  'Not affected'                The processor is not vulnerable
  'Mitigation: PTE Inversion'   The host protection is active
  ===========================   ===============================

If KVM/VMX is enabled and the processor is vulnerable then the following
information is appended to the 'Mitigation: PTE Inversion' part:

  - SMT status:

    =====================  ================
    'VMX: SMT vulnerable'  SMT is enabled
    'VMX: SMT disabled'    SMT is disabled
    =====================  ================

  - L1D Flush mode:

    ================================  ====================================
    'L1D vulnerable'                  L1D flushing is disabled

    'L1D conditional cache flushes'   L1D flush is conditionally enabled

    'L1D cache flushes'               L1D flush is unconditionally enabled
    ================================  ====================================

The resulting grade of protection is discussed in the following sections.


Host mitigation mechanism
-------------------------

The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.


Guest mitigation mechanisms
---------------------------

.. _l1d_flush:

1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^

   To make sure that a guest cannot attack data which is present in the L1D
   the hypervisor flushes the L1D before entering the guest.

   Flushing the L1D evicts not only the data which should not be accessed
   by a potentially malicious guest, it also flushes the guest
   data. Flushing the L1D has a performance impact as the processor has to
   bring the flushed guest data back into the L1D.
   Depending on the
   frequency of VMEXIT/VMENTER and the type of computations in the guest,
   performance degradation in the range of 1% to 50% has been observed. For
   scenarios where guest VMEXIT/VMENTER are rare the performance impact is
   minimal. Virtio and mechanisms like posted interrupts are designed to
   confine the VMEXITs to a bare minimum, but specific configurations and
   application scenarios might still suffer from a high VMEXIT rate.

   The kernel provides two L1D flush modes:

    - conditional ('cond')
    - unconditional ('always')

   The conditional mode avoids L1D flushing after VMEXITs which execute
   only audited code paths before the corresponding VMENTER. These code
   paths have been verified not to expose secrets or other interesting
   data to an attacker, but they can leak information about the
   address space layout of the hypervisor.

   Unconditional mode flushes L1D on all VMENTER invocations and provides
   maximum protection. It has a higher overhead than the conditional
   mode.
   The overhead cannot be quantified in general as it depends on the
   workload scenario and the resulting number of VMEXITs.

   The general recommendation is to enable L1D flush on VMENTER. The kernel
   defaults to conditional mode on affected processors.

   **Note** that L1D flush does not prevent the SMT problem because the
   sibling thread will also bring back its data into the L1D which makes it
   attackable again.

   L1D flush can be controlled by the administrator via the kernel command
   line and sysfs control files. See :ref:`mitigation_control_command_line`
   and :ref:`mitigation_control_kvm`.

.. _guest_confinement:

2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   To address the SMT problem, it is possible to make a guest or a group of
   guests affine to one or more physical cores. The proper mechanism for
   that is to utilize exclusive cpusets to ensure that no other guest or
   host tasks can run on these cores.

   If only a single guest or related guests run on sibling SMT threads on
   the same physical core then they can only attack their own memory and
   restricted parts of the host memory.

   Host memory is attackable when one of the sibling SMT threads runs in
   host OS (hypervisor) context and the other in guest context. The amount
   of valuable information from the host OS context depends on the context
   which the host OS executes, i.e. interrupts, soft interrupts and kernel
   threads. The amount of valuable data from these contexts cannot be
   declared as non-interesting for an attacker without deep inspection of
   the code.

   **Note** that assigning guests to a fixed set of physical cores affects
   the ability of the scheduler to do load balancing and might have
   negative effects on CPU utilization depending on the hosting
   scenario. Disabling SMT might be a viable alternative for particular
   scenarios.

   For further information about confining guests to a single or to a group
   of cores consult the cpusets documentation:

   https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt

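Confining guests to whole physical cores requires knowing which logical CPUs
are SMT siblings of each other. The following is a minimal sketch, not part
of the kernel tooling, which groups logical CPUs by core. It assumes the
per-CPU ``topology/thread_siblings_list`` files under
``/sys/devices/system/cpu``, which this document does not otherwise
describe; the function names are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,4" or "0-3,8" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_cores(siblings_by_cpu):
    """Turn {cpu: sibling list text} into a sorted list of per-core tuples."""
    cores = {tuple(parse_cpu_list(text)) for text in siblings_by_cpu.values()}
    return sorted(cores)


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU that exposes topology data."""
    result = {}
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(root).glob(pattern):
        cpu = int(path.parent.parent.name[3:])  # strip the "cpu" prefix
        result[cpu] = path.read_text()
    return result


if __name__ == "__main__":
    for core in group_cores(read_sibling_lists()):
        print("core with logical CPUs:", core)
```

Each tuple lists the logical CPUs that share one L1D, so an exclusive cpuset
for a guest should contain complete tuples rather than individual threads.
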
.. _interrupt_isolation:

3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^

   Interrupts can be made affine to logical CPUs. This is not universally
   true because there are types of interrupts which are truly per CPU
   interrupts, e.g. the local timer interrupt. Aside from that, multi queue
   devices affine their interrupts to single CPUs or groups of CPUs per
   queue without allowing the administrator to control the affinities.

   Moving the interrupts, which can be affinity controlled, away from CPUs
   which run untrusted guests, reduces the attack vector space.

   Whether the interrupts which are affine to CPUs that run untrusted
   guests provide interesting data for an attacker depends on the system
   configuration and the scenarios which run on the system. While for some
   of the interrupts it can be assumed that they won't expose interesting
   information beyond exposing hints about the host OS memory layout, there
   is no way to make general assumptions.

   Interrupt affinity can be controlled by the administrator via the
   /proc/irq/$NR/smp_affinity[_list] files.
   Limited documentation is
   available at:

   https://www.kernel.org/doc/Documentation/IRQ-affinity.txt

.. _smt_control:

4. SMT control
^^^^^^^^^^^^^^

   To prevent the SMT issues of L1TF it might be necessary to disable SMT
   completely. Disabling SMT can have a significant performance impact, but
   the impact depends on the hosting scenario and the type of workloads.
   The impact of disabling SMT also needs to be weighed against the impact
   of other mitigation solutions like confining guests to dedicated cores.

   The kernel provides a sysfs interface to retrieve the status of SMT and
   to control it. It also provides a kernel command line interface to
   control SMT.

   The kernel command line interface consists of the following options:

     =========== ==========================================================
     nosmt       Affects the bring up of the secondary CPUs during boot.
                 The kernel tries to bring all present CPUs online during
                 the boot process. "nosmt" makes sure that from each
                 physical core only one - the so called primary (hyper)
                 thread - is activated. Due to a design flaw of Intel
                 processors related to Machine Check Exceptions the non
                 primary siblings have to be brought up at least partially
                 and are then shut down again. "nosmt" can be undone via
                 the sysfs interface.

     nosmt=force Has the same effect as "nosmt" but does not allow the SMT
                 disable to be undone via the sysfs interface.
     =========== ==========================================================

   The sysfs interface provides two files:

   - /sys/devices/system/cpu/smt/control
   - /sys/devices/system/cpu/smt/active

   /sys/devices/system/cpu/smt/control:

     This file allows reading the SMT control state and provides the
     ability to disable or (re)enable SMT. The possible states are:

        ==============  ===================================================
        on              SMT is supported by the CPU and enabled. All
                        logical CPUs can be onlined and offlined without
                        restrictions.

        off             SMT is supported by the CPU and disabled. Only
                        the so called primary SMT threads can be onlined
                        and offlined without restrictions. An attempt to
                        online a non-primary sibling is rejected.

        forceoff        Same as 'off' but the state cannot be controlled.
                        Attempts to write to the control file are rejected.

        notsupported    The processor does not support SMT. It's therefore
                        not affected by the SMT implications of L1TF.
                        Attempts to write to the control file are rejected.
        ==============  ===================================================

     The possible states which can be written into this file to control SMT
     state are:

     - on
     - off
     - forceoff

   /sys/devices/system/cpu/smt/active:

     This file reports whether SMT is enabled and active, i.e. if on any
     physical core two or more sibling threads are online.

   SMT control is also possible at boot time via the l1tf kernel command
   line parameter in combination with L1D flush control. See
   :ref:`mitigation_control_command_line`.

5. Disabling EPT
^^^^^^^^^^^^^^^^

   Disabling EPT for virtual machines provides full mitigation for L1TF even
   with SMT enabled, because the effective page tables for guests are
   managed and sanitized by the hypervisor.
   However, disabling EPT has a
   significant performance impact, especially when the Meltdown mitigation
   KPTI is enabled.

   EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.

.. _mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows control of the L1TF mitigations at boot
time with the option "l1tf=". The valid arguments for this option are:

  ============  =============================================================
  full          Provides all available mitigations for the L1TF
                vulnerability. Disables SMT and enables all mitigations in
                the hypervisors, i.e. unconditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  full,force    Same as 'full', but disables SMT and L1D flush runtime
                control. Implies the 'nosmt=force' command line option.
                (i.e. sysfs control of SMT is disabled.)

  flush         Leaves SMT enabled and enables the default hypervisor
                mitigation, i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
                i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
                started in a potentially insecure configuration.

  off           Disables hypervisor mitigations and doesn't emit any
                warnings.
                It also drops the swap size and available RAM limit
                restrictions on both hypervisor and bare metal.
  ============  =============================================================

The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.


.. _mitigation_control_kvm:

Mitigation control for KVM - module parameter
---------------------------------------------

The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.

The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:

  ============  ==============================================================
  always        L1D cache flush on every VMENTER.

  cond          Flush L1D on VMENTER only when the code between VMEXIT and
                VMENTER can leak host memory which is considered
                interesting for an attacker. This can still leak host
                memory which allows an attacker, for example, to determine
                the host's address space layout.

  never         Disables the mitigation.
  ============  ==============================================================

The parameter can be provided on the kernel command line, as a module
parameter when loading the modules, and can be modified at runtime via the
sysfs file:

  /sys/module/kvm_intel/parameters/vmentry_l1d_flush

The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.

.. _mitigation_selection:

Mitigation selection guide
--------------------------

1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The system is protected by the kernel unconditionally and no further
   action is required.

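To confirm what the kernel reports, the L1TF sysfs file can be read and split
into its parts. The sketch below is only an illustration: this document lists
the strings that can appear but not the exact layout of the line, so the code
merely searches for those substrings. The function name and the returned
dictionary keys are hypothetical.

```python
from pathlib import Path

L1TF_FILE = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def parse_l1tf_status(status):
    """Split the content of the l1tf sysfs file into its reported parts.

    Assumption: only the substrings listed in the tables of this document
    are searched for; the separators between them are not relied upon.
    """
    status = status.strip()
    result = {
        "affected": status != "Not affected",
        "pte_inversion": "Mitigation: PTE Inversion" in status,
        "smt": None,        # only reported when KVM/VMX is enabled
        "l1d_flush": None,  # only reported when KVM/VMX is enabled
    }
    if "SMT vulnerable" in status:
        result["smt"] = "vulnerable"
    elif "SMT disabled" in status:
        result["smt"] = "disabled"

    # Drop the SMT part first so that its "vulnerable" is not mistaken
    # for the L1D flush state.
    rest = status.replace("SMT vulnerable", "")
    if "conditional cache flushes" in rest:
        result["l1d_flush"] = "conditional"
    elif "cache flushes" in rest:
        result["l1d_flush"] = "unconditional"
    elif "vulnerable" in rest:
        result["l1d_flush"] = "disabled"
    return result


if __name__ == "__main__":
    try:
        print(parse_l1tf_status(L1TF_FILE.read_text()))
    except OSError as err:
        print(f"cannot read {L1TF_FILE}: {err}")
```

On a host without virtualization in use, the interesting fields are
``affected`` and ``pte_inversion``; the other two stay ``None`` unless the
VMX information is appended.
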
Virtualization with trusted guests 460 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 460 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 461 461 462 If the guest comes from a trusted source an 462 If the guest comes from a trusted source and the guest OS kernel is 463 guaranteed to have the L1TF mitigations in 463 guaranteed to have the L1TF mitigations in place the system is fully 464 protected against L1TF and no further actio 464 protected against L1TF and no further action is required. 465 465 466 To avoid the overhead of the default L1D fl 466 To avoid the overhead of the default L1D flushing on VMENTER the 467 administrator can disable the flushing via 467 administrator can disable the flushing via the kernel command line and 468 sysfs control files. See :ref:`mitigation_c 468 sysfs control files. See :ref:`mitigation_control_command_line` and 469 :ref:`mitigation_control_kvm`. 469 :ref:`mitigation_control_kvm`. 470 470 471 471 472 3. Virtualization with untrusted guests 472 3. Virtualization with untrusted guests 473 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 473 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 474 474 475 3.1. SMT not supported or disabled 475 3.1. SMT not supported or disabled 476 """""""""""""""""""""""""""""""""" 476 """""""""""""""""""""""""""""""""" 477 477 478 If SMT is not supported by the processor or 478 If SMT is not supported by the processor or disabled in the BIOS or by 479 the kernel, it's only required to enforce L1 479 the kernel, it's only required to enforce L1D flushing on VMENTER. 480 480 481 Conditional L1D flushing is the default beha 481 Conditional L1D flushing is the default behaviour and can be tuned. See 482 :ref:`mitigation_control_command_line` and : 482 :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`. 483 483 484 3.2. EPT not supported or disabled 484 3.2. 
EPT not supported or disabled 485 """""""""""""""""""""""""""""""""" 485 """""""""""""""""""""""""""""""""" 486 486 487 If EPT is not supported by the processor or 487 If EPT is not supported by the processor or disabled in the hypervisor, 488 the system is fully protected. SMT can stay 488 the system is fully protected. SMT can stay enabled and L1D flushing on 489 VMENTER is not required. 489 VMENTER is not required. 490 490 491 EPT can be disabled in the hypervisor via th 491 EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter. 492 492 493 3.3. SMT and EPT supported and active 493 3.3. SMT and EPT supported and active 494 """"""""""""""""""""""""""""""""""""" 494 """"""""""""""""""""""""""""""""""""" 495 495 496 If SMT and EPT are supported and active then 496 If SMT and EPT are supported and active then various degrees of 497 mitigations can be employed: 497 mitigations can be employed: 498 498 499 - L1D flushing on VMENTER: 499 - L1D flushing on VMENTER: 500 500 501 L1D flushing on VMENTER is the minimal pro 501 L1D flushing on VMENTER is the minimal protection requirement, but it 502 is only potent in combination with other m 502 is only potent in combination with other mitigation methods. 503 503 504 Conditional L1D flushing is the default be 504 Conditional L1D flushing is the default behaviour and can be tuned. See 505 :ref:`mitigation_control_command_line` and 505 :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`. 
506 506 507 - Guest confinement: 507 - Guest confinement: 508 508 509 Confinement of guests to a single or a gro 509 Confinement of guests to a single or a group of physical cores which 510 are not running any other processes, can r 510 are not running any other processes, can reduce the attack surface 511 significantly, but interrupts, soft interr 511 significantly, but interrupts, soft interrupts and kernel threads can 512 still expose valuable data to a potential 512 still expose valuable data to a potential attacker. See 513 :ref:`guest_confinement`. 513 :ref:`guest_confinement`. 514 514 515 - Interrupt isolation: 515 - Interrupt isolation: 516 516 517 Isolating the guest CPUs from interrupts c 517 Isolating the guest CPUs from interrupts can reduce the attack surface 518 further, but still allows a malicious gues 518 further, but still allows a malicious guest to explore a limited amount 519 of host physical memory. This can at least 519 of host physical memory. This can at least be used to gain knowledge 520 about the host address space layout. The i 520 about the host address space layout. The interrupts which have a fixed 521 affinity to the CPUs which run the untrust 521 affinity to the CPUs which run the untrusted guests can depending on 522 the scenario still trigger soft interrupts 522 the scenario still trigger soft interrupts and schedule kernel threads 523 which might expose valuable information. S 523 which might expose valuable information. See 524 :ref:`interrupt_isolation`. 524 :ref:`interrupt_isolation`. 525 525 526 The above three mitigation methods combined ca 526 The above three mitigation methods combined can provide protection to a 527 certain degree, but the risk of the remaining 527 certain degree, but the risk of the remaining attack surface has to be 528 carefully analyzed. For full protection the fo 528 carefully analyzed. 
For full protection the following methods are 529 available: 529 available: 530 530 531 - Disabling SMT: 531 - Disabling SMT: 532 532 533 Disabling SMT and enforcing the L1D flushi 533 Disabling SMT and enforcing the L1D flushing provides the maximum 534 amount of protection. This mitigation is n 534 amount of protection. This mitigation is not depending on any of the 535 above mitigation methods. 535 above mitigation methods. 536 536 537 SMT control and L1D flushing can be tuned 537 SMT control and L1D flushing can be tuned by the command line 538 parameters 'nosmt', 'l1tf', 'kvm-intel.vme 538 parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run 539 time with the matching sysfs control files 539 time with the matching sysfs control files. See :ref:`smt_control`, 540 :ref:`mitigation_control_command_line` and 540 :ref:`mitigation_control_command_line` and 541 :ref:`mitigation_control_kvm`. 541 :ref:`mitigation_control_kvm`. 542 542 543 - Disabling EPT: 543 - Disabling EPT: 544 544 545 Disabling EPT provides the maximum amount 545 Disabling EPT provides the maximum amount of protection as well. It is 546 not depending on any of the above mitigati 546 not depending on any of the above mitigation methods. SMT can stay 547 enabled and L1D flushing is not required, 547 enabled and L1D flushing is not required, but the performance impact is 548 significant. 548 significant. 549 549 550 EPT can be disabled in the hypervisor via 550 EPT can be disabled in the hypervisor via the 'kvm-intel.ept' 551 parameter. 551 parameter. 552 552 553 3.4. Nested virtual machines 553 3.4. Nested virtual machines 554 """""""""""""""""""""""""""" 554 """""""""""""""""""""""""""" 555 555 556 When nested virtualization is in use, three op 556 When nested virtualization is in use, three operating systems are involved: 557 the bare metal hypervisor, the nested hypervis 557 the bare metal hypervisor, the nested hypervisor and the nested virtual 558 machine. 
VMENTER operations from the nested h 558 machine. VMENTER operations from the nested hypervisor into the nested 559 guest will always be processed by the bare met 559 guest will always be processed by the bare metal hypervisor. If KVM is the 560 bare metal hypervisor it will: 560 bare metal hypervisor it will: 561 561 562 - Flush the L1D cache on every switch from th 562 - Flush the L1D cache on every switch from the nested hypervisor to the 563 nested virtual machine, so that the nested 563 nested virtual machine, so that the nested hypervisor's secrets are not 564 exposed to the nested virtual machine; 564 exposed to the nested virtual machine; 565 565 566 - Flush the L1D cache on every switch from th 566 - Flush the L1D cache on every switch from the nested virtual machine to 567 the nested hypervisor; this is a complex op 567 the nested hypervisor; this is a complex operation, and flushing the L1D 568 cache avoids that the bare metal hypervisor 568 cache avoids that the bare metal hypervisor's secrets are exposed to the 569 nested virtual machine; 569 nested virtual machine; 570 570 571 - Instruct the nested hypervisor to not perfo 571 - Instruct the nested hypervisor to not perform any L1D cache flush. This 572 is an optimization to avoid double L1D flus 572 is an optimization to avoid double L1D flushing. 573 573 574 574 575 .. _default_mitigations: 575 .. _default_mitigations: 576 576 577 Default mitigations 577 Default mitigations 578 ------------------- 578 ------------------- 579 579 580 The kernel default mitigations for vulnerabl 580 The kernel default mitigations for vulnerable processors are: 581 581 582 - PTE inversion to protect against malicious 582 - PTE inversion to protect against malicious user space. This is done 583 unconditionally and cannot be controlled. 583 unconditionally and cannot be controlled. The swap storage is limited 584 to ~16TB. 584 to ~16TB. 
585 585 586 - L1D conditional flushing on VMENTER when E 586 - L1D conditional flushing on VMENTER when EPT is enabled for 587 a guest. 587 a guest. 588 588 589 The kernel does not by default enforce the d 589 The kernel does not by default enforce the disabling of SMT, which leaves 590 SMT systems vulnerable when running untruste 590 SMT systems vulnerable when running untrusted guests with EPT enabled. 591 591 592 The rationale for this choice is: 592 The rationale for this choice is: 593 593 594 - Force disabling SMT can break existing set 594 - Force disabling SMT can break existing setups, especially with 595 unattended updates. 595 unattended updates. 596 596 597 - If regular users run untrusted guests on t 597 - If regular users run untrusted guests on their machine, then L1TF is 598 just an add on to other malware which migh 598 just an add on to other malware which might be embedded in an untrusted 599 guest, e.g. spam-bots or attacks on the lo 599 guest, e.g. spam-bots or attacks on the local network. 600 600 601 There is no technical way to prevent a use 601 There is no technical way to prevent a user from running untrusted code 602 on their machines blindly. 602 on their machines blindly. 603 603 604 - It's technically extremely unlikely and fr 604 - It's technically extremely unlikely and from today's knowledge even 605 impossible that L1TF can be exploited via 605 impossible that L1TF can be exploited via the most popular attack 606 mechanisms like JavaScript because these m 606 mechanisms like JavaScript because these mechanisms have no way to 607 control PTEs. If this would be possible an 607 control PTEs. If this would be possible and not other mitigation would 608 be possible, then the default might be dif 608 be possible, then the default might be different. 
609 609 610 - The administrators of cloud and hosting se 610 - The administrators of cloud and hosting setups have to carefully 611 analyze the risk for their scenarios and m 611 analyze the risk for their scenarios and make the appropriate 612 mitigation choices, which might even vary 612 mitigation choices, which might even vary across their deployed 613 machines and also result in other changes 613 machines and also result in other changes of their overall setup. 614 There is no way for the kernel to provide 614 There is no way for the kernel to provide a sensible default for this 615 kind of scenarios. 615 kind of scenarios.
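
As an informal illustration, the cases of the mitigation selection guide can
be written down as a small decision function. This is only a sketch and not
part of the kernel: the ``Host`` class, its field names and the
``select_case`` function are hypothetical names introduced here, and checking
the EPT case before the SMT case is an assumption made for the situation
where both are inactive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of a host; not a kernel interface."""
    virtualization: bool   # guests are run on this host at all
    trusted_guests: bool   # trusted source AND guest kernel has L1TF mitigations
    smt_active: bool       # SMT supported and enabled
    ept_active: bool       # EPT supported and enabled in the hypervisor


def select_case(host: Host) -> str:
    """Return the section number of the selection guide case that applies."""
    if not host.virtualization:
        return "1"    # protected by the kernel unconditionally
    if host.trusted_guests:
        return "2"    # fully protected; L1D flushing may be disabled
    if not host.ept_active:
        return "3.2"  # fully protected; SMT can stay on, no L1D flush needed
    if not host.smt_active:
        return "3.1"  # enforce L1D flushing on VMENTER
    return "3.3"      # full protection needs SMT off plus L1D flush, or EPT off


# Example: untrusted guests on a host with both SMT and EPT active.
case = select_case(Host(virtualization=True, trusted_guests=False,
                        smt_active=True, ept_active=True))
```

The function only names the applicable case. The actual tuning is still done
with the 'l1tf', 'nosmt', 'kvm-intel.vmentry_l1d_flush' and 'kvm-intel.ept'
parameters described in the text.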