L1TF - L1 Terminal Fault
========================

L1 Terminal Fault is a hardware vulnerability
speculative access to data which is available
when the page table entry controlling the virt
for the access, has the Present bit cleared or

Affected processors
-------------------

This vulnerability affects a wide range of Int
vulnerability is not present on:

   - Processors from AMD, Centaur and other no

   - Older processor models, where the CPU fam

   - A range of Intel ATOM processors (Cedarvi
     Penwell, Pineview, Silvermont, Airmont, M

   - The Intel XEON PHI family

   - Intel processors which have the ARCH_CAP_
     IA32_ARCH_CAPABILITIES MSR. If the bit is
     by the Meltdown vulnerability either. The
     available by end of 2018.

Whether a processor is affected or not can be
vulnerability file in sysfs. See :ref:`l1tf_sy

Related CVEs
------------

The following CVE entries are related to the L

   =============  =================  =========
   CVE-2018-3615  L1 Terminal Fault  SGX relat
   CVE-2018-3620  L1 Terminal Fault  OS, SMM r
   CVE-2018-3646  L1 Terminal Fault  Virtualiz
   =============  =================  =========

Problem
-------

If an instruction accesses a virtual address f
table entry (PTE) has the Present bit cleared
then speculative execution ignores the invalid
data if it is present in the Level 1 Data Cach
by the address bits in the PTE was still prese

While this is a purely speculative mechanism a
a page fault when it is retired eventually, th
data and making it available to other speculat
opportunity for side channel attacks to unpriv
similar to the Meltdown attack.

While Meltdown breaks the user space to kernel
allows to attack any physical memory address i
works across all protection domains. It allows
works from inside virtual machines because the
extended page table (EPT) protection mechanism


Attack scenarios
----------------

1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^

   Operating Systems store arbitrary informati
   PTE which is marked non present. This allow
   application to attack the physical memory t
   In some cases user-space can maliciously in
   encoded in the address bits of the PTE, thu
   deterministic and more practical.

   The Linux kernel contains a mitigation for
   inversion, which is permanently enabled and
   impact. The kernel ensures that the address
   marked present, never point to cacheable ph

   A system with an up to date kernel is prote
   malicious user space applications.

2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The fact that L1TF breaks all domain protec
   OSes, which can control the PTEs directly,
   space applications, which run on an unprote
   PTE inversion mitigation for L1TF, to attac

   A special aspect of L1TF in the context of
   multi threading (SMT). The Intel implementa
   HyperThreading. The fact that Hyperthreads
   share the L1 Data Cache (L1D) is important
   only to attack data which is present in L1D
   on one Hyperthread can attack the data whic
   the context which runs on the sibling Hyper
   core. This context can be host OS, host use

   If the processor does not support Extended
   only possible, when the hypervisor does not
   effective (shadow) page tables.

   While solutions exist to mitigate these att
   mitigations are not enabled by default in t
   can affect performance significantly. The k
   mechanisms which can be utilized to address
   deployment scenario. The mitigations, their
   are described in the next sections.

   The default mitigations and the rationale f
   at the end of this document. See :ref:`defa

.. _l1tf_sys_info:

L1TF system information
-----------------------

The Linux kernel provides a sysfs interface to
status of the system: whether the system is vu
mitigations are active. The relevant sysfs fil

  /sys/devices/system/cpu/vulnerabilities/l1tf

The possible values in this file are:

  ===========================  ==============
  'Not affected'               The processor
  'Mitigation: PTE Inversion'  The host prote
  ===========================  ==============

If KVM/VMX is enabled and the processor is vul
information is appended to the 'Mitigation: PT

  - SMT status:

    =====================  ================
    'VMX: SMT vulnerable'  SMT is enabled
    'VMX: SMT disabled'    SMT is disabled
    =====================  ================

  - L1D Flush mode:

    ================================  ========
    'L1D vulnerable'                  L1D flus

    'L1D conditional cache flushes'   L1D flus

    'L1D cache flushes'               L1D flus
    ================================  ========

The resulting grade of protection is discussed


Host mitigation mechanism
-------------------------

The kernel is unconditionally protected agains
user space running on the host.


Guest mitigation mechanisms
---------------------------

.. _l1d_flush:

1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^

   To make sure that a guest cannot attack dat
   the hypervisor flushes the L1D before enter

   Flushing the L1D evicts not only the data w
   by a potentially malicious guest, it also f
   data. Flushing the L1D has a performance im
   bring the flushed guest data back into the
   frequency of VMEXIT/VMENTER and the type of
   performance degradation in the range of 1%
   scenarios where guest VMEXIT/VMENTER are ra
   minimal. Virtio and mechanisms like posted
   confine the VMEXITs to a bare minimum, but
   application scenarios might still suffer fr

   The kernel provides two L1D flush modes:
    - conditional ('cond')
    - unconditional ('always')

   The conditional mode avoids L1D flushing af
   only audited code paths before the correspo
   paths have been verified that they cannot e
   interesting data to an attacker, but they c
   address space layout of the hypervisor.

   Unconditional mode flushes L1D on all VMENT
   maximum protection. It has a higher overhea
   mode. The overhead cannot be quantified cor
   workload scenario and the resulting number

   The general recommendation is to enable L1D
   defaults to conditional mode on affected pr

   **Note**, that L1D flush does not prevent t
   sibling thread will also bring back its dat
   attackable again.

   L1D flush can be controlled by the administ
   line and sysfs control files. See :ref:`mit
   and :ref:`mitigation_control_kvm`.

.. _guest_confinement:

2. Guest VCPU confinement to dedicated physica
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   To address the SMT problem, it is possible
   guests affine to one or more physical cores
   that is to utilize exclusive cpusets to ens
   host tasks can run on these cores.

   If only a single guest or related guests ru
   the same physical core then they can only a
   restricted parts of the host memory.

   Host memory is attackable, when one of the
   host OS (hypervisor) context and the other
   of valuable information from the host OS co
   which the host OS executes, i.e. interrupts
   threads. The amount of valuable data from t
   declared as non-interesting for an attacker
   the code.

   **Note**, that assigning guests to a fixed
   the ability of the scheduler to do load bal
   negative effects on CPU utilization dependi
   scenario. Disabling SMT might be a viable a
   scenarios.

   For further information about confining gue
   of cores consult the cpusets documentation:

   https://www.kernel.org/doc/Documentation/ad

.. _interrupt_isolation:

3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^

   Interrupts can be made affine to logical CP
   true because there are types of interrupts
   interrupts, e.g. the local timer interrupt.
   devices affine their interrupts to single C
   queue without allowing the administrator to

   Moving the interrupts, which can be affinit
   which run untrusted guests, reduces the att

   Whether the interrupts with are affine to C
   guests, provide interesting data for an att
   configuration and the scenarios which run o
   of the interrupts it can be assumed that th
   information beyond exposing hints about the
   is no way to make general assumptions.

   Interrupt affinity can be controlled by the
   /proc/irq/$NR/smp_affinity[_list] files. Li
   available at:

   https://www.kernel.org/doc/Documentation/co

.. _smt_control:

4. SMT control
^^^^^^^^^^^^^^

   To prevent the SMT issues of L1TF it might
   completely. Disabling SMT can have a signif
   the impact depends on the hosting scenario
   The impact of disabling SMT needs also to b
   of other mitigation solutions like confinin

   The kernel provides a sysfs interface to re
   to control it. It also provides a kernel co
   control SMT.

   The kernel command line interface consists

    ===========  =============================
    nosmt        Affects the bring up of the s
                 kernel tries to bring all pre
                 boot process. "nosmt" makes s
                 core only one - the so called
                 activated. Due to a design fl
                 to Machine Check Exceptions t
                 to be brought up at least par
                 again. "nosmt" can be undone

    nosmt=force  Has the same effect as "nosmt
                 undo the SMT disable via the
    ===========  =============================

   The sysfs interface provides two files:

   - /sys/devices/system/cpu/smt/control
   - /sys/devices/system/cpu/smt/active

   /sys/devices/system/cpu/smt/control:

     This file allows to read out the SMT cont
     ability to disable or (re)enable SMT. The

        ==============  ======================
        on              SMT is supported by th
                        logical CPUs can be on
                        restrictions.

        off             SMT is supported by th
                        the so called primary
                        and offlined without r
                        online a non-primary s

        forceoff        Same as 'off' but the
                        Attempts to write to t

        notsupported    The processor does not
                        not affected by the SM
                        Attempts to write to t
        ==============  ======================

     The possible states which can be written
     state are:

     - on
     - off
     - forceoff

   /sys/devices/system/cpu/smt/active:

     This file reports whether SMT is enabled
     physical core two or more sibling threads

   SMT control is also possible at boot time v
   line parameter in combination with L1D flus
   :ref:`mitigation_control_command_line`.

5. Disabling EPT
^^^^^^^^^^^^^^^^

  Disabling EPT for virtual machines provides
  with SMT enabled, because the effective page
  managed and sanitized by the hypervisor. Tho
  significant performance impact especially wh
  KPTI is enabled.

  EPT can be disabled in the hypervisor via th

There is ongoing research and development for
address the performance impact of disabling SM

.. _mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows to control the
time with the option "l1tf=". The valid argume

  ============  ==============================
  full          Provides all available mitigat
                vulnerability. Disables SMT an
                the hypervisors, i.e. uncondit

                SMT control and L1D flush cont
                is still possible after boot.
                warning when the first VM is s
                insecure configuration, i.e. S
                disabled.

  full,force    Same as 'full', but disables S
                control. Implies the 'nosmt=fo
                (i.e. sysfs control of SMT is

  flush         Leaves SMT enabled and enables
                mitigation, i.e. conditional L

                SMT control and L1D flush cont
                is still possible after boot.
                warning when the first VM is s
                insecure configuration, i.e. S
                disabled.

  flush,nosmt   Disables SMT and enables the d
                i.e. conditional L1D flushing.

                SMT control and L1D flush cont
                is still possible after boot.
                warning when the first VM is s
                insecure configuration, i.e. S
                disabled.

  flush,nowarn  Same as 'flush', but hyperviso
                started in a potentially insec

  off           Disables hypervisor mitigation
                warnings.
                It also drops the swap size an
                on both hypervisor and bare me

  ============  ==============================

The default is 'flush'. For details about L1D


.. _mitigation_control_kvm:

Mitigation control for KVM - module parameter
----------------------------------------------

The KVM hypervisor mitigation mechanism, flush
entering a guest, can be controlled with a mod

The option/parameter is "kvm-intel.vmentry_l1d
following arguments:

  ============  ==============================
  always        L1D cache flush on every VMENT

  cond          Flush L1D on VMENTER only when
                VMENTER can leak host memory w
                interesting for an attacker. T
                which allows e.g. to determine

  never         Disables the mitigation
  ============  ==============================

The parameter can be provided on the kernel co
parameter when loading the modules and at runt
file:

/sys/module/kvm_intel/parameters/vmentry_l1d_f

The default is 'cond'. If 'l1tf=full,force' is
line, then 'always' is enforced and the kvm-in
module parameter is ignored and writes to the

.. _mitigation_selection:

Mitigation selection guide
--------------------------

1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The system is protected by the kernel uncon
   action is required.

2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   If the guest comes from a trusted source an
   guaranteed to have the L1TF mitigations in
   protected against L1TF and no further actio

   To avoid the overhead of the default L1D fl
   administrator can disable the flushing via
   sysfs control files. See :ref:`mitigation_c
   :ref:`mitigation_control_kvm`.


3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""

  If SMT is not supported by the processor or
  the kernel, it's only required to enforce L1

  Conditional L1D flushing is the default beha
  :ref:`mitigation_control_command_line` and :

3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""

  If EPT is not supported by the processor or
  the system is fully protected. SMT can stay
  VMENTER is not required.

  EPT can be disabled in the hypervisor via th

3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""

  If SMT and EPT are supported and active then
  mitigations can be employed:

  - L1D flushing on VMENTER:

    L1D flushing on VMENTER is the minimal pro
    is only potent in combination with other m

    Conditional L1D flushing is the default be
    :ref:`mitigation_control_command_line` and

  - Guest confinement:

    Confinement of guests to a single or a gro
    are not running any other processes, can r
    significantly, but interrupts, soft interr
    still expose valuable data to a potential
    :ref:`guest_confinement`.

  - Interrupt isolation:

    Isolating the guest CPUs from interrupts c
    further, but still allows a malicious gues
    of host physical memory. This can at least
    about the host address space layout. The i
    affinity to the CPUs which run the untrust
    the scenario still trigger soft interrupts
    which might expose valuable information. S
    :ref:`interrupt_isolation`.

The above three mitigation methods combined ca
certain degree, but the risk of the remaining
carefully analyzed. For full protection the fo
available:

  - Disabling SMT:

    Disabling SMT and enforcing the L1D flushi
    amount of protection. This mitigation is n
    above mitigation methods.

    SMT control and L1D flushing can be tuned
    parameters 'nosmt', 'l1tf', 'kvm-intel.vme
    time with the matching sysfs control files
    :ref:`mitigation_control_command_line` and
    :ref:`mitigation_control_kvm`.

  - Disabling EPT:

    Disabling EPT provides the maximum amount
    not depending on any of the above mitigati
    enabled and L1D flushing is not required,
    significant.

    EPT can be disabled in the hypervisor via
    parameter.

3.4. Nested virtual machines
""""""""""""""""""""""""""""

When nested virtualization is in use, three op
the bare metal hypervisor, the nested hypervis
machine. VMENTER operations from the nested h
guest will always be processed by the bare met
bare metal hypervisor it will:

 - Flush the L1D cache on every switch from th
   nested virtual machine, so that the nested
   exposed to the nested virtual machine;

 - Flush the L1D cache on every switch from th
   the nested hypervisor; this is a complex op
   cache avoids that the bare metal hypervisor
   nested virtual machine;

 - Instruct the nested hypervisor to not perfo
   is an optimization to avoid double L1D flus


.. _default_mitigations:

Default mitigations
-------------------

  The kernel default mitigations for vulnerabl

  - PTE inversion to protect against malicious
    unconditionally and cannot be controlled.
    to ~16TB.

  - L1D conditional flushing on VMENTER when E
    a guest.

  The kernel does not by default enforce the d
  SMT systems vulnerable when running untruste

  The rationale for this choice is:

  - Force disabling SMT can break existing set
    unattended updates.

  - If regular users run untrusted guests on t
    just an add on to other malware which migh
    guest, e.g. spam-bots or attacks on the lo

    There is no technical way to prevent a use
    on their machines blindly.

  - It's technically extremely unlikely and fr
    impossible that L1TF can be exploited via
    mechanisms like JavaScript because these m
    control PTEs. If this would be possible an
    be possible, then the default might be dif

  - The administrators of cloud and hosting se
    analyze the risk for their scenarios and m
    mitigation choices, which might even vary
    machines and also result in other changes
    There is no way for the kernel to provide
    kind of scenarios.
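
As an illustration of the sysfs interface described in this document, the
following minimal Python sketch reads the L1TF status file and the two SMT
files and prints a short summary. The three paths and the four SMT control
states are taken from the text; the helper names ``read_sysfs`` and
``summarize`` are hypothetical, and the check for the substring
``SMT vulnerable`` assumes the status string is spelled as in the table
under "L1TF system information". The content of the ``active`` file is
printed raw because its value format is not spelled out here.

```python
from pathlib import Path

# Paths as given in this document.
L1TF_STATUS = "/sys/devices/system/cpu/vulnerabilities/l1tf"
SMT_CONTROL = "/sys/devices/system/cpu/smt/control"
SMT_ACTIVE = "/sys/devices/system/cpu/smt/active"

# SMT control states listed in the "SMT control" section.
KNOWN_SMT_STATES = {"on", "off", "forceoff", "notsupported"}


def read_sysfs(path):
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def summarize(l1tf, smt_control):
    """Turn raw sysfs strings into a list of human-readable notes."""
    notes = []
    if l1tf is None:
        notes.append("l1tf: status file not readable")
    elif l1tf.startswith("Not affected"):
        notes.append("l1tf: processor not affected")
    else:
        notes.append("l1tf: " + l1tf)
        # Assumption: the substring matches the spelling used in the tables.
        if "SMT vulnerable" in l1tf:
            notes.append("warning: SMT enabled; sibling threads share L1D")
    if smt_control is None:
        notes.append("smt: control file not readable")
    elif smt_control in KNOWN_SMT_STATES:
        notes.append("smt: control=" + smt_control)
    else:
        notes.append("smt: unrecognized control state " + repr(smt_control))
    return notes


if __name__ == "__main__":
    for line in summarize(read_sysfs(L1TF_STATUS), read_sysfs(SMT_CONTROL)):
        print(line)
    # Printed as-is; no interpretation of the value is attempted.
    print("smt active (raw):", read_sysfs(SMT_ACTIVE))
```

The parsing is kept separate from the file access so that ``summarize`` can
be exercised with sample strings on machines that do not expose these files.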