
TOMOYO Linux Cross Reference
Linux/Documentation/admin-guide/hw-vuln/l1tf.rst



Differences between /Documentation/admin-guide/hw-vuln/l1tf.rst (Version linux-6.11.5) and /Documentation/admin-guide/hw-vuln/l1tf.rst (Version linux-6.3.13)


  1 L1TF - L1 Terminal Fault                            1 L1TF - L1 Terminal Fault
  2 ========================                            2 ========================
  3                                                     3 
  4 L1 Terminal Fault is a hardware vulnerability       4 L1 Terminal Fault is a hardware vulnerability which allows unprivileged
  5 speculative access to data which is available       5 speculative access to data which is available in the Level 1 Data Cache
  6 when the page table entry controlling the virt      6 when the page table entry controlling the virtual address, which is used
  7 for the access, has the Present bit cleared or      7 for the access, has the Present bit cleared or other reserved bits set.
  8                                                     8 
  9 Affected processors                                 9 Affected processors
 10 -------------------                                10 -------------------
 11                                                    11 
 12 This vulnerability affects a wide range of Int     12 This vulnerability affects a wide range of Intel processors. The
 13 vulnerability is not present on:                   13 vulnerability is not present on:
 14                                                    14 
 15    - Processors from AMD, Centaur and other no     15    - Processors from AMD, Centaur and other non-Intel vendors
 16                                                    16 
 17    - Older processor models, where the CPU fam     17    - Older processor models, where the CPU family is < 6
 18                                                    18 
 19    - A range of Intel ATOM processors (Cedarvi     19    - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
 20      Penwell, Pineview, Silvermont, Airmont, M     20      Penwell, Pineview, Silvermont, Airmont, Merrifield)
 21                                                    21 
 22    - The Intel XEON PHI family                     22    - The Intel XEON PHI family
 23                                                    23 
 24    - Intel processors which have the ARCH_CAP_     24    - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
 25      IA32_ARCH_CAPABILITIES MSR. If the bit is     25      IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
 26      by the Meltdown vulnerability either. The     26      by the Meltdown vulnerability either. These CPUs should become
 27      available by end of 2018.                     27      available by end of 2018.
 28                                                    28 
 29 Whether a processor is affected or not can be      29 Whether a processor is affected or not can be read out from the L1TF
 30 vulnerability file in sysfs. See :ref:`l1tf_sy     30 vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
 31                                                    31 
 32 Related CVEs                                       32 Related CVEs
 33 ------------                                       33 ------------
 34                                                    34 
 35 The following CVE entries are related to the L     35 The following CVE entries are related to the L1TF vulnerability:
 36                                                    36 
 37    =============  =================  =========     37    =============  =================  ==============================
 38    CVE-2018-3615  L1 Terminal Fault  SGX relat     38    CVE-2018-3615  L1 Terminal Fault  SGX related aspects
 39    CVE-2018-3620  L1 Terminal Fault  OS, SMM r     39    CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
 40    CVE-2018-3646  L1 Terminal Fault  Virtualiz     40    CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
 41    =============  =================  =========     41    =============  =================  ==============================
 42                                                    42 
 43 Problem                                            43 Problem
 44 -------                                            44 -------
 45                                                    45 
 46 If an instruction accesses a virtual address f     46 If an instruction accesses a virtual address for which the relevant page
 47 table entry (PTE) has the Present bit cleared      47 table entry (PTE) has the Present bit cleared or other reserved bits set,
 48 then speculative execution ignores the invalid     48 then speculative execution ignores the invalid PTE and loads the referenced
 49 data if it is present in the Level 1 Data Cach     49 data if it is present in the Level 1 Data Cache, as if the page referenced
 50 by the address bits in the PTE was still prese     50 by the address bits in the PTE was still present and accessible.
 51                                                    51 
 52 While this is a purely speculative mechanism a     52 While this is a purely speculative mechanism and the instruction will raise
 53 a page fault when it is retired eventually, th     53 a page fault when it is retired eventually, the pure act of loading the
 54 data and making it available to other speculat     54 data and making it available to other speculative instructions opens up the
 55 opportunity for side channel attacks to unpriv     55 opportunity for side channel attacks to unprivileged malicious code,
 56 similar to the Meltdown attack.                    56 similar to the Meltdown attack.
 57                                                    57 
 58 While Meltdown breaks the user space to kernel     58 While Meltdown breaks the user space to kernel space protection, L1TF
 59 allows an attack on any physical memory addres     59 allows an attack on any physical memory address in the system, and the
 60 attack works across all protection domains. It     60 attack works across all protection domains. It allows an attack on SGX and
 61 also works from inside virtual machines becaus     61 also works from inside virtual machines because the speculation bypasses the
 62 extended page table (EPT) protection mechanism     62 extended page table (EPT) protection mechanism.
 63                                                    63 
 64                                                    64 
 65 Attack scenarios                                   65 Attack scenarios
 66 ----------------                                   66 ----------------
 67                                                    67 
 68 1. Malicious user space                            68 1. Malicious user space
 69 ^^^^^^^^^^^^^^^^^^^^^^^                            69 ^^^^^^^^^^^^^^^^^^^^^^^
 70                                                    70 
 71    Operating Systems store arbitrary informati     71    Operating Systems store arbitrary information in the address bits of a
 72    PTE which is marked non present. This allow     72    PTE which is marked non present. This allows a malicious user space
 73    application to attack the physical memory t     73    application to attack the physical memory to which these PTEs resolve.
 74    In some cases user-space can maliciously in     74    In some cases user-space can maliciously influence the information
 75    encoded in the address bits of the PTE, thu     75    encoded in the address bits of the PTE, thus making attacks more
 76    deterministic and more practical.               76    deterministic and more practical.
 77                                                    77 
 78    The Linux kernel contains a mitigation for      78    The Linux kernel contains a mitigation for this attack vector, PTE
 79    inversion, which is permanently enabled and     79    inversion, which is permanently enabled and has no performance
 80    impact. The kernel ensures that the address     80    impact. The kernel ensures that the address bits of PTEs, which are not
 81    marked present, never point to cacheable ph     81    marked present, never point to cacheable physical memory space.
 82                                                    82 
 83    A system with an up to date kernel is prote     83    A system with an up to date kernel is protected against attacks from
 84    malicious user space applications.              84    malicious user space applications.
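The idea behind PTE inversion can be illustrated with a small sketch. It is not the kernel's implementation: the bit positions below (Present bit at bit 0, address bits 12 to 51) are an assumed, simplified x86-64 style layout, and the function name is hypothetical. It only shows that flipping the address bits of a non-present entry turns a low, plausibly cacheable address into a very high one, and that the operation can be undone.

```python
# Illustrative only: a simplified x86-64 style PTE layout is assumed here.
PRESENT_BIT = 1 << 0                      # assumed position of the Present bit
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF      # assumed address bits 12..51


def invert_if_not_present(pte: int) -> int:
    """Flip the address bits of a non-present PTE; leave present PTEs alone."""
    if pte & PRESENT_BIT:
        return pte
    return pte ^ ADDR_MASK


# A non-present entry whose address bits carry arbitrary OS data.
swap_like = 0x0000_0000_1234_5000
stored = invert_if_not_present(swap_like)     # what would sit in the page table
restored = invert_if_not_present(stored)      # XOR is its own inverse
```

In this sketch the stored entry points at an address near the top of the assumed 52-bit physical range, which is the property the mitigation relies on: such an address is not expected to be backed by cacheable memory.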
 85                                                    85 
 86 2. Malicious guest in a virtual machine            86 2. Malicious guest in a virtual machine
 87 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^            87 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 88                                                    88 
 89    The fact that L1TF breaks all domain protec     89    The fact that L1TF breaks all domain protections allows malicious guest
 90    OSes, which can control the PTEs directly,      90    OSes, which can control the PTEs directly, and malicious guest user
 91    space applications, which run on an unprote     91    space applications, which run on an unprotected guest kernel lacking the
 92    PTE inversion mitigation for L1TF, to attac     92    PTE inversion mitigation for L1TF, to attack physical host memory.
 93                                                    93 
 94    A special aspect of L1TF in the context of      94    A special aspect of L1TF in the context of virtualization is simultaneous
 95    multi threading (SMT). The Intel implementa     95    multi threading (SMT). The Intel implementation of SMT is called
 96    HyperThreading. The fact that Hyperthreads      96    HyperThreading. The fact that Hyperthreads on the affected processors
 97    share the L1 Data Cache (L1D) is important      97    share the L1 Data Cache (L1D) is important for this. As the flaw only
 98    allows attacks on data which is present in      98    allows attacks on data which is present in L1D, a malicious guest running
 99    on one Hyperthread can attack the data whic     99    on one Hyperthread can attack the data which is brought into the L1D by
100    the context which runs on the sibling Hyper    100    the context which runs on the sibling Hyperthread of the same physical
101    core. This context can be host OS, host use    101    core. This context can be host OS, host user space or a different guest.
102                                                   102 
103    If the processor does not support Extended     103    If the processor does not support Extended Page Tables, the attack is
104    only possible when the hypervisor does not     104    only possible when the hypervisor does not sanitize the content of the
105    effective (shadow) page tables.                105    effective (shadow) page tables.
106                                                   106 
107    While solutions exist to mitigate these att    107    While solutions exist to mitigate these attack vectors fully, these
108    mitigations are not enabled by default in t    108    mitigations are not enabled by default in the Linux kernel because they
109    can affect performance significantly. The k    109    can affect performance significantly. The kernel provides several
110    mechanisms which can be utilized to address    110    mechanisms which can be utilized to address the problem depending on the
111    deployment scenario. The mitigations, their    111    deployment scenario. The mitigations, their protection scope and impact
112    are described in the next sections.            112    are described in the next sections.
113                                                   113 
114    The default mitigations and the rationale f    114    The default mitigations and the rationale for choosing them are explained
115    at the end of this document. See :ref:`defa    115    at the end of this document. See :ref:`default_mitigations`.
116                                                   116 
117 .. _l1tf_sys_info:                                117 .. _l1tf_sys_info:
118                                                   118 
119 L1TF system information                           119 L1TF system information
120 -----------------------                           120 -----------------------
121                                                   121 
122 The Linux kernel provides a sysfs interface to    122 The Linux kernel provides a sysfs interface to enumerate the current L1TF
123 status of the system: whether the system is vu    123 status of the system: whether the system is vulnerable, and which
124 mitigations are active. The relevant sysfs fil    124 mitigations are active. The relevant sysfs file is:
125                                                   125 
126 /sys/devices/system/cpu/vulnerabilities/l1tf      126 /sys/devices/system/cpu/vulnerabilities/l1tf
127                                                   127 
128 The possible values in this file are:             128 The possible values in this file are:
129                                                   129 
130   ===========================   ==============    130   ===========================   ===============================
131   'Not affected'                The processor     131   'Not affected'                The processor is not vulnerable
132   'Mitigation: PTE Inversion'   The host prote    132   'Mitigation: PTE Inversion'   The host protection is active
133   ===========================   ==============    133   ===========================   ===============================
134                                                   134 
135 If KVM/VMX is enabled and the processor is vul    135 If KVM/VMX is enabled and the processor is vulnerable then the following
136 information is appended to the 'Mitigation: PT    136 information is appended to the 'Mitigation: PTE Inversion' part:
137                                                   137 
138   - SMT status:                                   138   - SMT status:
139                                                   139 
140     =====================  ================       140     =====================  ================
141     'VMX: SMT vulnerable'  SMT is enabled         141     'VMX: SMT vulnerable'  SMT is enabled
142     'VMX: SMT disabled'    SMT is disabled        142     'VMX: SMT disabled'    SMT is disabled
143     =====================  ================       143     =====================  ================
144                                                   144 
145   - L1D Flush mode:                               145   - L1D Flush mode:
146                                                   146 
147     ================================  ========    147     ================================  ====================================
148     'L1D vulnerable'                  L1D flus    148     'L1D vulnerable'                  L1D flushing is disabled
149                                                   149 
150     'L1D conditional cache flushes'   L1D flus    150     'L1D conditional cache flushes'   L1D flush is conditionally enabled
151                                                   151 
152     'L1D cache flushes'               L1D flus    152     'L1D cache flushes'               L1D flush is unconditionally enabled
153     ================================  ========    153     ================================  ====================================
154                                                   154 
155 The resulting grade of protection is discussed    155 The resulting grade of protection is discussed in the following sections.
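A minimal sketch of how a monitoring script might interpret this file, based only on the strings listed in the tables above. The function names are hypothetical, and the matching is done on substrings because the exact layout of the combined line (separators, ordering) is not specified here and may differ between kernel versions.

```python
from pathlib import Path

L1TF_FILE = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def parse_l1tf_status(text: str) -> dict:
    """Interpret the l1tf status line using the documented fragments."""
    text = text.strip()
    status = {
        "affected": not text.startswith("Not affected"),
        "pte_inversion": "PTE Inversion" in text,
        "smt": None,         # stays None if no SMT fragment is present
        "l1d_flush": None,   # stays None if no L1D fragment is present
    }
    if "SMT vulnerable" in text:
        status["smt"] = "enabled"
    elif "SMT disabled" in text:
        status["smt"] = "disabled"
    # Check the longer phrase first: it contains the shorter one.
    if "conditional cache flushes" in text:
        status["l1d_flush"] = "conditional"
    elif "cache flushes" in text:
        status["l1d_flush"] = "unconditional"
    elif "L1D vulnerable" in text:
        status["l1d_flush"] = "disabled"
    return status


def read_l1tf_status(path: Path = L1TF_FILE):
    """Return the parsed status, or None if the file cannot be read."""
    try:
        return parse_l1tf_status(path.read_text())
    except OSError:
        return None
```

On a system without this sysfs file, `read_l1tf_status()` returns `None` rather than raising, so the caller can tell "unknown" apart from "not affected".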
156                                                   156 
157                                                   157 
158 Host mitigation mechanism                         158 Host mitigation mechanism
159 -------------------------                         159 -------------------------
160                                                   160 
161 The kernel is unconditionally protected agains    161 The kernel is unconditionally protected against L1TF attacks from malicious
162 user space running on the host.                   162 user space running on the host.
163                                                   163 
164                                                   164 
165 Guest mitigation mechanisms                       165 Guest mitigation mechanisms
166 ---------------------------                       166 ---------------------------
167                                                   167 
168 .. _l1d_flush:                                    168 .. _l1d_flush:
169                                                   169 
170 1. L1D flush on VMENTER                           170 1. L1D flush on VMENTER
171 ^^^^^^^^^^^^^^^^^^^^^^^                           171 ^^^^^^^^^^^^^^^^^^^^^^^
172                                                   172 
173    To make sure that a guest cannot attack dat    173    To make sure that a guest cannot attack data which is present in the L1D
174    the hypervisor flushes the L1D before enter    174    the hypervisor flushes the L1D before entering the guest.
175                                                   175 
176    Flushing the L1D evicts not only the data w    176    Flushing the L1D evicts not only the data which should not be accessed
177    by a potentially malicious guest, it also f    177    by a potentially malicious guest, it also flushes the guest
178    data. Flushing the L1D has a performance im    178    data. Flushing the L1D has a performance impact as the processor has to
179    bring the flushed guest data back into the     179    bring the flushed guest data back into the L1D. Depending on the
180    frequency of VMEXIT/VMENTER and the type of    180    frequency of VMEXIT/VMENTER and the type of computations in the guest
181    performance degradation in the range of 1%     181    performance degradation in the range of 1% to 50% has been observed. For
182    scenarios where guest VMEXIT/VMENTER are ra    182    scenarios where guest VMEXIT/VMENTER are rare the performance impact is
183    minimal. Virtio and mechanisms like posted     183    minimal. Virtio and mechanisms like posted interrupts are designed to
184    confine the VMEXITs to a bare minimum, but     184    confine the VMEXITs to a bare minimum, but specific configurations and
185    application scenarios might still suffer fr    185    application scenarios might still suffer from a high VMEXIT rate.
186                                                   186 
187    The kernel provides two L1D flush modes:       187    The kernel provides two L1D flush modes:
188     - conditional ('cond')                        188     - conditional ('cond')
189     - unconditional ('always')                    189     - unconditional ('always')
190                                                   190 
191    The conditional mode avoids L1D flushing af    191    The conditional mode avoids L1D flushing after VMEXITs which execute
192    only audited code paths before the correspo    192    only audited code paths before the corresponding VMENTER. These code
193    paths have been verified not to expose secr    193    paths have been verified not to expose secrets or other
194    interesting data to an attacker, but they c    194    interesting data to an attacker, but they can leak information about the
195    address space layout of the hypervisor.        195    address space layout of the hypervisor.
196                                                   196 
197    Unconditional mode flushes L1D on all VMENT    197    Unconditional mode flushes L1D on all VMENTER invocations and provides
198    maximum protection. It has a higher overhea    198    maximum protection. It has a higher overhead than the conditional
199    mode. The overhead cannot be quantified cor    199    mode. The overhead cannot be quantified correctly as it depends on the
200    workload scenario and the resulting number     200    workload scenario and the resulting number of VMEXITs.
201                                                   201 
202    The general recommendation is to enable L1D    202    The general recommendation is to enable L1D flush on VMENTER. The kernel
203    defaults to conditional mode on affected pr    203    defaults to conditional mode on affected processors.
204                                                   204 
205    **Note** that L1D flush does not prevent th    205    **Note** that L1D flush does not prevent the SMT problem because the
206    sibling thread will also bring back its dat    206    sibling thread will also bring back its data into the L1D which makes it
207    attackable again.                              207    attackable again.
208                                                   208 
209    L1D flush can be controlled by the administ    209    L1D flush can be controlled by the administrator via the kernel command
210    line and sysfs control files. See :ref:`mit    210    line and sysfs control files. See :ref:`mitigation_control_command_line`
211    and :ref:`mitigation_control_kvm`.             211    and :ref:`mitigation_control_kvm`.
212                                                   212 
213 .. _guest_confinement:                            213 .. _guest_confinement:
214                                                   214 
215 2. Guest VCPU confinement to dedicated physica    215 2. Guest VCPU confinement to dedicated physical cores
216 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    216 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
217                                                   217 
218    To address the SMT problem, it is possible     218    To address the SMT problem, it is possible to make a guest or a group of
219    guests affine to one or more physical cores    219    guests affine to one or more physical cores. The proper mechanism for
220    that is to utilize exclusive cpusets to ens    220    that is to utilize exclusive cpusets to ensure that no other guest or
221    host tasks can run on these cores.             221    host tasks can run on these cores.
222                                                   222 
223    If only a single guest or related guests ru    223    If only a single guest or related guests run on sibling SMT threads on
224    the same physical core then they can only a    224    the same physical core then they can only attack their own memory and
225    restricted parts of the host memory.           225    restricted parts of the host memory.
226                                                   226 
227    Host memory is attackable when one of the s    227    Host memory is attackable when one of the sibling SMT threads runs in
228    host OS (hypervisor) context and the other     228    host OS (hypervisor) context and the other in guest context. The amount
229    of valuable information from the host OS co    229    of valuable information from the host OS context depends on the context
230    which the host OS executes, i.e. interrupts    230    which the host OS executes, i.e. interrupts, soft interrupts and kernel
231    threads. The amount of valuable data from t    231    threads. The amount of valuable data from these contexts cannot be
232    declared as non-interesting for an attacker    232    declared as non-interesting for an attacker without deep inspection of
233    the code.                                      233    the code.
234                                                   234 
235    **Note** that assigning guests to a fixed s    235    **Note** that assigning guests to a fixed set of physical cores affects
236    the ability of the scheduler to do load bal    236    the ability of the scheduler to do load balancing and might have
237    negative effects on CPU utilization dependi    237    negative effects on CPU utilization depending on the hosting
238    scenario. Disabling SMT might be a viable a    238    scenario. Disabling SMT might be a viable alternative for particular
239    scenarios.                                     239    scenarios.
240                                                   240 
241    For further information about confining gue    241    For further information about confining guests to a single or to a group
242    of cores consult the cpusets documentation:    242    of cores consult the cpusets documentation:
243                                                   243 
244    https://www.kernel.org/doc/Documentation/ad    244    https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst
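Before building exclusive cpusets it helps to know which logical CPUs share a physical core. The sketch below groups logical CPUs by their sibling lists. It assumes the per-CPU `topology/thread_siblings_list` files under `/sys/devices/system/cpu`, which are not described in this document, and the helper names are hypothetical.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")  # assumed sysfs location


def parse_cpu_list(text: str) -> list:
    """Parse a CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        cpus.extend(range(int(low), int(high or low) + 1))
    return cpus


def group_by_core(sibling_lists: dict) -> list:
    """Map {cpu: sibling list text} to a sorted list of per-core CPU tuples."""
    return sorted({tuple(parse_cpu_list(t)) for t in sibling_lists.values()})


def read_sibling_lists(root: Path = CPU_ROOT) -> dict:
    """Collect the sibling list text of every logical CPU found under root."""
    lists = {}
    for f in root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(f.parent.parent.name[3:])   # 'cpu12' -> 12
        try:
            lists[cpu] = f.read_text()
        except OSError:
            continue
    return lists
```

Calling `group_by_core(read_sibling_lists())` yields one tuple per physical core. A guest or a group of related guests would then be confined to whole tuples, so that no unrelated task shares a core with them.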
245                                                   245 
246 .. _interrupt_isolation:                          246 .. _interrupt_isolation:
247                                                   247 
248 3. Interrupt affinity                             248 3. Interrupt affinity
249 ^^^^^^^^^^^^^^^^^^^^^                             249 ^^^^^^^^^^^^^^^^^^^^^
250                                                   250 
251    Interrupts can be made affine to logical CP    251    Interrupts can be made affine to logical CPUs. This is not universally
252    true because there are types of interrupts     252    true because there are types of interrupts which are truly per CPU
253    interrupts, e.g. the local timer interrupt.    253    interrupts, e.g. the local timer interrupt. Aside from that, multi-queue
254    devices affine their interrupts to single C    254    devices affine their interrupts to single CPUs or groups of CPUs per
255    queue without allowing the administrator to    255    queue without allowing the administrator to control the affinities.
256                                                   256 
257    Moving the interrupts, which can be affinit    257    Moving the interrupts, which can be affinity controlled, away from CPUs
258    which run untrusted guests, reduces the att    258    which run untrusted guests, reduces the attack vector space.
259                                                   259 
260    Whether the interrupts which are affine to     260    Whether the interrupts which are affine to CPUs, which run untrusted
261    guests, provide interesting data for an att    261    guests, provide interesting data for an attacker depends on the system
262    configuration and the scenarios which run o    262    configuration and the scenarios which run on the system. While for some
263    of the interrupts it can be assumed that th    263    of the interrupts it can be assumed that they won't expose interesting
264    information beyond exposing hints about the    264    information beyond exposing hints about the host OS memory layout, there
265    is no way to make general assumptions.         265    is no way to make general assumptions.
266                                                   266 
267    Interrupt affinity can be controlled by the    267    Interrupt affinity can be controlled by the administrator via the
268    /proc/irq/$NR/smp_affinity[_list] files. Li    268    /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
269    available at:                                  269    available at:
270                                                   270 
271    https://www.kernel.org/doc/Documentation/co    271    https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
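The `smp_affinity` file takes a hexadecimal CPU bitmask, while `smp_affinity_list` takes a CPU list. A minimal sketch of converting between a mask and CPU numbers follows. The helper names are hypothetical, and the comma-separated grouping into 32-bit words for larger masks is an assumption about the file format rather than something stated in this document.

```python
def mask_to_cpus(mask_text: str) -> list:
    """Convert a hex affinity mask such as '0f' into a list of CPU numbers."""
    mask = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def cpus_to_mask(cpus) -> str:
    """Convert CPU numbers into a hex mask, grouped into 32-bit words."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    digits = format(mask, "x")
    # Pad to a multiple of 8 hex digits (one 32-bit word per group).
    digits = digits.zfill(-(-len(digits) // 8) * 8)
    return ",".join(digits[i:i + 8] for i in range(0, len(digits), 8))


def cpus_without_guests(all_cpus, guest_cpus) -> list:
    """CPUs that remain available for interrupts once guest CPUs are excluded."""
    guests = set(guest_cpus)
    return [cpu for cpu in all_cpus if cpu not in guests]
```

For example, with CPUs 0 to 3 online and untrusted guests pinned to CPUs 2 and 3, `cpus_to_mask(cpus_without_guests(range(4), [2, 3]))` gives the mask an administrator could write to `/proc/irq/$NR/smp_affinity` for interrupts whose affinity can be controlled.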
272                                                   272 
273 .. _smt_control:                                  273 .. _smt_control:
274                                                   274 
275 4. SMT control                                    275 4. SMT control
276 ^^^^^^^^^^^^^^                                    276 ^^^^^^^^^^^^^^
277                                                   277 
278    To prevent the SMT issues of L1TF it might     278    To prevent the SMT issues of L1TF it might be necessary to disable SMT
279    completely. Disabling SMT can have a signif    279    completely. Disabling SMT can have a significant performance impact, but
280    the impact depends on the hosting scenario     280    the impact depends on the hosting scenario and the type of workloads.
281    The impact of disabling SMT also needs to b    281    The impact of disabling SMT also needs to be weighed against the impact
282    of other mitigation solutions like confinin    282    of other mitigation solutions like confining guests to dedicated cores.
283                                                   283 
284    The kernel provides a sysfs interface to re    284    The kernel provides a sysfs interface to retrieve the status of SMT and
285    to control it. It also provides a kernel co    285    to control it. It also provides a kernel command line interface to
286    control SMT.                                   286    control SMT.
287                                                   287 
288    The kernel command line interface consists     288    The kernel command line interface consists of the following options:
289                                                   289 
290      =========== =============================    290      =========== ==========================================================
291      nosmt       Affects the bring up of the s    291      nosmt       Affects the bring up of the secondary CPUs during boot. The
292                  kernel tries to bring all pre    292                  kernel tries to bring all present CPUs online during the
293                  boot process. "nosmt" makes s    293                  boot process. "nosmt" makes sure that from each physical
294                  core only one - the so called    294                  core only one - the so called primary (hyper) thread is
295                  activated. Due to a design fl    295                  activated. Due to a design flaw of Intel processors related
296                  to Machine Check Exceptions t    296                  to Machine Check Exceptions the non primary siblings have
297                  to be brought up at least par    297                  to be brought up at least partially and are then shut down
298                  again.  "nosmt" can be undone    298                  again.  "nosmt" can be undone via the sysfs interface.
299                                                   299 
300      nosmt=force Has the same effect as "nosmt    300      nosmt=force Has the same effect as "nosmt" but it does not allow
301                  undoing the SMT disable via t    301                  undoing the SMT disable via the sysfs interface.
302      =========== =============================    302      =========== ==========================================================
303                                                   303 
304    The sysfs interface provides two files:        304    The sysfs interface provides two files:
305                                                   305 
306    - /sys/devices/system/cpu/smt/control          306    - /sys/devices/system/cpu/smt/control
307    - /sys/devices/system/cpu/smt/active           307    - /sys/devices/system/cpu/smt/active
308                                                   308 
309    /sys/devices/system/cpu/smt/control:           309    /sys/devices/system/cpu/smt/control:
310                                                   310 
311      This file reports the SMT control state a    311      This file reports the SMT control state and provides the
312      ability to disable or (re)enable SMT. The    312      ability to disable or (re)enable SMT. The possible states are:
313                                                   313 
314         ==============  ======================    314         ==============  ===================================================
315         on              SMT is supported by th    315         on              SMT is supported by the CPU and enabled. All
316                         logical CPUs can be on    316                         logical CPUs can be onlined and offlined without
317                         restrictions.             317                         restrictions.
318                                                   318 
319         off             SMT is supported by th    319         off             SMT is supported by the CPU and disabled. Only
320                         the so called primary     320                         the so called primary SMT threads can be onlined
321                         and offlined without r    321                         and offlined without restrictions. An attempt to
322                         online a non-primary s    322                         online a non-primary sibling is rejected.
323                                                   323 
324         forceoff        Same as 'off' but the     324         forceoff        Same as 'off' but the state cannot be controlled.
325                         Attempts to write to t    325                         Attempts to write to the control file are rejected.
326                                                   326 
327         notsupported    The processor does not    327         notsupported    The processor does not support SMT. It's therefore
328                         not affected by the SM    328                         not affected by the SMT implications of L1TF.
329                         Attempts to write to t    329                         Attempts to write to the control file are rejected.
330         ==============  ======================    330         ==============  ===================================================
331                                                   331 
332      The possible states which can be written     332      The possible states which can be written into this file to control SMT
333      state are:                                   333      state are:
334                                                   334 
335      - on                                         335      - on
336      - off                                        336      - off
337      - forceoff                                   337      - forceoff
338                                                   338 
339    /sys/devices/system/cpu/smt/active:            339    /sys/devices/system/cpu/smt/active:
340                                                   340 
341      This file reports whether SMT is enabled     341      This file reports whether SMT is enabled and active, i.e. if on any
342      physical core two or more sibling threads    342      physical core two or more sibling threads are online.
343                                                   343 
344    SMT control is also possible at boot time v    344    SMT control is also possible at boot time via the l1tf kernel command
345    line parameter in combination with L1D flus    345    line parameter in combination with L1D flush control. See
346    :ref:`mitigation_control_command_line`.        346    :ref:`mitigation_control_command_line`.
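
The sysfs interface described above can be inspected from a script. The following is a minimal sketch, not part of the kernel documentation: the helper names ``describe_smt`` and ``read_smt`` are hypothetical, and it assumes that ``smt/active`` reads as "1" when SMT is active, which the text above does not spell out.

```python
from pathlib import Path

# Directory named in the text above.
SMT_DIR = Path("/sys/devices/system/cpu/smt")

# Documented values of smt/control, mapped to whether the text says
# writes to the control file are accepted in that state.
CONTROL_WRITABLE = {
    "on": True,
    "off": True,
    "forceoff": False,
    "notsupported": False,
}


def describe_smt(control_text, active_text):
    """Interpret the raw contents of smt/control and smt/active."""
    control = control_text.strip()
    if control not in CONTROL_WRITABLE:
        raise ValueError("unexpected smt/control value: %r" % control)
    return {
        "control": control,
        "writable": CONTROL_WRITABLE[control],
        # Assumption: smt/active reads as "1" when SMT is active.
        "active": active_text.strip() == "1",
    }


def read_smt(smt_dir=SMT_DIR):
    """Read both sysfs files; only works on a kernel that provides them."""
    return describe_smt(
        (smt_dir / "control").read_text(),
        (smt_dir / "active").read_text(),
    )
```

On a machine that exposes these files, ``read_smt()`` returns the parsed state. Keeping ``describe_smt`` separate from the file access lets the interpretation be checked without touching sysfs.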
347                                                   347 
348 5. Disabling EPT                                  348 5. Disabling EPT
349 ^^^^^^^^^^^^^^^^                                  349 ^^^^^^^^^^^^^^^^
350                                                   350 
351   Disabling EPT for virtual machines provides     351   Disabling EPT for virtual machines provides full mitigation for L1TF even
352   with SMT enabled, because the effective page    352   with SMT enabled, because the effective page tables for guests are
353   managed and sanitized by the hypervisor. How    353   managed and sanitized by the hypervisor. However, disabling EPT has a
354   significant performance impact, especially w    354   significant performance impact, especially when the Meltdown mitigation
355   KPTI is enabled.                                355   KPTI is enabled.
356                                                   356 
357   EPT can be disabled in the hypervisor via th    357   EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
358                                                   358 
359 There is ongoing research and development for     359 There is ongoing research and development for new mitigation mechanisms to
360 address the performance impact of disabling SM    360 address the performance impact of disabling SMT or EPT.
361                                                   361 
362 .. _mitigation_control_command_line:              362 .. _mitigation_control_command_line:
363                                                   363 
364 Mitigation control on the kernel command line     364 Mitigation control on the kernel command line
365 ---------------------------------------------     365 ---------------------------------------------
366                                                   366 
367 The kernel command line allows to control the     367 The kernel command line allows to control the L1TF mitigations at boot
368 time with the option "l1tf=". The valid argume    368 time with the option "l1tf=". The valid arguments for this option are:
369                                                   369 
370   ============  ==============================    370   ============  =============================================================
371   full          Provides all available mitigat    371   full          Provides all available mitigations for the L1TF
372                 vulnerability. Disables SMT an    372                 vulnerability. Disables SMT and enables all mitigations in
373                 the hypervisors, i.e. uncondit    373                 the hypervisors, i.e. unconditional L1D flushing
374                                                   374 
375                 SMT control and L1D flush cont    375                 SMT control and L1D flush control via the sysfs interface
376                 is still possible after boot.     376                 is still possible after boot.  Hypervisors will issue a
377                 warning when the first VM is s    377                 warning when the first VM is started in a potentially
378                 insecure configuration, i.e. S    378                 insecure configuration, i.e. SMT enabled or L1D flush
379                 disabled.                         379                 disabled.
380                                                   380 
381   full,force    Same as 'full', but disables S    381   full,force    Same as 'full', but disables SMT and L1D flush runtime
382                 control. Implies the 'nosmt=fo    382                 control. Implies the 'nosmt=force' command line option.
383                 (i.e. sysfs control of SMT is     383                 (i.e. sysfs control of SMT is disabled.)
384                                                   384 
385   flush         Leaves SMT enabled and enables    385   flush         Leaves SMT enabled and enables the default hypervisor
386                 mitigation, i.e. conditional L    386                 mitigation, i.e. conditional L1D flushing
387                                                   387 
388                 SMT control and L1D flush cont    388                 SMT control and L1D flush control via the sysfs interface
389                 is still possible after boot.     389                 is still possible after boot.  Hypervisors will issue a
390                 warning when the first VM is s    390                 warning when the first VM is started in a potentially
391                 insecure configuration, i.e. S    391                 insecure configuration, i.e. SMT enabled or L1D flush
392                 disabled.                         392                 disabled.
393                                                   393 
394   flush,nosmt   Disables SMT and enables the d    394   flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
395                 i.e. conditional L1D flushing.    395                 i.e. conditional L1D flushing.
396                                                   396 
397                 SMT control and L1D flush cont    397                 SMT control and L1D flush control via the sysfs interface
398                 is still possible after boot.     398                 is still possible after boot.  Hypervisors will issue a
399                 warning when the first VM is s    399                 warning when the first VM is started in a potentially
400                 insecure configuration, i.e. S    400                 insecure configuration, i.e. SMT enabled or L1D flush
401                 disabled.                         401                 disabled.
402                                                   402 
403   flush,nowarn  Same as 'flush', but hyperviso    403   flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
404                 started in a potentially insec    404                 started in a potentially insecure configuration.
405                                                   405 
406   off           Disables hypervisor mitigation    406   off           Disables hypervisor mitigations and doesn't emit any
407                 warnings.                         407                 warnings.
408                 It also drops the swap size an    408                 It also drops the swap size and available RAM limit restrictions
409                 on both hypervisor and bare me    409                 on both hypervisor and bare metal.
410                                                   410 
411   ============  ==============================    411   ============  =============================================================
412                                                   412 
413 The default is 'flush'. For details about L1D     413 The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
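
The option table above can be restated as a small lookup, which is handy when auditing the boot configuration of many machines. This is an illustrative sketch only: ``L1tfPolicy``, ``L1TF_OPTIONS`` and ``l1tf_from_cmdline`` are hypothetical names, and it assumes that the last "l1tf=" occurrence on the command line wins. Where the table does not state a property, the field is ``None``.

```python
from typing import NamedTuple, Optional


class L1tfPolicy(NamedTuple):
    smt_disabled: bool
    l1d_flush: str                   # "always", "cond" or "never"
    runtime_control: Optional[bool]  # None: not stated in the table
    warns: bool


# One entry per row of the "l1tf=" table above.
L1TF_OPTIONS = {
    "full":         L1tfPolicy(True,  "always", True,  True),
    "full,force":   L1tfPolicy(True,  "always", False, True),
    "flush":        L1tfPolicy(False, "cond",   True,  True),
    "flush,nosmt":  L1tfPolicy(True,  "cond",   True,  True),
    "flush,nowarn": L1tfPolicy(False, "cond",   True,  False),
    "off":          L1tfPolicy(False, "never",  None,  False),
}

DEFAULT_L1TF = "flush"


def l1tf_from_cmdline(cmdline):
    """Return the l1tf= argument found in a kernel command line string."""
    value = DEFAULT_L1TF
    for token in cmdline.split():
        if token.startswith("l1tf="):
            # Assumption: a later occurrence overrides an earlier one.
            value = token[len("l1tf="):]
    if value not in L1TF_OPTIONS:
        raise ValueError("unknown l1tf argument: %r" % value)
    return value
```

For example, ``L1TF_OPTIONS[l1tf_from_cmdline(open("/proc/cmdline").read())]`` gives the policy requested at boot on the running system.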
414                                                   414 
415                                                   415 
416 .. _mitigation_control_kvm:                       416 .. _mitigation_control_kvm:
417                                                   417 
418 Mitigation control for KVM - module parameter     418 Mitigation control for KVM - module parameter
419 ----------------------------------------------    419 -------------------------------------------------------------
420                                                   420 
421 The KVM hypervisor mitigation mechanism, flush    421 The KVM hypervisor mitigation mechanism, flushing the L1D cache when
422 entering a guest, can be controlled with a mod    422 entering a guest, can be controlled with a module parameter.
423                                                   423 
424 The option/parameter is "kvm-intel.vmentry_l1d    424 The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
425 following arguments:                              425 following arguments:
426                                                   426 
427   ============  ==============================    427   ============  ==============================================================
428   always        L1D cache flush on every VMENT    428   always        L1D cache flush on every VMENTER.
429                                                   429 
430   cond          Flush L1D on VMENTER only when    430   cond          Flush L1D on VMENTER only when the code between VMEXIT and
431                 VMENTER can leak host memory w    431                 VMENTER can leak host memory which is considered
432                 interesting for an attacker. T    432                 interesting for an attacker. This still can leak host memory
433                 which allows e.g. to determine    433                 which allows e.g. to determine the host's address space layout.
434                                                   434 
435   never         Disables the mitigation           435   never         Disables the mitigation
436   ============  ==============================    436   ============  ==============================================================
437                                                   437 
438 The parameter can be provided on the kernel co    438 The parameter can be provided on the kernel command line, as a module
439 parameter when loading the modules and at runt    439 parameter when loading the modules and at runtime modified via the sysfs
440 file:                                             440 file:
441                                                   441 
442 /sys/module/kvm_intel/parameters/vmentry_l1d_f    442 /sys/module/kvm_intel/parameters/vmentry_l1d_flush
443                                                   443 
444 The default is 'cond'. If 'l1tf=full,force' is    444 The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
445 line, then 'always' is enforced and the kvm-in    445 line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
446 module parameter is ignored and writes to the     446 module parameter is ignored and writes to the sysfs file are rejected.
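
The interaction between "l1tf=" and "kvm-intel.vmentry_l1d_flush=" can be sketched as a single function. The text only states the 'full,force' rule and the 'cond' default. Letting an explicitly given module parameter take precedence in all other cases is an assumption of this sketch, and ``effective_l1d_flush`` is a hypothetical helper name.

```python
from typing import Optional

VALID_FLUSH_MODES = ("always", "cond", "never")


def effective_l1d_flush(l1tf="flush", kvm_param=None):
    # type: (str, Optional[str]) -> str
    """Estimate the L1D flush mode KVM ends up using."""
    # Documented: 'l1tf=full,force' enforces 'always' and the module
    # parameter is ignored.
    if l1tf == "full,force":
        return "always"
    # Assumption: otherwise an explicit module parameter is honoured.
    if kvm_param is not None:
        if kvm_param not in VALID_FLUSH_MODES:
            raise ValueError("invalid flush mode: %r" % kvm_param)
        return kvm_param
    # No module parameter: follow the l1tf= table, where 'full' means
    # unconditional flushing and 'off' disables the mitigation.
    if l1tf == "full":
        return "always"
    if l1tf == "off":
        return "never"
    return "cond"
```

The value actually in effect can be compared against the contents of /sys/module/kvm_intel/parameters/vmentry_l1d_flush on a running host.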
447                                                   447 
448 .. _mitigation_selection:                         448 .. _mitigation_selection:
449                                                   449 
450 Mitigation selection guide                        450 Mitigation selection guide
451 --------------------------                        451 --------------------------
452                                                   452 
453 1. No virtualization in use                       453 1. No virtualization in use
454 ^^^^^^^^^^^^^^^^^^^^^^^^^^^                       454 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
455                                                   455 
456    The system is protected by the kernel uncon    456    The system is protected by the kernel unconditionally and no further
457    action is required.                            457    action is required.
458                                                   458 
459 2. Virtualization with trusted guests             459 2. Virtualization with trusted guests
460 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^             460 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
461                                                   461 
462    If the guest comes from a trusted source an    462    If the guest comes from a trusted source and the guest OS kernel is
463    guaranteed to have the L1TF mitigations in     463    guaranteed to have the L1TF mitigations in place the system is fully
464    protected against L1TF and no further actio    464    protected against L1TF and no further action is required.
465                                                   465 
466    To avoid the overhead of the default L1D fl    466    To avoid the overhead of the default L1D flushing on VMENTER the
467    administrator can disable the flushing via     467    administrator can disable the flushing via the kernel command line and
468    sysfs control files. See :ref:`mitigation_c    468    sysfs control files. See :ref:`mitigation_control_command_line` and
469    :ref:`mitigation_control_kvm`.                 469    :ref:`mitigation_control_kvm`.
470                                                   470 
471                                                   471 
472 3. Virtualization with untrusted guests           472 3. Virtualization with untrusted guests
473 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^           473 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
474                                                   474 
475 3.1. SMT not supported or disabled                475 3.1. SMT not supported or disabled
476 """"""""""""""""""""""""""""""""""                476 """"""""""""""""""""""""""""""""""
477                                                   477 
478   If SMT is not supported by the processor or     478   If SMT is not supported by the processor or disabled in the BIOS or by
479   the kernel, it's only required to enforce L1    479   the kernel, it's only required to enforce L1D flushing on VMENTER.
480                                                   480 
481   Conditional L1D flushing is the default beha    481   Conditional L1D flushing is the default behaviour and can be tuned. See
482   :ref:`mitigation_control_command_line` and :    482   :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
483                                                   483 
484 3.2. EPT not supported or disabled                484 3.2. EPT not supported or disabled
485 """"""""""""""""""""""""""""""""""                485 """"""""""""""""""""""""""""""""""
486                                                   486 
487   If EPT is not supported by the processor or     487   If EPT is not supported by the processor or disabled in the hypervisor,
488   the system is fully protected. SMT can stay     488   the system is fully protected. SMT can stay enabled and L1D flushing on
489   VMENTER is not required.                        489   VMENTER is not required.
490                                                   490 
491   EPT can be disabled in the hypervisor via th    491   EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
492                                                   492 
493 3.3. SMT and EPT supported and active             493 3.3. SMT and EPT supported and active
494 """""""""""""""""""""""""""""""""""""             494 """""""""""""""""""""""""""""""""""""
495                                                   495 
496   If SMT and EPT are supported and active then    496   If SMT and EPT are supported and active then various degrees of
497   mitigations can be employed:                    497   mitigations can be employed:
498                                                   498 
499   - L1D flushing on VMENTER:                      499   - L1D flushing on VMENTER:
500                                                   500 
501     L1D flushing on VMENTER is the minimal pro    501     L1D flushing on VMENTER is the minimal protection requirement, but it
502     is only potent in combination with other m    502     is only potent in combination with other mitigation methods.
503                                                   503 
504     Conditional L1D flushing is the default be    504     Conditional L1D flushing is the default behaviour and can be tuned. See
505     :ref:`mitigation_control_command_line` and    505     :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
506                                                   506 
507   - Guest confinement:                            507   - Guest confinement:
508                                                   508 
509     Confinement of guests to a single or a gro    509     Confinement of guests to a single or a group of physical cores which
510     are not running any other processes, can r    510     are not running any other processes, can reduce the attack surface
511     significantly, but interrupts, soft interr    511     significantly, but interrupts, soft interrupts and kernel threads can
512     still expose valuable data to a potential     512     still expose valuable data to a potential attacker. See
513     :ref:`guest_confinement`.                     513     :ref:`guest_confinement`.
514                                                   514 
515   - Interrupt isolation:                          515   - Interrupt isolation:
516                                                   516 
517     Isolating the guest CPUs from interrupts c    517     Isolating the guest CPUs from interrupts can reduce the attack surface
518     further, but still allows a malicious gues    518     further, but still allows a malicious guest to explore a limited amount
519     of host physical memory. This can at least    519     of host physical memory. This can at least be used to gain knowledge
520     about the host address space layout. The i    520     about the host address space layout. The interrupts which have a fixed
521     affinity to the CPUs which run the untrust    521     affinity to the CPUs which run the untrusted guests can depending on
522     the scenario still trigger soft interrupts    522     the scenario still trigger soft interrupts and schedule kernel threads
523     which might expose valuable information. S    523     which might expose valuable information. See
524     :ref:`interrupt_isolation`.                   524     :ref:`interrupt_isolation`.
525                                                   525 
526 The above three mitigation methods combined ca    526 The above three mitigation methods combined can provide protection to a
527 certain degree, but the risk of the remaining     527 certain degree, but the risk of the remaining attack surface has to be
528 carefully analyzed. For full protection the fo    528 carefully analyzed. For full protection the following methods are
529 available:                                        529 available:
530                                                   530 
531   - Disabling SMT:                                531   - Disabling SMT:
532                                                   532 
533     Disabling SMT and enforcing the L1D flushi    533     Disabling SMT and enforcing the L1D flushing provides the maximum
534     amount of protection. This mitigation does    534     amount of protection. This mitigation does not depend on any of the
535     above mitigation methods.                     535     above mitigation methods.
536                                                   536 
537     SMT control and L1D flushing can be tuned     537     SMT control and L1D flushing can be tuned by the command line
538     parameters 'nosmt', 'l1tf', 'kvm-intel.vme    538     parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
539     time with the matching sysfs control files    539     time with the matching sysfs control files. See :ref:`smt_control`,
540     :ref:`mitigation_control_command_line` and    540     :ref:`mitigation_control_command_line` and
541     :ref:`mitigation_control_kvm`.                541     :ref:`mitigation_control_kvm`.
542                                                   542 
543   - Disabling EPT:                                543   - Disabling EPT:
544                                                   544 
545     Disabling EPT provides the maximum amount     545     Disabling EPT provides the maximum amount of protection as well. It does
546     not depend on any of the above mitigation     546     not depend on any of the above mitigation methods. SMT can stay
547     enabled and L1D flushing is not required,     547     enabled and L1D flushing is not required, but the performance impact is
548     significant.                                  548     significant.
549                                                   549 
550     EPT can be disabled in the hypervisor via     550     EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
551     parameter.                                    551     parameter.
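
Sections 1 to 3.3 of the selection guide amount to a short decision procedure, which can be sketched as follows. The function name ``select_mitigation`` and the returned labels are hypothetical and chosen for this illustration. The trusted guest case additionally presumes, as stated in section 2, that the guest kernel has the L1TF mitigations in place.

```python
def select_mitigation(virtualization, guests_trusted=False,
                      smt_active=True, ept_active=True):
    """Map a host configuration to the guidance of sections 1 to 3.3."""
    if not virtualization:
        # 1. The kernel protects the system unconditionally.
        return "no-action"
    if guests_trusted:
        # 2. No action required; L1D flushing may be disabled to
        #    avoid its overhead.
        return "no-action-flush-optional"
    if not ept_active:
        # 3.2 Fully protected; SMT can stay enabled and L1D flushing
        #     is not required.
        return "protected-without-ept"
    if not smt_active:
        # 3.1 Only L1D flushing on VMENTER has to be enforced.
        return "enforce-l1d-flush"
    # 3.3 Full protection needs SMT disabled plus enforced L1D
    #     flushing, or EPT disabled. L1D flushing combined with guest
    #     confinement and interrupt isolation only protects partially.
    return "disable-smt-or-ept"
```

EPT is tested before SMT because section 3.2 declares the system fully protected without EPT regardless of the SMT state.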
552                                                   552 
553 3.4. Nested virtual machines                      553 3.4. Nested virtual machines
554 """"""""""""""""""""""""""""                      554 """"""""""""""""""""""""""""
555                                                   555 
556 When nested virtualization is in use, three op    556 When nested virtualization is in use, three operating systems are involved:
557 the bare metal hypervisor, the nested hypervis    557 the bare metal hypervisor, the nested hypervisor and the nested virtual
558 machine.  VMENTER operations from the nested h    558 machine.  VMENTER operations from the nested hypervisor into the nested
559 guest will always be processed by the bare met    559 guest will always be processed by the bare metal hypervisor. If KVM is the
560 bare metal hypervisor it will:                    560 bare metal hypervisor it will:
561                                                   561 
562  - Flush the L1D cache on every switch from th    562  - Flush the L1D cache on every switch from the nested hypervisor to the
563    nested virtual machine, so that the nested     563    nested virtual machine, so that the nested hypervisor's secrets are not
564    exposed to the nested virtual machine;         564    exposed to the nested virtual machine;
565                                                   565 
566  - Flush the L1D cache on every switch from th    566  - Flush the L1D cache on every switch from the nested virtual machine to
567    the nested hypervisor; this is a complex op    567    the nested hypervisor; this is a complex operation, and flushing the L1D
568    cache avoids that the bare metal hypervisor    568    cache avoids that the bare metal hypervisor's secrets are exposed to the
569    nested virtual machine;                        569    nested virtual machine;
570                                                   570 
571  - Instruct the nested hypervisor to not perfo    571  - Instruct the nested hypervisor to not perform any L1D cache flush. This
572    is an optimization to avoid double L1D flus    572    is an optimization to avoid double L1D flushing.
573                                                   573 
574                                                   574 
575 .. _default_mitigations:                          575 .. _default_mitigations:
576                                                   576 
577 Default mitigations                               577 Default mitigations
578 -------------------                               578 -------------------
579                                                   579 
580   The kernel default mitigations for vulnerabl    580   The kernel default mitigations for vulnerable processors are:
581                                                   581 
582   - PTE inversion to protect against malicious    582   - PTE inversion to protect against malicious user space. This is done
583     unconditionally and cannot be controlled.     583     unconditionally and cannot be controlled. The swap storage is limited
584     to ~16TB.                                     584     to ~16TB.
585                                                   585 
586   - L1D conditional flushing on VMENTER when E    586   - L1D conditional flushing on VMENTER when EPT is enabled for
587     a guest.                                      587     a guest.
588                                                   588 
589   The kernel does not by default enforce the d    589   The kernel does not by default enforce the disabling of SMT, which leaves
590   SMT systems vulnerable when running untruste    590   SMT systems vulnerable when running untrusted guests with EPT enabled.
591                                                   591 
592   The rationale for this choice is:               592   The rationale for this choice is:
593                                                   593 
594   - Force disabling SMT can break existing set    594   - Force disabling SMT can break existing setups, especially with
595     unattended updates.                           595     unattended updates.
596                                                   596 
597   - If regular users run untrusted guests on t    597   - If regular users run untrusted guests on their machine, then L1TF is
598     just an add-on to other malware which migh    598     just an add-on to other malware which might be embedded in an untrusted
599     guest, e.g. spam-bots or attacks on the lo    599     guest, e.g. spam-bots or attacks on the local network.
600                                                   600 
601     There is no technical way to prevent a use    601     There is no technical way to prevent a user from running untrusted code
602     on their machines blindly.                    602     on their machines blindly.
603                                                   603 
604   - It's technically extremely unlikely and fr    604   - It's technically extremely unlikely and from today's knowledge even
605     impossible that L1TF can be exploited via     605     impossible that L1TF can be exploited via the most popular attack
606     mechanisms like JavaScript because these m    606     mechanisms like JavaScript because these mechanisms have no way to
607     control PTEs. If this were possible and n    607     control PTEs. If this were possible and no other mitigation were
608     available, then the default might be diff    608     available, then the default might be different.
609                                                   609 
610   - The administrators of cloud and hosting se    610   - The administrators of cloud and hosting setups have to carefully
611     analyze the risk for their scenarios and m    611     analyze the risk for their scenarios and make the appropriate
612     mitigation choices, which might even vary     612     mitigation choices, which might even vary across their deployed
613     machines and also result in other changes     613     machines and also result in other changes of their overall setup.
614     There is no way for the kernel to provide     614     There is no way for the kernel to provide a sensible default for this
615     kind of scenario.                             615     kind of scenario.
                                                      
