Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest, with
one exception. The rationale behind this assumption is that the code
construct needed for exploiting MDS requires:

 - to control the load to trigger a fault or assist

 - to have a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel

 - to control the pointer through which the disclosure gadget exposes the
   data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.



Mitigation strategy
-------------------

All variants have the same mitigation strategy, at least for the single
CPU thread case (SMT off): force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either via the modified VERW instruction or via the L1D Flush command.
The latter is issued when the L1TF mitigation is enabled, so the extra
VERW can be avoided. If the CPU is not affected by L1TF then VERW needs
to be issued.
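
A minimal sketch of that decision, using hypothetical helper names
(flush_l1d(), clear_cpu_buffers() and guest_entry_buffer_clear() are
illustrative only; the real KVM entry code is organized differently)::

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-ins for the two clearing mechanisms described above. */
    static void flush_l1d(void)
    {
            puts("L1D flush (clears the affected CPU buffers as well)");
    }

    static void clear_cpu_buffers(void)
    {
            puts("VERW based CPU buffer clear");
    }

    /*
     * Decision before entering the guest: when the L1TF mitigation
     * already flushes the L1D cache the extra VERW can be avoided,
     * otherwise VERW is needed on MDS affected CPUs.
     */
    static void guest_entry_buffer_clear(bool l1d_flush_enabled)
    {
            if (l1d_flush_enabled)
                    flush_l1d();
            else
                    clear_cpu_buffers();
    }

    int main(void)
    {
            guest_entry_buffer_clear(false);
            return 0;
    }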

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update, there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks except for MSBDS,
which is only exploitable cross Hyper-Thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing:

    mds_clear_cpu_buffers()

The CLEAR_CPU_BUFFERS macro can also be used in assembly code late in the
exit-to-user path. Other than EFLAGS.ZF, this macro does not clobber any
registers.
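
A minimal user-space sketch of the VERW based clearing (an illustration,
not the kernel implementation; the function name clear_cpu_buffers() and
the choice of selector are assumptions, the in-kernel helper uses a kernel
data segment selector)::

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Sketch only, not the kernel's code.  VERW has to be used in its
     * memory-operand form, which is the form documented to trigger the
     * buffer clearing on CPUs with the MD_CLEAR microcode update.  Any
     * valid segment selector works; here the current %ds value is
     * reused.  On CPUs without the update the instruction only updates
     * EFLAGS.ZF, hence the "cc" clobber.
     */
    static inline void clear_cpu_buffers(void)
    {
            uint16_t ds;

            asm volatile("mov %%ds, %0" : "=r" (ds));
            asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
    }

    int main(void)
    {
            clear_cpu_buffers();
            puts("VERW issued");
            return 0;
    }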

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
accordingly.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

 ======= ============================================================
 off      Mitigation is disabled. Either the CPU is not affected or
          mds=off is supplied on the kernel command line.

 full     Mitigation is enabled. CPU is affected and MD_CLEAR is
          advertised in CPUID.

 vmwerv   Mitigation is enabled. CPU is affected and MD_CLEAR is not
          advertised in CPUID. That is mainly for virtualization
          scenarios where the host has the updated microcode but the
          hypervisor does not expose MD_CLEAR in CPUID. It's a best
          effort approach without guarantee.
 ======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line, then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
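
As a hedged sketch, that selection can be modelled like this (the enum and
function names are illustrative, modelled on the table above rather than
taken from the kernel's bug handling code)::

    #include <stdbool.h>

    /* The three kernel internal mitigation modes from the table above. */
    enum mds_mitigation {
            MDS_MITIGATION_OFF,     /* CPU not affected or mds=off given  */
            MDS_MITIGATION_FULL,    /* affected, MD_CLEAR in CPUID        */
            MDS_MITIGATION_VMWERV,  /* affected, no MD_CLEAR: best effort */
    };

    /* Simplified selection mirroring the rules described in the text. */
    enum mds_mitigation mds_select_mode(bool cpu_affected, bool cmdline_off,
                                        bool md_clear_in_cpuid)
    {
            if (!cpu_affected || cmdline_off)
                    return MDS_MITIGATION_OFF;

            return md_clear_in_cpuid ? MDS_MITIGATION_FULL
                                     : MDS_MITIGATION_VMWERV;
    }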

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

   When transitioning from kernel to user space the CPU buffers are flushed
   on affected CPUs when the mitigation is not disabled on the kernel
   command line. The mitigation is enabled through the feature flag
   X86_FEATURE_CLEAR_CPU_BUF.

   The mitigation is invoked just before transitioning to userspace after
   user registers are restored. This is done to minimize the window in
   which kernel data could be accessed after VERW, e.g. via an NMI that
   arrives after the VERW but before the return to user space.

   **Corner case not handled**
   Interrupts returning to the kernel don't clear the CPU buffers since
   the exit-to-user path is expected to do that anyway. But there could
   be a case when an NMI is generated in the kernel after the
   exit-to-user path has cleared the buffers. This case is not handled
   and NMIs returning to the kernel don't clear the CPU buffers because:

   1. It is rare to get an NMI after VERW, but before returning to userspace.
   2. For an unprivileged user, there is no known way to make that NMI
      less rare or target it.
   3. It would take a large number of these precisely-timed NMIs to mount
      an actual attack.  There's presumably not enough bandwidth.
   4. The NMI in question occurs after a VERW, i.e. when user state is
      restored and most interesting data is already scrubbed. What's left
      is only the data that NMI touches, and that may or may not be of
      any interest.


2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

   When a CPU goes idle and enters a C-State the CPU buffers need to be
   cleared on affected CPUs when SMT is active. This addresses the
   repartitioning of the store buffer when one of the Hyper-Threads enters
   a C-State.

   When SMT is inactive, i.e. either the CPU does not support it or all
   sibling threads are offline, CPU buffer clearing is not required.

   The idle clearing is enabled on CPUs which are only affected by MSBDS
   and not by any other MDS variant. The other MDS variants cannot be
   protected against cross Hyper-Thread attacks because the Fill Buffer and
   the Load Ports are shared. So on CPUs affected by other variants, the
   idle clearing would be a window dressing exercise and is therefore not
   activated.

   The invocation is controlled by the static key mds_idle_clear which is
   switched depending on the chosen mitigation mode and the SMT state of
   the system.
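
   A sketch of how such a static-key gated helper looks in kernel code
   (details may differ between kernel versions; mds_clear_cpu_buffers()
   is the function mentioned earlier)::

      /*
       * Kernel-context sketch (not guaranteed to match the exact
       * source): the idle clearing is a thin, static-key gated wrapper
       * around the generic buffer clearing helper.
       */
      #include <linux/jump_label.h>

      DECLARE_STATIC_KEY_FALSE(mds_idle_clear);

      static __always_inline void mds_idle_clear_cpu_buffers(void)
      {
              /* Only pay for the VERW when idle clearing is enabled. */
              if (static_branch_likely(&mds_idle_clear))
                      mds_clear_cpu_buffers();
      }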

   The buffer clear is only invoked before entering the C-State, to prevent
   stale data from the idling CPU from spilling to the Hyper-Thread sibling
   after the store buffer has been repartitioned and all entries are
   available to the non-idle sibling.

   When coming out of idle the store buffer is partitioned again so each
   sibling has half of it available. The CPU coming back from idle could
   then be speculatively exposed to contents of the sibling. The buffers
   are flushed either on exit to user space or on VMENTER so malicious
   code in user space or the guest cannot speculatively access them.

   The mitigation is hooked into all variants of halt()/mwait(), but does
   not cover the legacy ACPI IO-Port mechanism, because the ACPI idle
   driver was superseded by the intel_idle driver around 2010 and
   intel_idle is preferred on all affected CPUs, which are expected to
   gain the MD_CLEAR functionality in microcode. Aside from that, the
   IO-Port mechanism is a legacy interface which is only used on older
   systems which are either not affected or do not receive microcode
   updates anymore.
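
   To illustrate the hooking, the halt() style helpers call the idle
   clearing immediately before executing HLT; roughly (sketch, the exact
   helpers live in the x86 headers and may differ in detail)::

      /* Kernel-context sketch of the hooking described above. */
      static __always_inline void native_safe_halt(void)
      {
              mds_idle_clear_cpu_buffers();   /* clear before going idle */
              asm volatile("sti; hlt" : : : "memory");
      }

      static __always_inline void native_halt(void)
      {
              mds_idle_clear_cpu_buffers();
              asm volatile("hlt" : : : "memory");
      }

   The mwait() based idle entry points invoke the same helper in a similar
   way before setting up MONITOR/MWAIT.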
