.. SPDX-License-Identifier: GPL-2.0

===============================
Software Guard eXtensions (SGX)
===============================

Overview
========

Software Guard eXtensions (SGX) hardware enables user space applications to
set aside private memory regions of code and data:

* Privileged (ring-0) ENCLS functions orchestrate the construction of the
  regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
  execute inside the regions.

These memory regions are called enclaves. An enclave can only be entered at a
fixed set of entry points. Each entry point can hold a single hardware thread
at a time.  Although the enclave is loaded from a regular binary file by using
ENCLS functions, once loaded only the threads inside the enclave can access its
memory. The CPU denies access to the region from outside the enclave, and the
region's contents are encrypted before they leave the last-level cache (LLC).

Support can be determined by

        ``grep sgx /proc/cpuinfo``
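
``grep sgx`` also matches related flags such as ``sgx_lc`` (SGX Launch
Control). A minimal C sketch of an exact-token check over the text of a
``flags`` line from ``/proc/cpuinfo``; the helper name ``cpu_flag_present()``
is hypothetical, and reading the file is left to the caller:

.. code-block:: c

    #include <stdbool.h>
    #include <string.h>

    /*
     * Hypothetical helper: true if @flag appears as a whole,
     * whitespace-separated token in @flags (e.g. the text of a "flags"
     * line from /proc/cpuinfo).  "sgx" does not match "sgx_lc".
     */
    static bool cpu_flag_present(const char *flags, const char *flag)
    {
        size_t len = strlen(flag);
        const char *p = flags;

        if (len == 0)
            return false;

        while ((p = strstr(p, flag)) != NULL) {
            bool start_ok = p == flags || p[-1] == ' ' || p[-1] == '\t';
            char end = p[len];
            bool end_ok = end == '\0' || end == ' ' || end == '\t' ||
                          end == '\n';

            if (start_ok && end_ok)
                return true;
            p += len;   /* partial match, keep searching */
        }
        return false;
    }

For example, ``cpu_flag_present("fpu sgx_lc", "sgx")`` returns false, whereas
``grep sgx`` would report a match on the same line.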

SGX must be both supported by the processor and enabled by the BIOS.  If SGX
appears to be unsupported on a system which has hardware support, ensure
support is enabled in the BIOS.  If a BIOS presents a choice between "Enabled"
and "Software Enabled" modes for SGX, choose "Enabled".

Enclave Page Cache
==================

SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated
with an enclave. It is contained in a BIOS-reserved region of physical memory.
Unlike pages used for regular memory, EPC pages can only be accessed from
outside of the enclave during enclave construction, using special, limited SGX
instructions.

Only a CPU executing inside an enclave can directly access enclave memory.
However, a CPU executing inside an enclave may access normal memory outside the
enclave.

The kernel manages enclave memory similarly to how it treats device memory.

Enclave Page Types
------------------

**SGX Enclave Control Structure (SECS)**
   The enclave's address range, attributes and other global data are defined
   by this structure.

**Regular (REG)**
   Regular EPC pages contain the code and data of an enclave.

**Thread Control Structure (TCS)**
   Thread Control Structure pages define the entry points to an enclave and
   track the execution state of an enclave thread.

**Version Array (VA)**
   Version Array pages contain 512 slots, each of which can contain a version
   number for a page evicted from the EPC.
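
As a worked example of the slot arithmetic above, evicting N pages needs at
least ceil(N / 512) VA pages. The sketch assumes each slot is 8 bytes, so that
512 slots exactly fill one 4 KiB page; the helper ``va_pages_needed()`` is
hypothetical and ignores how the kernel actually allocates slots:

.. code-block:: c

    #include <stddef.h>

    /* From the text above: one VA page holds 512 version slots. */
    #define VA_SLOTS_PER_PAGE 512

    /* Assumption of this sketch: 8-byte slots fill exactly one 4 KiB page. */
    _Static_assert(VA_SLOTS_PER_PAGE * 8 == 4096, "512 x 8-byte slots per page");

    /* Hypothetical helper: minimum VA pages for @evicted pages, ceil(n / 512). */
    static size_t va_pages_needed(size_t evicted)
    {
        return evicted / VA_SLOTS_PER_PAGE +
               (evicted % VA_SLOTS_PER_PAGE != 0);
    }

Evicting 1000 pages, for instance, needs two VA pages.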

Enclave Page Cache Map
----------------------

The processor tracks EPC pages in a hardware metadata structure called the
*Enclave Page Cache Map (EPCM)*.  The EPCM contains an entry for each EPC page
which describes the owning enclave, access rights and page type, among other
things.

EPCM permissions are separate from the normal page tables.  This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only.  EPCM permissions may only impose additional restrictions on
top of normal x86 page permissions.
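
Because an access must pass both checks, the effective rights of an enclave
page are the intersection of its page-table and EPCM permissions. A minimal
sketch; the ``PERM_*`` bits and both helpers are hypothetical and do not mirror
any kernel or hardware encoding:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical permission bits for this illustration only. */
    #define PERM_R 0x1u
    #define PERM_W 0x2u
    #define PERM_X 0x4u

    /* Effective rights: both the page tables and the EPCM must allow them. */
    static uint32_t effective_perms(uint32_t pte_perms, uint32_t epcm_perms)
    {
        return pte_perms & epcm_perms;
    }

    /* True if every right in @requested survives both permission checks. */
    static bool access_allowed(uint32_t pte_perms, uint32_t epcm_perms,
                               uint32_t requested)
    {
        return (requested & ~effective_perms(pte_perms, epcm_perms)) == 0;
    }

A page mapped read-write by the kernel but marked read-only in the EPCM
therefore still rejects writes.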

For all intents and purposes, the SGX architecture allows the processor to
invalidate all EPCM entries at will.  This requires that software be prepared to
handle an EPCM fault at any time.  In practice, this can happen on events like
power transitions when the ephemeral key that encrypts enclave memory is lost.

Application interface
=====================

Enclave build functions
-----------------------

In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process.  Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device.  Since enclave memory is protected from direct
access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_create
               sgx_ioc_enclave_add_pages
               sgx_ioc_enclave_init
               sgx_ioc_enclave_provision
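
Before issuing ``SGX_IOC_ENCLAVE_ADD_PAGES``, user space may want to check its
``struct sgx_enclave_add_pages`` arguments. The sketch below is a hypothetical
pre-check based on our reading of the argument validation in
``arch/x86/kernel/cpu/sgx/ioctl.c``, assuming 4 KiB pages: ``src`` and
``offset`` must be page aligned, ``length`` must be a non-zero multiple of the
page size, and the range must neither wrap around nor extend past the enclave
size given to ``SGX_IOC_ENCLAVE_CREATE``. The kernel's own checks remain
authoritative:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define SKETCH_PAGE_SIZE 4096ULL    /* assumption: 4 KiB pages */

    static bool page_aligned(uint64_t v)
    {
        return (v & (SKETCH_PAGE_SIZE - 1)) == 0;
    }

    /*
     * Hypothetical pre-check of the src/offset/length fields of
     * struct sgx_enclave_add_pages for an enclave of @encl_size bytes.
     */
    static bool add_pages_args_ok(uint64_t src, uint64_t offset,
                                  uint64_t length, uint64_t encl_size)
    {
        if (!page_aligned(src) || !page_aligned(offset))
            return false;
        if (length == 0 || !page_aligned(length))
            return false;
        if (offset + length < offset)   /* range wraps around */
            return false;
        /* The last page added must still lie inside the enclave. */
        return offset + length - SKETCH_PAGE_SIZE < encl_size;
    }

For a 64 KiB enclave, adding one page at offset 0xF000 passes, while adding two
pages there fails because the second page would lie past the end.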

Enclave runtime management
--------------------------

Systems supporting SGX2 additionally support changes to initialized
enclaves: modifying enclave page permissions and type, and dynamically
adding and removing enclave pages. When an enclave accesses an address
within its address range that does not have a backing page, a new
regular page is dynamically added to the enclave. The enclave is
still required to run EACCEPT on the new page before it can be used.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_restrict_permissions
               sgx_ioc_enclave_modify_types
               sgx_ioc_enclave_remove_pages
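
A hypothetical model of the dynamic-add decision described above: an access
inside the enclave's range that hits a page with no backing EPC page
identifies the page to add. The ``present`` bitmap (one bit per 4 KiB page, an
assumption of this sketch) and the helper name are illustrative only, and the
subsequent EACCEPT by the enclave is not modelled:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define SKETCH_PAGE_SIZE 4096ULL    /* assumption: 4 KiB pages */

    /*
     * Hypothetical model: an access at @addr inside [base, base + size)
     * whose page has no backing EPC page (bit clear in @present) calls for
     * a new regular page.  On success, *page_addr is the page to add.
     */
    static bool needs_dynamic_add(uint64_t addr, uint64_t base, uint64_t size,
                                  const uint8_t *present, uint64_t *page_addr)
    {
        uint64_t idx;

        if (addr < base || addr - base >= size)
            return false;   /* outside the enclave's address range */

        idx = (addr - base) / SKETCH_PAGE_SIZE;
        if (present[idx / 8] & (1u << (idx % 8)))
            return false;   /* already backed by an EPC page */

        *page_addr = base + idx * SKETCH_PAGE_SIZE;
        return true;
    }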

Enclave vDSO
------------

Entering an enclave can only be done through SGX-specific EENTER and ERESUME
functions, and is a non-trivial process.  Because of the complexity of
transitioning to and from an enclave, enclaves typically utilize a library to
handle the actual transitions.  This is roughly analogous to how glibc
implementations are used by most applications to wrap system calls.

Another crucial characteristic of enclaves is that they can generate exceptions
as part of their normal operation; these exceptions either need to be handled
inside the enclave or are unique to SGX.

Instead of the traditional signal mechanism to handle these exceptions, SGX
can leverage special exception fixup provided by the vDSO.  The kernel-provided
vDSO function wraps low-level transitions to/from the enclave like EENTER and
ERESUME.  The vDSO function intercepts exceptions that would otherwise generate
a signal and returns the fault information directly to its caller.  This avoids
the need to juggle signal handlers.

.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
   :functions: vdso_sgx_enter_enclave_t
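
After ``__vdso_sgx_enter_enclave()`` returns, the caller inspects the
``struct sgx_enclave_run`` it passed in to tell a normal EEXIT from an
exception. The sketch below classifies that outcome using a hypothetical
``struct run_result`` holding copies of two of its fields (``function`` and
``exception_vector``), so it builds without ``<asm/sgx.h>``. It assumes that
``function`` holds EEXIT after a normal exit and the faulting leaf (EENTER or
ERESUME) after an exception, and it ignores the optional ``user_handler``
path; the ENCLU leaf numbers are defined locally:

.. code-block:: c

    #include <stdint.h>

    /* ENCLU leaf numbers (Intel SDM), defined locally for this sketch. */
    enum { SKETCH_EENTER = 2, SKETCH_ERESUME = 3, SKETCH_EEXIT = 4 };

    /* Hypothetical copy of two output fields of struct sgx_enclave_run. */
    struct run_result {
        uint32_t function;          /* last ENCLU leaf seen */
        uint16_t exception_vector;  /* meaningful only after an exception */
    };

    enum run_outcome { RUN_EXITED, RUN_EXCEPTION, RUN_UNEXPECTED };

    static enum run_outcome classify_run(int ret, const struct run_result *run)
    {
        if (ret != 0)
            return RUN_UNEXPECTED;  /* e.g. a negative errno from the vDSO */
        if (run->function == SKETCH_EEXIT)
            return RUN_EXITED;      /* the enclave executed EEXIT */
        if (run->function == SKETCH_EENTER || run->function == SKETCH_ERESUME)
            return RUN_EXCEPTION;   /* fault details are in the run struct */
        return RUN_UNEXPECTED;
    }

On ``RUN_EXCEPTION`` the caller would look at ``exception_vector`` (for example
14, a page fault) and decide whether to re-enter the enclave with ERESUME.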

ksgxd
=====

SGX support includes a kernel thread called *ksgxd*.

EPC sanitization
----------------

ksgxd is started when SGX initializes.  Enclave memory is typically ready
for use when the processor powers on or resets.  However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state.  This might
occur after a crash and kexec() cycle, for instance.  At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.

The sanitization is done by going through the EPC address space and applying
the EREMOVE function to each physical page. Some enclave pages, like SECS pages,
have hardware dependencies on other pages which prevent EREMOVE from succeeding.
Executing two EREMOVE passes removes the dependencies: the first pass removes
the pages that depend on a SECS page, which allows the second pass to remove the
SECS page itself.
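
The two-pass behaviour can be modelled with a toy EPC in which a SECS page
cannot be removed while it still has child pages. Everything below
(``struct sim_page``, ``sim_eremove()``, ``sim_pass()``) is a hypothetical
simulation, not kernel code:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical in-memory model of EPC pages, for illustration only. */
    struct sim_page {
        bool is_secs;
        int children;   /* live child pages (SECS pages only) */
        int parent;     /* index of the owning SECS page, or -1 */
        bool removed;
    };

    /* Model of EREMOVE: a SECS page with live children cannot be removed. */
    static bool sim_eremove(struct sim_page *pages, size_t i)
    {
        struct sim_page *p = &pages[i];

        if (p->removed)
            return true;
        if (p->is_secs && p->children > 0)
            return false;
        if (p->parent >= 0)
            pages[p->parent].children--;
        p->removed = true;
        return true;
    }

    /* One sanitization pass; returns the number of pages left behind. */
    static size_t sim_pass(struct sim_page *pages, size_t n)
    {
        size_t failed = 0;

        for (size_t i = 0; i < n; i++)
            if (!sim_eremove(pages, i))
                failed++;
        return failed;
    }

With a SECS page placed before its two children, the first pass leaves only
the SECS page behind and the second pass removes it.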

Page reclaimer
--------------

Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory.  If the system runs out of enclave memory,
*ksgxd* “swaps” enclave memory to normal memory.

Launch Control
==============

SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave.
Only after this can the CPU execute inside the enclave.

The EINIT function takes an RSA-3072 signature of the enclave measurement.  The
function checks that the measurement is correct and that the signature was
produced with the key whose SHA256 hash is held in the four
**IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs.

Those MSRs can be configured by the BIOS to be either readable or writable.
Linux supports only the writable configuration in order to give the kernel full
control over launch control policy. Before calling the EINIT function, the
driver sets the MSRs to match the enclave's signing key.
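
The four MSRs together hold one 256-bit SHA256 digest, 64 bits each. A minimal
sketch of splitting a 32-byte digest into the four MSR values, assuming each
64-bit value is loaded from the digest bytes in little-endian (x86) order, so
bytes 0 to 7 go to ``IA32_SGXLEPUBKEYHASH0``. The helper is hypothetical;
computing the digest and writing the MSRs are outside its scope:

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper: pack a 32-byte SHA256 digest into the four
     * 64-bit values for IA32_SGXLEPUBKEYHASH0..3, little-endian per value.
     */
    static void digest_to_msr_values(const uint8_t digest[32], uint64_t msr[4])
    {
        for (int i = 0; i < 4; i++) {
            uint64_t v = 0;

            /* Byte 8*i becomes the least significant byte of msr[i]. */
            for (int b = 7; b >= 0; b--)
                v = (v << 8) | digest[i * 8 + b];
            msr[i] = v;
        }
    }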

Encryption engines
==================

In order to conceal the enclave data while it is out of the CPU package, the
memory controller has an encryption engine to transparently encrypt and decrypt
enclave memory.

In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to
encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with its
root in SRAM to maintain integrity of the encrypted data. This provides
integrity and anti-replay protection but does not scale to large memory sizes
because the time required to update the Merkle tree grows logarithmically in
relation to the memory size.
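
A worked sketch of the logarithmic growth: the number of tree levels an update
has to walk, assuming (hypothetically) an 8-ary tree over 64-byte blocks. The
real MEE geometry is not described here, and ``tree_levels()`` is an
illustrative helper:

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper: levels of an @arity-ary tree needed to cover
     * @leaves leaf blocks, i.e. the smallest d with arity^d >= leaves.
     * An update touches roughly one node per level.
     */
    static unsigned int tree_levels(uint64_t leaves, uint64_t arity)
    {
        unsigned int d = 0;
        uint64_t cover = 1;

        if (arity < 2)
            return 0;   /* degenerate tree, not meaningful here */

        while (cover < leaves) {
            d++;
            if (cover > UINT64_MAX / arity)
                break;  /* this level already covers every 64-bit count */
            cover *= arity;
        }
        return d;
    }

Under those assumptions, 128 MiB of protected memory (2^21 blocks) needs 7
levels, while 64 times as much, 8 GiB (2^27 blocks), needs 9.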

CPUs starting from Ice Lake use Total Memory Encryption (TME) in place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay attacks are not mitigated.  However, they include
additional changes to prevent cipher text from being returned and SW memory
aliases from being created.

DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).

Usage Models
============

Shared Library
--------------

Sensitive data and the code that acts on it are partitioned from the application
into a separate library. The library is then linked as a DSO which can be loaded
into an enclave. The application can then make individual function calls into
the enclave through special SGX instructions. A run-time within the enclave is
configured to marshal function parameters into and out of the enclave and to
call the correct library function.

Application Container
---------------------

An application may be loaded into a container enclave which is specially
configured with a library OS and run-time which permits the application to run.
The enclave run-time and library OS work together to execute the application
when a thread enters the enclave.

Impact of Potential Kernel SGX Bugs
===================================

EPC leaks
---------

When EPC page leaks happen, a WARNING like this is shown in dmesg:

"EREMOVE returned ... and an EPC page was leaked.  SGX may become unusable..."

This is effectively a kernel use-after-free of an EPC page, and due
to the way SGX works, the bug is detected at freeing. Rather than
adding the page back to the pool of available EPC pages, the kernel
intentionally leaks the page to avoid additional errors in the future.

When this happens, the kernel will likely soon leak more EPC pages, and
SGX will likely become unusable because the memory available to SGX is
limited. However, while this may be fatal to SGX, the rest of the kernel
is unlikely to be impacted and should continue to work.

As a result, when this happens, users should stop running any new
SGX workloads (or any new workloads at all) and migrate all valuable
workloads. Although a machine reboot can recover all EPC memory, the bug
should still be reported to Linux developers.


Virtual EPC
===========

The implementation also has a virtual EPC driver to support SGX enclaves
in guests. Unlike the SGX driver, an EPC page allocated by the virtual
EPC driver doesn't have a specific enclave associated with it. This is
because KVM doesn't track how a guest uses EPC pages.

As a result, the SGX core page reclaimer doesn't support reclaiming EPC
pages allocated to KVM guests through the virtual EPC driver. If the
user wants to deploy SGX applications both on the host and in guests
on the same machine, the user should reserve enough EPC (by subtracting the
total virtual EPC size of all SGX VMs from the physical EPC size) for
host SGX applications so they can run with acceptable performance.
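
A worked version of the reservation rule above: the EPC left for host enclaves
is the physical EPC size minus the sum of all guests' virtual EPC sizes. The
helper ``host_epc_left()`` is hypothetical, and the sizes in the example are
placeholders:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical helper: EPC bytes left for host enclaves once the virtual
     * EPC of every SGX VM is accounted for.  Returns 0 if the guests together
     * take the whole physical EPC or more.
     */
    static uint64_t host_epc_left(uint64_t physical_epc,
                                  const uint64_t *vm_vepc, size_t nr_vms)
    {
        uint64_t total = 0;

        for (size_t i = 0; i < nr_vms; i++) {
            if (vm_vepc[i] >= physical_epc - total)
                return 0;   /* guests consume everything */
            total += vm_vepc[i];
        }
        return physical_epc - total;
    }

With 256 MiB of physical EPC and two guests of 64 MiB each, 128 MiB remain for
host enclaves.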

The architectural behavior is to restore all EPC pages to an uninitialized
state, including after a guest reboot.  Because this state can be reached only
through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
on all pages in the virtual EPC.

``EREMOVE`` can fail for three reasons.  Userspace must pay attention
to expected failures and handle them as follows:

1. Page removal will always fail when any thread is running in the
   enclave to which the page belongs.  In this case the ioctl will
   return ``EBUSY`` independent of whether it has successfully removed
   some pages; userspace can avoid these failures by preventing execution
   of any vcpu which maps the virtual EPC.

2. Page removal will cause a general protection fault if two calls to
   ``EREMOVE`` happen concurrently for pages that refer to the same
   "SECS" metadata pages.  This can happen if there are concurrent
   invocations of ``SGX_IOC_VEPC_REMOVE_ALL``, or if a ``/dev/sgx_vepc``
   file descriptor in the guest is closed at the same time as
   ``SGX_IOC_VEPC_REMOVE_ALL``; it will also be reported as ``EBUSY``.
   This can be avoided in userspace by serializing calls to the ioctl()
   and to close(), but in general it should not be a problem.

3. Finally, page removal will fail for SECS metadata pages which still
   have child pages.  Child pages can be removed by executing
   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
   mapped into the guest.  This means that the ioctl() must be called
   twice: an initial set of calls to remove child pages and a subsequent
   set of calls to remove SECS pages.  The second set of calls is only
   required for those mappings that returned a nonzero value from the
   first call.  It indicates a bug in the kernel or the userspace client
   if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
   a return code other than 0.
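
The two-round sequence from item 3 might be structured as below. To keep the
sketch self-contained and testable, the ioctl is abstracted behind a
hypothetical ``remove_all_fn`` callback; real code would wrap
``ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL)``. ``fake_remove_all()`` is a made-up
stand-in that leaves SECS pages behind after the first call on one descriptor:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Hypothetical stand-in for ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL): returns
     * the number of pages that could not be removed (>= 0), or -1 on error.
     */
    typedef long (*remove_all_fn)(int fd);

    /*
     * Sketch of the two rounds described in item 3.  @needs_retry is
     * caller-provided scratch space with one entry per descriptor.
     * Returns true if every virtual EPC ends up fully removed.
     */
    static bool vepc_remove_all_twice(remove_all_fn remove_all, const int *fds,
                                      bool *needs_retry, size_t n)
    {
        /* Round 1: removes child pages; SECS pages may be left over. */
        for (size_t i = 0; i < n; i++) {
            long ret = remove_all(fds[i]);

            if (ret < 0)
                return false;   /* e.g. EBUSY while a vCPU is running */
            needs_retry[i] = ret != 0;
        }

        /* Round 2: only for descriptors that reported leftovers. */
        for (size_t i = 0; i < n; i++)
            if (needs_retry[i] && remove_all(fds[i]) != 0)
                return false;   /* indicates a kernel or userspace bug */

        return true;
    }

    /* Hypothetical fake: fd 3 keeps two SECS pages after its first call. */
    static int fake_calls[8];

    static long fake_remove_all(int fd)
    {
        if (fd < 0 || fd >= 8)
            return -1;
        fake_calls[fd]++;
        return (fd == 3 && fake_calls[fd] == 1) ? 2 : 0;
    }

Tracking retries per descriptor keeps the second round limited to the mappings
that actually reported leftover SECS pages, as item 3 describes.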
