.. SPDX-License-Identifier: GPL-2.0

======================================================
Control-flow Enforcement Technology (CET) Shadow Stack
======================================================

CET Background
==============

Control-flow Enforcement Technology (CET) covers several related x86 processor
features that provide protection against control flow hijacking attacks. CET
can protect both applications and the kernel.

CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack
is a secondary stack allocated from memory which cannot be directly modified by
applications. When executing a CALL instruction, the processor pushes the
return address to both the normal stack and the shadow stack. Upon
function return, the processor pops the shadow stack copy and compares it
to the normal stack copy. If the two differ, the processor raises a
control-protection fault. IBT verifies that indirect CALL/JMP targets are
intended, as marked by the compiler with 'ENDBR' opcodes. Not all CPUs have
both Shadow Stack and Indirect Branch Tracking. Today in the 64-bit kernel,
only userspace shadow stack and kernel IBT are supported.

Requirements to use Shadow Stack
================================

To use userspace shadow stack you need HW that supports it, a kernel
configured with it and userspace libraries compiled with it.

The kernel Kconfig option is X86_USER_SHADOW_STACK.  When compiled in, shadow
stacks can be disabled at runtime with the kernel parameter: nousershstk.

To build a user shadow stack enabled kernel, Binutils v2.29 or later, or
LLVM v6 or later, is required.

At run time, /proc/cpuinfo shows CET features if the processor supports
CET. "user_shstk" means that userspace shadow stack is supported on the current
kernel and HW.
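
The /proc/cpuinfo check can also be done programmatically. The following is a
minimal sketch, not kernel code: the helper names ``line_has_flag`` and
``cpu_reports_user_shstk`` are hypothetical, and the sketch assumes that
"user_shstk" appears as a whitespace-delimited word on some line of
/proc/cpuinfo.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if 'flag' occurs in 'line' as a whole whitespace-delimited word. */
static bool line_has_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    while ((p = strstr(p, flag)) != NULL) {
        bool starts_word = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        bool ends_word = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (starts_word && ends_word)
            return true;
        p += len; /* partial match (e.g. a longer flag name), keep searching */
    }
    return false;
}

/*
 * Scan /proc/cpuinfo for "user_shstk".
 * Returns 1 if found, 0 if not found, -1 if the file could not be opened.
 */
static int cpu_reports_user_shstk(void)
{
    static char line[16384]; /* assumed large enough for one cpuinfo line */
    FILE *f = fopen("/proc/cpuinfo", "r");
    int found = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (line_has_flag(line, "user_shstk")) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A result of 1 only says that the kernel and HW support the feature. Whether a
given process actually runs with shadow stack enabled is a separate question.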

Application Enabling
====================

An application's CET capability is marked in its ELF note and can be verified
from readelf/llvm-readelf output::

    readelf -n <application> | grep -a SHSTK
        properties: x86 feature: SHSTK

The kernel does not process these application markers directly. Applications
or loaders must enable CET features using the interface described in section 4.
Typically this would be done in the dynamic loader or static runtime objects,
as is the case in GLIBC.

Enabling arch_prctl()'s
=======================

ELF features should be enabled by the loader using the arch_prctl() operations
below. They are only supported in 64-bit user applications. These operate on
the features on a per-thread basis. The enablement status is inherited on
clone, so if the feature is enabled on the first thread, it will propagate to
all the threads in an app.

arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
    Enable a single feature specified in 'feature'. Can only operate on
    one feature at a time.

arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
    Disable a single feature specified in 'feature'. Can only operate on
    one feature at a time.

arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
    Lock in features at their current enabled or disabled status. 'features'
    is a mask of all features to lock. All bits set are processed, unset bits
    are ignored. The mask is ORed with the existing value. So any feature bits
    set here cannot be enabled or disabled afterwards.

arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
    Unlock features. 'features' is a mask of all features to unlock. All
    bits set are processed, unset bits are ignored. Only works via ptrace.

arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr)
    Copy the currently enabled features to the address passed in addr. The
    features are reported using the same bits that the other operations
    take in 'features'.

The return values are as follows. On success, return 0. On error, errno can
be::

        -EPERM if any of the passed features are locked.
        -ENOTSUPP if the feature is not supported by the hardware or
         kernel.
        -EINVAL if the arguments are invalid (nonexistent feature, etc.)
        -EFAULT if could not copy information back to userspace

The supported feature bits are::

    ARCH_SHSTK_SHSTK - Shadow stack
    ARCH_SHSTK_WRSS  - WRSS

Currently shadow stack and WRSS are supported via this interface. WRSS
can only be enabled with shadow stack, and is automatically disabled
if shadow stack is disabled.
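
As an illustration of the calling convention, the sketch below queries the
enabled features with ARCH_SHSTK_STATUS. The wrapper name
``shstk_query_status`` is hypothetical. The numeric fallbacks for
ARCH_SHSTK_STATUS, ARCH_SHSTK_SHSTK and ARCH_SHSTK_WRSS are assumed to match
arch/x86/include/uapi/asm/prctl.h and are only used when the installed headers
do not provide them. The sketch deliberately does not call ARCH_SHSTK_ENABLE:
enabling shadow stack inside an ordinary C function that later returns would
fault, since the return addresses of the already active frames are not on the
new shadow stack.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed to match arch/x86/include/uapi/asm/prctl.h; fallback only. */
#ifndef ARCH_SHSTK_STATUS
#define ARCH_SHSTK_STATUS 0x5005
#endif
#ifndef ARCH_SHSTK_SHSTK
#define ARCH_SHSTK_SHSTK (1UL << 0)
#endif
#ifndef ARCH_SHSTK_WRSS
#define ARCH_SHSTK_WRSS (1UL << 1)
#endif

/*
 * Hypothetical wrapper: store the calling thread's enabled feature mask in
 * *features. Returns 0 on success or a negative errno value on failure
 * (for example -EINVAL on a kernel that does not know the operation).
 */
static long shstk_query_status(unsigned long *features)
{
    *features = 0;
#ifdef SYS_arch_prctl
    if (syscall(SYS_arch_prctl, ARCH_SHSTK_STATUS, features) == -1)
        return -errno;
    return 0;
#else
    return -ENOSYS; /* arch_prctl() is not available on this architecture */
#endif
}
```

A caller would test the result against the feature bits, for example
``features & ARCH_SHSTK_SHSTK`` for shadow stack and
``features & ARCH_SHSTK_WRSS`` for WRSS.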

Proc Status
===========

To check if an application is actually running with shadow stack, the
user can read /proc/$PID/status. It will report "wrss" or "shstk"
depending on what is enabled. The lines look like this::

    x86_Thread_features: shstk wrss
    x86_Thread_features_locked: shstk wrss
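
A process can inspect its own status the same way. This is a sketch under
stated assumptions: the names ``FEAT_SHSTK``, ``FEAT_WRSS``,
``parse_thread_features`` and ``read_own_thread_features`` are hypothetical,
and the separator after the colon is assumed to be spaces or tabs.

```c
#include <stdio.h>
#include <string.h>

#define FEAT_SHSTK (1 << 0) /* hypothetical local flag for "shstk" */
#define FEAT_WRSS  (1 << 1) /* hypothetical local flag for "wrss" */

/*
 * Parse one line of /proc/$PID/status. Returns a mask of FEAT_* bits if the
 * line is the "x86_Thread_features:" line, or -1 for any other line
 * (including "x86_Thread_features_locked:").
 */
static int parse_thread_features(const char *line)
{
    static const char key[] = "x86_Thread_features:";
    const char *p;
    int mask = 0;

    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return -1;

    p = line + sizeof(key) - 1;
    while (*p) {
        size_t n;

        p += strspn(p, " \t\n");  /* skip separators */
        n = strcspn(p, " \t\n");  /* length of the next word */
        if (n == 5 && memcmp(p, "shstk", 5) == 0)
            mask |= FEAT_SHSTK;
        else if (n == 4 && memcmp(p, "wrss", 4) == 0)
            mask |= FEAT_WRSS;
        p += n;
    }
    return mask;
}

/* Returns the mask for the calling process, or -1 if the line is absent. */
static int read_own_thread_features(void)
{
    char line[512];
    FILE *f = fopen("/proc/self/status", "r");
    int mask = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        int m = parse_thread_features(line);

        if (m >= 0) {
            mask = m;
            break;
        }
    }
    fclose(f);
    return mask;
}
```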

Implementation of the Shadow Stack
==================================

Shadow Stack Size
-----------------

A task's shadow stack is allocated from memory to a fixed size of
MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
the maximum size of the normal stack, but capped to 4 GB. In the case
of the clone3 syscall, there is a stack size passed in and shadow stack
uses this instead of the rlimit.
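
The sizing rule can be written out as a small userspace model. This is a
sketch of the formula above only, not the kernel's allocation code. The names
``SHSTK_CAP``, ``shstk_size_for`` and ``default_shstk_size`` are hypothetical,
and the model assumes the soft limit of RLIMIT_STACK is the value that
applies.

```c
#include <stdint.h>
#include <sys/resource.h>

#define SHSTK_CAP (4ULL << 30) /* the 4 GB cap from the text */

/* MIN(stack_limit, 4 GB) */
static uint64_t shstk_size_for(uint64_t stack_limit)
{
    return stack_limit < SHSTK_CAP ? stack_limit : SHSTK_CAP;
}

/*
 * Size implied by the current RLIMIT_STACK soft limit, or 0 if the limit
 * cannot be read. An unlimited stack (RLIM_INFINITY) is a very large value,
 * so the cap applies.
 */
static uint64_t default_shstk_size(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    return shstk_size_for((uint64_t)rl.rlim_cur);
}
```

With a typical 8 MB stack limit the model gives an 8 MB shadow stack. With an
unlimited stack it gives 4 GB.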

Signal
------

The main program and its signal handlers use the same shadow stack. Because
the shadow stack stores only return addresses, a large shadow stack covers
the condition that both the program stack and the signal alternate stack run
out.

When a signal happens, the old pre-signal state is pushed on the stack. When
shadow stack is enabled, the shadow stack specific state is pushed onto the
shadow stack. Today this is only the old SSP (shadow stack pointer), pushed
in a special format with bit 63 set. On sigreturn this old SSP token is
verified and restored by the kernel. The kernel will also push the normal
restorer address to the shadow stack to help userspace avoid a shadow stack
violation on the sigreturn path that goes through the restorer.

So the shadow stack signal frame format is as follows::

    |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format
                    (bit 63 set to 1)
    |        ...| - Other state may be added in the future
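
The token format can be modeled in a few lines. This is an illustrative
sketch of the encoding described above, not the kernel's signal code. The
names ``SIGFRAME_TOKEN_BIT``, ``make_sigframe_token`` and
``read_sigframe_token`` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIGFRAME_TOKEN_BIT (1ULL << 63) /* bit 63 marks a sigframe token */

/* Encode the pre-signal SSP as a sigframe token. */
static uint64_t make_sigframe_token(uint64_t old_ssp)
{
    return old_ssp | SIGFRAME_TOKEN_BIT;
}

/*
 * Verify and decode a token, as would be needed on sigreturn. Returns false
 * if bit 63 is clear, i.e. the value is not in sigframe token format.
 */
static bool read_sigframe_token(uint64_t token, uint64_t *old_ssp)
{
    if (!(token & SIGFRAME_TOKEN_BIT))
        return false;
    *old_ssp = token & ~SIGFRAME_TOKEN_BIT;
    return true;
}
```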

32-bit ABI signals are not supported in shadow stack processes. Linux prevents
32-bit execution while shadow stack is enabled by allocating shadow stacks
outside of the 32-bit address space. When execution enters 32-bit mode, either
via far call or returning to userspace, a #GP is generated by the hardware
which will be delivered to the process as a segfault. When transitioning to
userspace the registers' state will be as if the userspace ip being returned to
caused the segfault.

Fork
----

The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
shadow stack access triggers a page fault with the shadow stack access bit set
in the page fault error code.
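
The "read-only and dirty" rule can be illustrated with a toy model of a PTE
value. This is a sketch for explanation only and does not reflect the kernel's
page table helpers. The names ``PTE_PRESENT``, ``PTE_RW``, ``PTE_DIRTY``,
``pte_is_shadow_stack`` and ``pte_clear_dirty`` are hypothetical; the bit
positions are the x86 Present (bit 0), Read/Write (bit 1) and Dirty (bit 6)
bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0) /* x86 Present bit */
#define PTE_RW      (1ULL << 1) /* x86 Read/Write bit; clear means read-only */
#define PTE_DIRTY   (1ULL << 6) /* x86 Dirty bit */

/* A present PTE maps shadow stack memory when it is read-only and dirty. */
static bool pte_is_shadow_stack(uint64_t pte)
{
    return (pte & PTE_PRESENT) && !(pte & PTE_RW) && (pte & PTE_DIRTY);
}

/*
 * Model of clearing the dirty bit: the result is no longer a valid shadow
 * stack PTE, so the next shadow stack access would fault.
 */
static uint64_t pte_clear_dirty(uint64_t pte)
{
    return pte & ~PTE_DIRTY;
}
```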

When a task forks a child, its shadow stack PTEs are copied and both the
parent's and the child's shadow stack PTEs are cleared of the dirty bit.
Upon the next shadow stack access, the resulting shadow stack page fault
is handled by page copy/re-use.

When a pthread child is created, the kernel allocates a new shadow stack
for the new thread. New shadow stack creation behaves like mmap() with respect
to ASLR behavior. Similarly, on thread exit the thread's shadow stack is
disabled.

Exec
----

On exec, shadow stack features are disabled by the kernel, at which point
userspace can choose to re-enable or lock them.
