TOMOYO Linux Cross Reference
Linux/Documentation/arch/x86/pti.rst

Differences between /Documentation/arch/x86/pti.rst (Version linux-6.12-rc7) and /Documentation/arch/mips/pti.rst (Version linux-4.9.337)


.. SPDX-License-Identifier: GPL-2.0

==========================
Page Table Isolation (PTI)
==========================

Overview
========

Page Table Isolation (pti, previously known as
countermeasure against attacks on the shared u
space such as the "Meltdown" approach [2]_.

To mitigate this class of attacks, we create a
page tables for use only when running userspac
the kernel is entered via syscalls, interrupts
page tables are switched to the full "kernel"
switches back to user mode, the user copy is u

The userspace page tables contain only a minim
data: only what is needed to enter/exit the ke
entry/exit functions themselves and the interr
(IDT).  There are a few strictly unnecessary t
such as the first C function when entering an
comments in pti.c).

This approach helps to ensure that side-channe
the paging structures do not function when PTI
enabled by setting CONFIG_MITIGATION_PAGE_TABL
time.  Once enabled at compile-time, it can be
the 'nopti' or 'pti=' kernel parameters (see k
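
As a rough illustration of how the boot parameters mentioned above could be
interpreted, here is a minimal Python sketch.  It assumes that 'pti=' accepts
the values on, off and auto, and that 'nopti' acts like 'pti=off'.  The
function name is hypothetical, and the "last parameter wins" rule is a
simplification of this sketch, not the kernel's actual precedence logic.

```python
def pti_cmdline_setting(cmdline: str) -> str:
    """Return 'on', 'off', or 'auto' for a kernel command line string.

    Assumptions of this sketch: 'pti=' takes on/off/auto, 'nopti' is
    treated like 'pti=off', and a later parameter overrides an earlier one.
    """
    setting = "auto"  # default when no PTI parameter is present
    for token in cmdline.split():
        if token == "nopti":
            setting = "off"
        elif token.startswith("pti="):
            value = token[len("pti="):]
            # Unknown values are ignored rather than treated as errors.
            if value in ("on", "off", "auto"):
                setting = value
    return setting
```

On a running system the string could be read from /proc/cmdline and passed
to the function.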

Page Table Management
=====================

When PTI is enabled, the kernel manages two se
The first set is very similar to the single se
kernels without PTI.  This includes a complete
that the kernel can use for things like copy_t

Although _complete_, the user portion of the k
crippled by setting the NX bit in the top leve
that any missed kernel->user CR3 switch will i
userspace upon executing its first instruction

The userspace page tables map only the kernel
and exit the kernel.  This data is entirely co
cpu_entry_area' structure which is placed in t
each CPU's copy of the area a compile-time-fix

For new userspace mappings, the kernel makes t
page tables like normal.  The only difference
makes entries in the top (PGD) level.  In addi
entry in the main kernel PGD, a copy of the en
userspace page tables' PGD.

This sharing at the PGD level also inherently
layers of the page tables.  This leaves a sing
userspace page tables to manage.  One PTE to l
accessed bits, dirty bits, etc...
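
The PGD-level mirroring described above can be pictured with a toy model.
This is only a sketch in Python, not kernel code: the class and method names
are hypothetical, it assumes x86-64 4-level paging with 512 PGD entries of
which the lower 256 map userspace, and it leaves out the cpu_entry_area
mapping that the userspace page tables also carry.

```python
PGD_ENTRIES = 512    # entries in one top-level table (assumes 4-level paging)
USER_ENTRIES = 256   # assumption: the lower half of the PGD maps userspace
NX = 1 << 63         # no-execute bit of a page table entry


class PtiPgd:
    """Toy pair of top-level tables: the kernel copy and the user copy."""

    def __init__(self):
        self.kernel = [0] * PGD_ENTRIES
        self.user = [0] * PGD_ENTRIES

    def set_pgd(self, index, value):
        if index < USER_ENTRIES:
            # Userspace mapping: mirror the entry into the user copy.
            self.user[index] = value
            # The kernel copy keeps the mapping but marks it non-executable,
            # so running userspace code on the kernel tables faults.
            self.kernel[index] = (value | NX) if value else 0
        else:
            # Kernel mapping: only the kernel copy receives it.
            self.kernel[index] = value
```

Because both copies point at the same lower-level tables, only the top level
is duplicated in this model.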

Overhead
========

Protection against side-channel attacks is imp
this protection comes at a cost:

1. Increased Memory Use

  a. Each process now needs an order-1 PGD ins
     (Consumes an additional 4k per process).
  b. The 'cpu_entry_area' structure must be 2M
     aligned so that it can be mapped by setti
     entry.  This consumes nearly 2MB of RAM o
     is decompressed, but no space in the kern

2. Runtime Cost

  a. CR3 manipulation to switch between the pa
     must be done at interrupt, syscall, and e
     and exit (it can be skipped when the kern
     though.)  Moves to CR3 are on the order o
     cycles, and are required at every entry a
  b. Percpu TSS is mapped into the user page t
     to work under PTI. This doesn't have a di
     be argued it opens certain timing attack
  c. Global pages are disabled for all kernel
     mapped into both kernel and userspace pag
     feature of the MMU allows different proce
     entries mapping the kernel.  Losing the f
     TLB misses after a context switch.  The a
     performance is very small, however, never
  d. Process Context IDentifiers (PCID) is a C
     allows us to skip flushing the entire TLB
     tables by setting a special bit in CR3 wh
     are changed.  This makes switching the pa
     switch, or kernel entry/exit) cheaper.  B
     PCID support, the context switch code mus
     and kernel entries out of the TLB.  The u
     deferred until the exit to userspace, min
     See intel.com/sdm for the gory PCID/INVPC
  e. The userspace page tables must be populat
     process.  Even without PTI, the shared ke
     are created by copying top-level (PGD) en
     new process.  But, with PTI, there are no
     mappings: one in the kernel page tables t
     and one for the entry/exit structures.  A
     copy both.
  f. In addition to the fork()-time copying, t
     be an update to the userspace PGD any tim
     on a PGD used to map userspace.  This ens
     and userspace copies always map the same
     memory.
  g. On systems without PCID support, each CR3
     the entire TLB.  That means that each sys
     or exception flushes the TLB.
  h. INVPCID is a TLB-flushing instruction whi
     of TLB entries for non-current PCIDs.  So
     PCIDs, but do not support INVPCID.  On th
     can only be flushed from the TLB for the
     flushing a kernel address, we need to flu
     single kernel address flush will require
     write upon the next use of every PCID.
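
The last item can be made concrete with a small bookkeeping model.  The
Python sketch below is a toy, not how the kernel tracks TLB state: the class
and method names are hypothetical, and it only models the idea that, without
INVPCID, a kernel address flush can act on the current PCID alone, so every
other PCID has to be flushed the next time it is used.

```python
class PcidTlbModel:
    """Toy bookkeeping for a CPU that has PCID but lacks INVPCID."""

    def __init__(self, pcids):
        # True means the next switch to this PCID must flush its entries.
        self.needs_flush = {pcid: False for pcid in pcids}
        # Start on the first PCID given (pcids must not be empty).
        self.current = next(iter(self.needs_flush))

    def flush_kernel_address(self):
        # Only the current PCID can be flushed directly; all others are
        # marked so that they are flushed when they are next used.
        for pcid in self.needs_flush:
            if pcid != self.current:
                self.needs_flush[pcid] = True

    def switch_to(self, pcid):
        """Switch PCID; return True if this CR3 write had to flush."""
        flushed = self.needs_flush[pcid]
        self.needs_flush[pcid] = False
        self.current = pcid
        return flushed
```

In this model a single call to flush_kernel_address() makes the next switch
to each of the other PCIDs a flushing one, which is the cost the item above
describes.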

Possible Future Work
====================

1. We can be more careful about not actually w
   unless its value is actually changed.
2. Allow PTI to be enabled/disabled at runtime
   boot-time switching.

Testing
========

To test stability of PTI, the following test p
ideally doing all of these in parallel:

1. Set CONFIG_DEBUG_ENTRY=y
2. Run several copies of all of the tools/test
   (excluding MPX and protection_keys) in a lo
   several minutes.  These tests frequently un
   kernel entry code.  In general, old kernels
   themselves to crash, but they should never
3. Run the 'perf' tool in a mode (top or recor
   frequent performance monitoring non-maskabl
   in /proc/interrupts).  This exercises the N
   is known to trigger bugs in code paths that
   interrupted, including nested NMIs.  Using
   NMIs, and using two -c with separate counte
   and less deterministic behavior.
   ::

        while true; do perf record -c 10000 -e

4. Launch a KVM virtual machine.
5. Run 32-bit binaries on systems supporting t
   This has been a lightly-tested code path an
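
To confirm that the perf step above really generates NMIs, the NMI row of
/proc/interrupts can be sampled before and after a run.  The helper below is
a minimal Python sketch with a hypothetical name.  It assumes the usual
layout of that file, where the row starts with "NMI:" and is followed by one
count per CPU and a text description.

```python
def nmi_total(interrupts_text: str) -> int:
    """Sum the per-CPU NMI counts from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Numeric fields are per-CPU counts; the rest is description.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    return 0  # no NMI row found
```

Reading /proc/interrupts twice and comparing the two totals gives the number
of NMIs delivered in between.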

Debugging
=========

Bugs in PTI cause a few different signatures o
that are worth noting here.

 * Failures of the selftests/x86 code.  Usuall
   more obscure corners of entry_64.S
 * Crashes in early boot, especially around CP
   in the mappings cause these.
 * Crashes at the first interrupt.  Caused by
   like screwing up a page table switch.  Also
   incorrectly mapping the IRQ handler entry c
 * Crashes at the first NMI.  The NMI code is
   interrupt handlers and can have bugs that d
   normal interrupts.  Also caused by incorrec
   code.  NMIs that interrupt the entry code m
   careful and can be the cause of crashes tha
   running perf.
 * Kernel crashes at the first exit to userspa
   bugs, or failing to map some of the exit co
 * Crashes at first interrupt that interrupts
   in entry_64.S that return to userspace are
   from the ones that return to the kernel.
 * Double faults: overflowing the kernel stack
   faults upon page faults.  Caused by touchin
   data in the entry code, or forgetting to sw
   CR3 before calling into C functions which a
 * Userspace segfaults early in boot, sometime
   as mount(8) failing to mount the rootfs.  T
   tended to be TLB invalidation issues.  Usua
   the wrong PCID, or otherwise missing an inv

.. [1] https://gruss.cc/files/kaiser.pdf
.. [2] https://meltdownattack.com/meltdown.pdf
                                                      
