.. SPDX-License-Identifier: GPL-2.0

=====================================
Virtually Mapped Kernel Stack Support
=====================================

:Author: Shuah Khan <skhan@linuxfoundation.org>

.. contents:: :local:

Overview
--------

This is a compilation of information from the code and original patch
series that introduced the `Virtually Mapped Kernel Stacks feature
<https://lwn.net/Articles/694348/>`_.

Introduction
------------

Kernel stack overflows are often hard to debug and make the kernel
susceptible to exploits. Problems could show up at a later time, making
them difficult to isolate and root-cause.

Virtually mapped kernel stacks with guard pages cause kernel stack
overflows to be caught immediately rather than causing
difficult-to-diagnose corruption.

The HAVE_ARCH_VMAP_STACK and VMAP_STACK configuration options enable
support for virtually mapped stacks with guard pages. This feature
causes reliable faults when the stack overflows. The usability of
the stack trace after an overflow and the response to the overflow
itself are architecture dependent.

.. note::
        As of this writing, arm64, powerpc, riscv, s390, um, and x86 have
        support for VMAP_STACK.

HAVE_ARCH_VMAP_STACK
--------------------

Architectures that can support Virtually Mapped Kernel Stacks should
enable this bool configuration option. The requirements are:

- vmalloc space must be large enough to hold many kernel stacks. This
  may rule out many 32-bit architectures.
- Stacks in vmalloc space need to work reliably.  For example, if
  vmap page tables are created on demand, either this mechanism
  needs to work while the stack points to a virtual address with
  unpopulated page tables or arch code (switch_to() and switch_mm(),
  most likely) needs to ensure that the stack's page table entries
  are populated before running on a possibly unpopulated stack.
- If the stack overflows into a guard page, something reasonable
  should happen. The definition of "reasonable" is flexible, but
  instantly rebooting without logging anything would be unfriendly.

VMAP_STACK
----------

When enabled, the VMAP_STACK bool configuration option allocates virtually
mapped task stacks. This option depends on HAVE_ARCH_VMAP_STACK.

- Enable this if you want to use virtually mapped kernel stacks
  with guard pages. This causes kernel stack overflows to be caught
  immediately rather than causing difficult-to-diagnose corruption.

.. note::

        Using this feature with KASAN requires architecture support
        for backing virtual mappings with real shadow memory, and
        KASAN_VMALLOC must be enabled.

.. note::

        When VMAP_STACK is enabled, it is not possible to run DMA on
        stack-allocated data.

Kernel configuration options and dependencies keep changing. Refer to
the latest code base:

`Kconfig <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/Kconfig>`_
 83 Allocation
 84 -----------
 85 
 86 When a new kernel thread is created, a thread stack is allocated from
 87 virtually contiguous memory pages from the page level allocator. These
 88 pages are mapped into contiguous kernel virtual space with PAGE_KERNEL
 89 protections.
 90 
 91 alloc_thread_stack_node() calls __vmalloc_node_range() to allocate stack
 92 with PAGE_KERNEL protections.
 93 
 94 - Allocated stacks are cached and later reused by new threads, so memcg
 95   accounting is performed manually on assigning/releasing stacks to tasks.
 96   Hence, __vmalloc_node_range is called without __GFP_ACCOUNT.
 97 - vm_struct is cached to be able to find when thread free is initiated
 98   in interrupt context. free_thread_stack() can be called in interrupt
 99   context.
100 - On arm64, all VMAP's stacks need to have the same alignment to ensure
101   that VMAP'd stack overflow detection works correctly. Arch specific
102   vmap stack allocator takes care of this detail.
103 - This does not address interrupt stacks - according to the original patch
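
The caching behaviour described in the first bullet can be pictured with
a simplified userspace model. This is only a sketch: the names
model_alloc_stack() and model_free_stack(), the cache depth, and the use
of malloc() are assumptions for illustration, and the kernel code in
kernel/fork.c differs (it works with vmalloc areas and handles memcg
accounting, neither of which is modelled here).

::

        #include <stdlib.h>

        #define NR_CACHED_STACKS 2          /* illustrative cache depth */
        #define STACK_SIZE (16 * 1024)      /* illustrative stack size */

        static void *cached_stacks[NR_CACHED_STACKS];

        /* Take a stack from the cache if one is parked there, else allocate. */
        static void *model_alloc_stack(void)
        {
            for (int i = 0; i < NR_CACHED_STACKS; i++) {
                void *s = cached_stacks[i];

                if (s) {
                    cached_stacks[i] = NULL;
                    return s;
                }
            }
            return malloc(STACK_SIZE);
        }

        /* Park the stack in an empty cache slot; free it only if the cache is full. */
        static void model_free_stack(void *stack)
        {
            for (int i = 0; i < NR_CACHED_STACKS; i++) {
                if (!cached_stacks[i]) {
                    cached_stacks[i] = stack;
                    return;
                }
            }
            free(stack);
        }

In this model, a stack released with model_free_stack() is handed back by
the next model_alloc_stack() call without touching the allocator, which is
the cache-hit path.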

Thread stack allocation is initiated from clone(), fork(), vfork(), and
kernel_thread() via kernel_clone(). These are a few hints for searching
the code base to understand when and how a thread stack is allocated.

The bulk of the code is in
`kernel/fork.c <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/fork.c>`_.

The stack_vm_area pointer in task_struct keeps track of the virtually
allocated stack, and a non-null stack_vm_area pointer serves as an
indication that virtually mapped kernel stacks are enabled.

::

        struct vm_struct *stack_vm_area;
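
As a minimal sketch of how such a check reads, the following userspace
model tests the pointer for NULL. The struct definitions are stand-ins
that model only the field of interest, and the helper name
task_has_vmap_stack() is hypothetical rather than a kernel API.

::

        #include <stdbool.h>
        #include <stddef.h>

        /* Stand-ins for the kernel types; only the relevant field is modelled. */
        struct vm_struct {
            void *addr;
        };

        struct task_struct {
            struct vm_struct *stack_vm_area;
        };

        /* True when the task's stack is tracked by a vmalloc area. */
        static bool task_has_vmap_stack(const struct task_struct *t)
        {
            return t->stack_vm_area != NULL;
        }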

Stack overflow handling
-----------------------

Leading and trailing guard pages help detect stack overflows. When the
stack overflows into a guard page, handlers have to be careful not to
overflow the stack again. When handlers are called, it is likely that
very little stack space is left.

On x86, this is done by handling the page fault indicating the kernel
stack overflow on the double-fault stack.
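
The idea of handling the fault on a separate stack has a loose userspace
analogue in sigaltstack(): a signal handler registered with SA_ONSTACK
runs on its own stack, so it can still execute after the main stack is
exhausted. The sketch below is that analogue only, not the x86 kernel
mechanism; run_overflow_demo(), the exit code, and the sizes are
assumptions chosen for illustration.

::

        #define _GNU_SOURCE
        #include <signal.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <unistd.h>

        #define ALT_STACK_SIZE (64 * 1024)
        #define OVERFLOW_EXIT_CODE 42

        /* Volatile so the compiler cannot prove the recursion is endless. */
        static volatile int keep_recursing = 1;

        /* Runs on the alternate stack; uses only async-signal-safe calls. */
        static void on_overflow(int sig)
        {
            (void)sig;
            _exit(OVERFLOW_EXIT_CODE);
        }

        /* Consume one 4 KiB frame per call until the stack is exhausted. */
        static int recurse(int depth)
        {
            volatile char frame[4096];

            frame[0] = (char)depth;
            if (!keep_recursing)
                return 0;
            return recurse(depth + 1) + frame[0];
        }

        /*
         * Fork a child that overflows its stack. Returns the child's exit
         * status, or -1 if the child did not exit normally.
         */
        static int run_overflow_demo(void)
        {
            int status;
            pid_t pid = fork();

            if (pid < 0)
                return -1;
            if (pid == 0) {
                stack_t ss;
                struct sigaction sa;

                memset(&ss, 0, sizeof(ss));
                ss.ss_sp = malloc(ALT_STACK_SIZE);
                ss.ss_size = ALT_STACK_SIZE;
                if (!ss.ss_sp || sigaltstack(&ss, NULL) != 0)
                    _exit(1);

                memset(&sa, 0, sizeof(sa));
                sa.sa_handler = on_overflow;
                sa.sa_flags = SA_ONSTACK;
                sigemptyset(&sa.sa_mask);
                /* Some systems report stack overflow as SIGBUS. */
                if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
                    sigaction(SIGBUS, &sa, NULL) != 0)
                    _exit(1);

                recurse(0);
                _exit(2);   /* reached only if no fault occurred */
            }
            if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
                return -1;
            return WEXITSTATUS(status);
        }

Without SA_ONSTACK the handler would need stack space that is no longer
available, and the child would be killed by the signal instead of exiting
through the handler.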

Testing VMAP allocation with guard pages
----------------------------------------

How do we ensure that VMAP_STACK is actually allocating with a leading
and trailing guard page? The following lkdtm tests can help detect any
regressions.

::

        void lkdtm_STACK_GUARD_PAGE_LEADING()
        void lkdtm_STACK_GUARD_PAGE_TRAILING()
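
What these tests check can be illustrated with a userspace model: touch
the byte just before and just after a guarded region and expect a fault.
The sketch below is not the lkdtm implementation; it builds its own
guarded region with mmap() and mprotect(), and the names
alloc_guarded_stack(), write_faults(), and MODEL_STACK_PAGES are
hypothetical.

::

        #define _GNU_SOURCE
        #include <signal.h>
        #include <stddef.h>
        #include <sys/mman.h>
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <unistd.h>

        #define MODEL_STACK_PAGES 4

        /*
         * Map MODEL_STACK_PAGES usable pages with one inaccessible page on
         * each side. Returns the first usable byte, or NULL on failure.
         */
        static char *alloc_guarded_stack(size_t page)
        {
            size_t total = (MODEL_STACK_PAGES + 2) * page;
            char *base = mmap(NULL, total, PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (base == MAP_FAILED)
                return NULL;
            if (mprotect(base + page, MODEL_STACK_PAGES * page,
                         PROT_READ | PROT_WRITE) != 0) {
                munmap(base, total);
                return NULL;
            }
            return base + page;
        }

        /*
         * Write one byte to addr in a forked child. Returns 1 if the child
         * was killed by SIGSEGV or SIGBUS, 0 if it was not, -1 on error.
         */
        static int write_faults(volatile char *addr)
        {
            int status;
            pid_t pid = fork();

            if (pid < 0)
                return -1;
            if (pid == 0) {
                *addr = 1;
                _exit(0);
            }
            if (waitpid(pid, &status, 0) != pid)
                return -1;
            return WIFSIGNALED(status) &&
                   (WTERMSIG(status) == SIGSEGV ||
                    WTERMSIG(status) == SIGBUS);
        }

For a region returned by alloc_guarded_stack(), writes to the first and
last usable bytes should succeed, while a write one byte below the region
(the leading guard page) or one byte past its end (the trailing guard
page) should be reported as a fault.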

Conclusions
-----------

- A percpu cache of vmalloced stacks appears to be a bit faster than a
  high-order stack allocation, at least when the cache hits.
- THREAD_INFO_IN_TASK gets rid of arch-specific thread_info entirely and
  simply embeds the thread_info (containing only flags) and 'int cpu' into
  task_struct.
- The thread stack can be freed as soon as the task is dead (without
  waiting for RCU) and then, if vmapped stacks are in use, the entire
  stack can be cached for reuse on the same cpu.