.. SPDX-License-Identifier: GPL-2.0

=====================
Introduction of mseal
=====================

:Author: Jeff Xu <jeffxu@chromium.org>

Modern CPUs support memory permissions such as RW and NX bits. Memory
permissions improve the security posture against memory corruption bugs:
an attacker can’t simply write to arbitrary memory and point the code to
it, because the memory has to be marked with the X bit or else an
exception is raised.

Memory sealing additionally protects the mapping itself against
modifications. This is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped. Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages, and
applications can additionally seal security-critical data at runtime.

A similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2].

SYSCALL
=======
mseal syscall signature
-----------------------
   ``int mseal(void *addr, size_t len, unsigned long flags)``

   **addr**/**len**: virtual memory address range.
      The address range set by **addr**/**len** must meet these requirements:
         - The start address must be in an allocated VMA.
         - The start address must be page aligned.
         - The end address (**addr** + **len**) must be in an allocated VMA.
         - There must be no gap (unallocated memory) between the start and
           end address.

      The kernel implicitly page-aligns ``len``.

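A minimal sketch of the range requirements above, assuming a libc without an ``mseal()`` wrapper, so the call goes through ``syscall(2)``. The helper names (``mseal_round_len``, ``mseal_addr_aligned``, ``mseal_range``) are hypothetical, and the fallback syscall number 462 is an assumption that should be checked against the installed kernel headers. The rounding helper mirrors the implicit page alignment of ``len``, taken here to mean rounding up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback number for headers that predate mseal. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* Round len up to a multiple of page_size (page_size is a power of two). */
static size_t mseal_round_len(size_t len, size_t page_size)
{
    return (len + page_size - 1) & ~(page_size - 1);
}

/* Return 1 if addr is a multiple of page_size, 0 otherwise. */
static int mseal_addr_aligned(const void *addr, size_t page_size)
{
    return ((uintptr_t)addr & (page_size - 1)) == 0;
}

/*
 * Hypothetical wrapper: rejects an unaligned start address up front,
 * then issues the raw syscall with flags set to 0.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int mseal_range(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (!mseal_addr_aligned(addr, page)) {
        errno = EINVAL;
        return -1;
    }
    return (int)syscall(__NR_mseal, addr, len, 0UL);
}
```

Under these assumptions, ``mseal_round_len(1, 4096)`` yields 4096, so a request to seal a single byte covers the whole page that contains it.
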
   **flags**: reserved for future use.

   **Return values**:
      - **0**: Success.
      - **-EINVAL**:
         * Invalid input ``flags``.
         * The start address (``addr``) is not page aligned.
         * Address range (``addr`` + ``len``) overflow.
      - **-ENOMEM**:
         * The start address (``addr``) is not allocated.
         * The end address (``addr`` + ``len``) is not allocated.
         * A gap (unallocated memory) between start and end address.
      - **-EPERM**:
         * Sealing is supported only on 64-bit CPUs; 32-bit is not supported.

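The values above use the kernel's convention of negative error codes. From userspace, a call made through ``syscall(2)`` returns -1 and leaves the code in ``errno``. The following sketch maps ``errno`` back to the causes listed above. The helper name ``mseal_explain`` is hypothetical, and the ``ENOSYS`` case is an assumption that is not part of the list: it covers kernels that do not implement the syscall at all.

```c
#include <errno.h>

/* Hypothetical helper: describe errno after a failed mseal() call. */
static const char *mseal_explain(int err)
{
    switch (err) {
    case EINVAL:
        return "invalid flags, unaligned addr, or addr + len overflow";
    case ENOMEM:
        return "start, end, or a gap inside the range is not allocated";
    case EPERM:
        return "sealing is not supported on 32-bit CPUs";
    case ENOSYS:
        /* Assumption: not listed above; kernel without mseal. */
        return "kernel does not implement mseal";
    default:
        return "error not listed in the documentation";
    }
}
```
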
   **Note about error return**:
      - For the error cases above, users can expect the given memory range
        to be unmodified, i.e. no partial update.
      - There might be other internal errors/cases not listed here, e.g.
        an error during merging/splitting VMAs, or the process reaching the
        maximum number of supported VMAs. In those cases, partial updates to
        the given memory range could happen. However, those cases should be
        rare.

   **Architecture support**:
      mseal only works on 64-bit CPUs, not 32-bit CPUs.

   **Idempotent**:
      Users can call mseal multiple times. Calling mseal on already sealed
      memory is a no-op (not an error).

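A small sketch of the idempotent behavior, assuming the raw ``syscall(2)`` interface and a fallback syscall number of 462 when the headers lack ``__NR_mseal``. The function name ``seal_twice`` and its return convention are hypothetical. ``ENOSYS`` and ``EPERM`` are both treated as "mseal unavailable here", which is an assumption about how older kernels, 32-bit systems and sandboxes report it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback number for headers that predate mseal. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/*
 * Seal one anonymous page twice.
 * Returns 0 if both calls succeed, 1 if mseal is unavailable,
 * and -1 on any other failure.
 */
static int seal_twice(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;

    if (syscall(__NR_mseal, p, page, 0UL) != 0) {
        int unavailable = (errno == ENOSYS || errno == EPERM);

        munmap(p, page);    /* not sealed, so it can still be unmapped */
        return unavailable ? 1 : -1;
    }

    /* Second call on the sealed range: expected to be a no-op. */
    if (syscall(__NR_mseal, p, page, 0UL) != 0)
        return -1;

    /* The sealed page intentionally stays mapped until the process exits. */
    return 0;
}
```
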
   **no munseal**
      Once a mapping is sealed, it can't be unsealed. The kernel should never
      have munseal; this is consistent with other sealing features, e.g.
      F_SEAL_SEAL for files.

Blocked mm syscall for sealed mapping
-------------------------------------
   It is important to note: **once the mapping is sealed, it will
   stay in the process's memory until the process terminates**.

   Example::

         void *ptr = mmap(NULL, 4096, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
         rc = mseal(ptr, 4096, 0);
         /* munmap will fail */
         rc = munmap(ptr, 4096);
         assert(rc < 0);

   Blocked mm syscalls:
      - munmap
      - mmap
      - mremap
      - mprotect and pkey_mprotect
      - some destructive madvise behaviors: MADV_DONTNEED, MADV_FREE,
        MADV_DONTNEED_LOCKED, MADV_DONTFORK, MADV_WIPEONFORK

   The first set of syscalls to block is munmap, mremap, mmap. They can
   either leave an empty space in the address space, therefore allowing
   replacement with a new mapping with a new set of attributes, or can
   overwrite the existing mapping with another mapping.

   mprotect and pkey_mprotect are blocked because they change the
   protection bits (RWX) of the mapping.

   Certain destructive madvise behaviors, specifically MADV_DONTNEED,
   MADV_FREE, MADV_DONTNEED_LOCKED, and MADV_WIPEONFORK, can introduce
   risks when applied to anonymous memory by threads lacking write
   permissions. Consequently, these operations are prohibited under such
   conditions. The aforementioned behaviors have the potential to modify
   region contents by discarding pages, effectively performing a memset(0)
   operation on the anonymous memory.

   The kernel returns -EPERM for blocked syscalls.

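The blocking of protection changes can be probed with a sketch like the one below. It assumes the raw ``syscall(2)`` interface and a fallback syscall number of 462. The function name ``sealed_mprotect_is_blocked`` and its return convention are hypothetical. From userspace, the -EPERM described above shows up as a return value of -1 with ``errno`` set to ``EPERM``.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback number for headers that predate mseal. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/*
 * Seal a read-write page, then try to drop its write permission.
 * Returns 0 if mprotect is rejected with EPERM, 1 if mseal is
 * unavailable, and -1 on any other outcome.
 */
static int sealed_mprotect_is_blocked(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;

    if (syscall(__NR_mseal, p, page, 0UL) != 0) {
        int unavailable = (errno == ENOSYS || errno == EPERM);

        munmap(p, page);    /* not sealed, so it can still be unmapped */
        return unavailable ? 1 : -1;
    }

    /* A real protection change (RW to R) on the sealed mapping. */
    if (mprotect(p, page, PROT_READ) == 0)
        return -1;

    return errno == EPERM ? 0 : -1;
}
```
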
   When a blocked syscall returns -EPERM due to sealing, the memory regions
   may or may not be changed, depending on the syscall being blocked:

      - munmap: munmap is atomic. If one of the VMAs in the given range is
        sealed, none of the VMAs are updated.
      - mprotect, pkey_mprotect, madvise: partial update might happen, e.g.
        when mprotect spans multiple VMAs, it might update the beginning
        VMAs before reaching the sealed VMA and returning -EPERM.
      - mmap and mremap: undefined behavior.

Use cases
=========
- glibc:
  The dynamic linker, while loading ELF executables, can apply sealing to
  mapping segments.

- Chrome browser: protect some security-sensitive data structures.

When not to use mseal
=====================
Applications can apply sealing to any virtual memory region from userspace,
but it is *crucial to thoroughly analyze the mapping's lifetime* before
applying the sealing. This is because the sealed mapping *won’t be unmapped*
until the process terminates or the exec system call is invoked.

For example:
   - aio/shm
     aio/shm can call mmap and munmap on behalf of userspace, e.g.
     ksys_shmdt() in shm.c. The lifetimes of those mappings are not tied to
     the lifetime of the process. If those memories are sealed from userspace,
     then munmap will fail, causing leaks in VMA address space during the
     lifetime of the process.

   - ptr allocated by malloc (heap)
     Don't use mseal on the memory ptr returned from malloc().
     malloc() is implemented by an allocator, e.g. by glibc. The heap manager
     might allocate a ptr from brk or from a mapping created by mmap.
     If an app calls mseal on a ptr returned from malloc(), this can affect
     the heap manager's ability to manage the mappings; the outcome is
     non-deterministic.

     Example::

        ptr = malloc(size);
        /* don't call mseal on a ptr returned from malloc. */
        mseal(ptr, size, 0);
        /* free() succeeds, but the allocator can't shrink the heap below ptr */
        free(ptr);

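One possible way to follow the malloc advice above is to keep the data in a mapping that the application creates itself and never intends to unmap. This is only a sketch under that assumption. The helper name ``sealed_copy`` is hypothetical, the fallback syscall number 462 is an assumption, and the ``sealed`` output simply reports whether the mseal call succeeded on the running kernel.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback number for headers that predate mseal. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/*
 * Copy size bytes from src into a private anonymous mapping, make it
 * read-only, then try to seal it. Returns the mapping, or NULL on failure.
 * *sealed is set to 1 if mseal succeeded and 0 if it did not.
 */
static void *sealed_copy(const void *src, size_t size, int *sealed)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size + page - 1) & ~(page - 1);
    void *p;

    if (size == 0)
        return NULL;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, src, size);

    /* Drop write permission before sealing; it cannot be changed later. */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }

    *sealed = (syscall(__NR_mseal, p, len, 0UL) == 0);
    return p;
}
```

Because the mapping is not handed out by the heap manager, sealing it does not interfere with ``malloc()`` and ``free()``. The cost is that a sealed copy stays mapped until the process terminates.
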
mseal doesn't block
===================
In a nutshell, mseal blocks certain mm syscalls from modifying some of a
VMA's attributes, such as protection bits (RWX). A sealed mapping doesn't
mean the memory is immutable.

As Jann Horn pointed out in [3], there are still a few ways to write
to RO memory, which is, in a way, by design. Those could be blocked
by different security measures.

Those cases are:

   - Write to read-only memory through /proc/self/mem interface (FOLL_FORCE).
   - Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
   - userfaultfd.

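The first case can be illustrated with a short sketch that writes one byte into a ``PROT_READ`` mapping through ``/proc/self/mem``. The function name ``write_ro_via_proc_mem`` and its return convention are hypothetical. The sketch assumes a 64-bit process, and whether the write lands depends on the kernel configuration and on any sandbox in place. The mapping is deliberately left unsealed so that it can be unmapped afterwards; per the list above, sealing is not what would stop this write.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to modify a read-only page through /proc/self/mem.
 * Returns 1 if the byte changed, 0 if the write was refused or the
 * interface is unavailable, and -1 if the mapping could not be created.
 */
static int write_ro_via_proc_mem(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    const unsigned char byte = 0x41;
    volatile unsigned char *p;
    ssize_t n;
    int changed;
    int fd;

    p = mmap(NULL, page, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    fd = open("/proc/self/mem", O_RDWR);
    if (fd < 0) {
        munmap((void *)p, page);
        return 0;
    }

    /* The file offset is the virtual address to write to. */
    n = pwrite(fd, &byte, 1, (off_t)(uintptr_t)p);
    close(fd);

    changed = (n == 1 && p[0] == byte);
    munmap((void *)p, page);
    return changed;
}
```
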
The idea that inspired this patch comes from Stephen Röttger’s work in V8
CFI [4]. Chrome browser in ChromeOS will be the first user of this API.

Reference
=========
- [1] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274
- [2] https://man.openbsd.org/mimmutable.2
- [3] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com
- [4] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc
