.. SPDX-License-Identifier: GPL-2.0

==================================
Introduction of non-executable mfd
==================================
:Author:
    Daniel Verkamp <dverkamp@chromium.org>
    Jeff Xu <jeffxu@chromium.org>

:Contributor:
    Aleksa Sarai <cyphar@cyphar.com>

Since Linux introduced the memfd feature, memfds have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.

However, in a secure-by-default system such as ChromeOS (where all
executables should come from the rootfs, which is protected by verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables “confused deputy” attacks. For example, in VRP bug [1], the
cros_vm process created a memfd to share content with an external process,
but the memfd was overwritten and used to execute arbitrary code
and escalate to root. [2] lists more VRP bugs of this kind.

On the other hand, executable memfds have legitimate uses: runc uses memfd’s
seal and executable features to copy the contents of a binary and then
execute them.
For such a system, we need a solution to differentiate runc's
use of executable memfds from an attacker's [3].

To address the above:
 - Let memfd_create() set the X bit at creation time.
 - Let the memfd be sealed against modifying the X bit when NX is set.
 - Add a new pid namespace sysctl, vm.memfd_noexec, to help applications in
   migrating to and enforcing non-executable MFD.

User API
========
``int memfd_create(const char *name, unsigned int flags)``

``MFD_NOEXEC_SEAL``
        When the MFD_NOEXEC_SEAL bit is set in ``flags``, the memfd is created
        with NX. F_SEAL_EXEC is set and the memfd can't be modified to
        add X later. MFD_ALLOW_SEALING is also implied.
        This is the most common case for an application using memfd.

``MFD_EXEC``
        When the MFD_EXEC bit is set in ``flags``, the memfd is created with X.

Note:
        ``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. If an app
        doesn't want sealing, it can add F_SEAL_SEAL after creation.
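
The flags above can be exercised from userspace roughly as follows. This is
a minimal sketch, not a reference implementation: the helper names
(``create_noexec_memfd``, ``memfd_is_noexec_sealed``,
``memfd_forbid_more_seals``) are hypothetical, and the fallback ``#define``
values are assumed to match the kernel UAPI headers for toolchains whose
libc headers predate these flags. On kernels without ``MFD_NOEXEC_SEAL``
support, memfd_create() is expected to fail rather than silently ignore
the flag.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed to match the UAPI values). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Hypothetical helper: create a memfd with NX and F_SEAL_EXEC.
 * Returns the new fd, or -1 with errno set.
 */
int create_noexec_memfd(const char *name)
{
	return memfd_create(name, MFD_CLOEXEC | MFD_NOEXEC_SEAL);
}

/*
 * Hypothetical helper: returns 1 if fd carries F_SEAL_EXEC and has no
 * execute permission bits, 0 if not, and -1 on error.
 */
int memfd_is_noexec_sealed(int fd)
{
	struct stat st;
	int seals = fcntl(fd, F_GET_SEALS);

	if (seals < 0 || fstat(fd, &st) < 0)
		return -1;
	return (seals & F_SEAL_EXEC) && !(st.st_mode & 0111);
}

/*
 * Hypothetical helper: an app that does not want any further sealing
 * adds F_SEAL_SEAL after creation, as described in the Note.
 */
int memfd_forbid_more_seals(int fd)
{
	return fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL);
}
```

A caller would typically check the return value of
``create_noexec_memfd()`` and fall back to a plain memfd_create() call
on older kernels if that is acceptable for its threat model.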


Sysctl:
========
``pid namespaced sysctl vm.memfd_noexec``

The new pid namespaced sysctl vm.memfd_noexec has 3 values:

 - 0: MEMFD_NOEXEC_SCOPE_EXEC
        memfd_create() with neither MFD_EXEC nor MFD_NOEXEC_SEAL acts as if
        MFD_EXEC was set.

 - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
        memfd_create() with neither MFD_EXEC nor MFD_NOEXEC_SEAL acts as if
        MFD_NOEXEC_SEAL was set.

 - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
        memfd_create() without MFD_NOEXEC_SEAL will be rejected.

The sysctl allows finer control of memfd_create() for old software that
doesn't set the executable bit; for example, a container with
vm.memfd_noexec=1 means the old software will create non-executable memfds
by default while new software can create executable memfds by setting
MFD_EXEC.

The value of vm.memfd_noexec is passed to the child namespace at creation
time. In addition, the setting is hierarchical, i.e. during memfd_create(),
we will search from the current ns to the root ns and use the most restrictive
setting.
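
The three sysctl values and the hierarchical lookup can be modelled in
plain C as below. This is an illustrative userspace sketch, not the kernel
implementation: ``struct ns_model``, ``effective_scope()`` and
``apply_memfd_policy()`` are hypothetical names, and -1 stands in for
whatever error a rejected call returns. It also assumes that missing flags
are defaulted before the enforcement check, so a call with no flags under
value 2 is treated as MFD_NOEXEC_SEAL; the description above could also be
read as rejecting such a call.

```c
#include <stddef.h>

/* Flag values assumed to match the kernel UAPI headers. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

enum memfd_noexec_scope {
	MEMFD_NOEXEC_SCOPE_EXEC = 0,
	MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL = 1,
	MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED = 2,
};

/* Hypothetical stand-in for a pid namespace and its sysctl value. */
struct ns_model {
	const struct ns_model *parent;	/* NULL for the root namespace */
	int memfd_noexec;		/* this namespace's vm.memfd_noexec */
};

/* Walk from the current ns to the root ns, keeping the strictest value. */
int effective_scope(const struct ns_model *ns)
{
	int scope = MEMFD_NOEXEC_SCOPE_EXEC;

	for (; ns; ns = ns->parent)
		if (ns->memfd_noexec > scope)
			scope = ns->memfd_noexec;
	return scope;
}

/*
 * Returns the flags memfd_create() would effectively use, or -1 if the
 * call would be rejected (assumption: defaults are applied first).
 */
int apply_memfd_policy(const struct ns_model *ns, unsigned int flags)
{
	int scope = effective_scope(ns);

	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
		if (scope >= MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL)
			flags |= MFD_NOEXEC_SEAL;
		else
			flags |= MFD_EXEC;
	}

	if (scope >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED &&
	    !(flags & MFD_NOEXEC_SEAL))
		return -1;

	return (int)flags;
}
```

In this model, a namespace whose own value is 0 but whose parent is set to
1 still defaults to MFD_NOEXEC_SEAL, because the walk to the root keeps the
most restrictive value it finds.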

[1] https://crbug.com/1305267

[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1

[3] https://lwn.net/Articles/781013/