.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
which is found on Intel's Skylake (and later) "Scalable Processor"
Server CPUs. It will be available in future non-server Intel parts
and future AMD processors.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.

Memory Protection Keys provides a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains.  It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key.  Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register.  The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs.  These
permissions are enforced on data access only and have no effect on
instruction fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key.  In this example WRPKRU is wrapped by a C function
called pkey_set().
::

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance, if you do this::

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this::

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
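
The pkey_set() wrapper used in the Syscalls examples only has to rewrite
the two PKRU bits that belong to one key.  The following is a minimal
sketch of that bit arithmetic, not the selftest implementation.  It
assumes the usual x86 layout in which key N owns bits 2N (Access Disable)
and 2N+1 (Write Disable), which lines up with the PKEY_DISABLE_ACCESS and
PKEY_DISABLE_WRITE flag values.  The names pkru_update(), pkru_rights(),
PKRU_BITS_PER_PKEY and NR_PKEYS are hypothetical and chosen for
illustration only.

```c
#include <stdint.h>

/* Flag values as passed to pkey_alloc(); guarded in case <sys/mman.h>
 * has already defined them. */
#ifndef PKEY_DISABLE_ACCESS
#define PKEY_DISABLE_ACCESS 0x1
#endif
#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE  0x2
#endif

/* Hypothetical names for this sketch. */
#define PKRU_BITS_PER_PKEY 2
#define NR_PKEYS           16

/*
 * Compute the PKRU value that results from giving 'pkey' the access
 * rights in 'rights' while leaving every other key untouched.
 * Returns 0 on success, -1 if the key or the rights are out of range.
 */
static int pkru_update(uint32_t old_pkru, int pkey, uint32_t rights,
		       uint32_t *new_pkru)
{
	uint32_t valid = PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE;
	unsigned int shift;
	uint32_t mask;

	if (pkey < 0 || pkey >= NR_PKEYS)
		return -1;
	if (rights & ~valid)
		return -1;

	shift = (unsigned int)pkey * PKRU_BITS_PER_PKEY;
	mask = (uint32_t)0x3 << shift;

	/* Clear both bits for this key, then set the requested ones. */
	*new_pkru = (old_pkru & ~mask) | (rights << shift);
	return 0;
}

/* Extract the two rights bits that 'pkru' holds for 'pkey'. */
static uint32_t pkru_rights(uint32_t pkru, int pkey)
{
	unsigned int shift = (unsigned int)pkey * PKRU_BITS_PER_PKEY;

	return (pkru >> shift) & 0x3;
}
```

A real pkey_set() would read the current register value with RDPKRU,
pass it through a helper like pkru_update(), and write the result back
with WRPKRU.  Keeping the arithmetic separate from the instructions makes
it possible to exercise the logic on machines without PKU support.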