.. SPDX-License-Identifier: GPL-2.0

=====================
Syscall User Dispatch
=====================

Background
----------

Compatibility layers like Wine need a way to efficiently emulate system
calls of only a part of their process - the part that has the
incompatible code - while being able to execute native syscalls without
a high performance penalty on the native part of the process. Seccomp
falls short on this task, since it has limited support to efficiently
filter syscalls based on memory regions, and it doesn't support removing
filters. Therefore a new mechanism is necessary.

Syscall User Dispatch brings the filtering of the syscall dispatcher
address back to userspace. The application is in control of a flip
switch, indicating the current personality of the process. A
multiple-personality application can then flip the switch without
invoking the kernel, when crossing the compatibility layer API
boundaries, to enable/disable the syscall redirection and execute
syscalls directly (disabled) or send them to be emulated in userspace
through a SIGSYS.

The goal of this design is to provide very quick compatibility layer
boundary crosses, which is achieved by not executing a syscall to change
personality every time the compatibility layer executes. Instead, a
userspace memory region exposed to the kernel indicates the current
personality, and the application simply modifies that variable to
configure the mechanism.

There is a relatively high cost associated with handling signals on most
architectures, like x86, but at least for Wine, syscalls issued by
native Windows code are currently not known to be a performance problem,
since they are quite rare, at least for modern gaming applications.

Since this mechanism is designed to capture syscalls issued by
non-native applications, it must function on syscalls whose invocation
ABI is completely unexpected to Linux.
Syscall User Dispatch therefore
doesn't rely on any of the syscall ABI to make the filtering. It uses
only the syscall dispatcher address and the userspace key.

As the ABI of these intercepted syscalls is unknown to Linux, these
syscalls are not instrumentable via ptrace or the syscall tracepoints.

Interface
---------

A thread can set up this mechanism on supported kernels by executing the
following prctl::

  prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <offset>, <length>, [selector])

<op> is either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF, to enable and
disable the mechanism globally for that thread. When
PR_SYS_DISPATCH_OFF is used, the other fields must be zero.

[<offset>, <offset>+<length>) delimit a memory region interval
from which syscalls are always executed directly, regardless of the
userspace selector. This provides a fast path for the C library, which
includes the most common syscall dispatchers in the native code
applications, and also provides a way for the signal handler to return
without triggering a nested SIGSYS on (rt\_)sigreturn. Users of this
interface should make sure that at least the signal trampoline code is
included in this region. In addition, for syscalls that implement the
trampoline code on the vDSO, that trampoline is never intercepted.

[selector] is a pointer to a char-sized region in the process memory
region, that provides a quick way to enable or disable syscall
redirection thread-wide, without the need to invoke the kernel directly.
selector can be set to SYSCALL_DISPATCH_FILTER_ALLOW or
SYSCALL_DISPATCH_FILTER_BLOCK. Any other value should terminate the
program with a SIGSYS.

Additionally, a task's syscall user dispatch configuration can be peeked
and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace
requests. This is useful for checkpoint/restart software.

Security Notes
--------------

Syscall User Dispatch provides functionality for compatibility layers to
quickly capture system calls issued by a non-native part of the
application, while not impacting the Linux native regions of the
process. It is not a mechanism for sandboxing system calls, and it
should not be seen as a security mechanism, since it is trivial for a
malicious application to subvert the mechanism by jumping to an allowed
dispatcher region prior to executing the syscall, or to discover the
address and modify the selector value. If the use case requires any
kind of security sandboxing, Seccomp should be used instead.

Any fork or exec of the existing process resets the mechanism to
PR_SYS_DISPATCH_OFF.
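
Example
-------

The following is a minimal sketch, not taken from the kernel sources, of
how a thread might use the prctl interface and the selector described
under Interface. It assumes a kernel and architecture that support
Syscall User Dispatch. The names ``selector``, ``on_sigsys`` and
``sud_demo`` are hypothetical, and the fallback constant definitions are
assumed to mirror ``<linux/prctl.h>`` for older userspace headers. The
sketch passes an empty allowed region (offset and length of zero), so
the handler switches the selector back to SYSCALL_DISPATCH_FILTER_ALLOW
before returning, which lets (rt\_)sigreturn execute directly.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values assumed to match <linux/prctl.h>. */
#ifndef PR_SET_SYSCALL_USER_DISPATCH
#define PR_SET_SYSCALL_USER_DISPATCH 59
#define PR_SYS_DISPATCH_OFF 0
#define PR_SYS_DISPATCH_ON 1
#endif
#ifndef SYSCALL_DISPATCH_FILTER_ALLOW
#define SYSCALL_DISPATCH_FILTER_ALLOW 0
#define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif

/* Char-sized selector that the kernel reads on every syscall entry. */
static volatile char selector = SYSCALL_DISPATCH_FILTER_ALLOW;
static volatile sig_atomic_t intercepted;
static volatile int last_syscall = -1;

/* Hypothetical SIGSYS handler: record the syscall instead of emulating it. */
static void on_sigsys(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    /* Re-allow syscalls first so the return through sigreturn is not blocked. */
    selector = SYSCALL_DISPATCH_FILTER_ALLOW;
    last_syscall = info->si_syscall;
    intercepted++;
}

/*
 * Returns 1 if a blocked syscall was dispatched to the handler,
 * 0 if the kernel rejected the prctl (no support), -1 otherwise.
 */
int sud_demo(void)
{
    struct sigaction sa;

    intercepted = 0;
    last_syscall = -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigsys;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSYS, &sa, NULL) != 0)
        return -1;

    /* Warm-up call so lazy symbol binding happens before anything is blocked. */
    syscall(SYS_getpid);

    /* Empty allowed region: every syscall is checked against the selector. */
    if (prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON,
              0UL, 0UL, &selector) != 0)
        return 0;

    /* Flip the switch without entering the kernel, then issue a syscall. */
    selector = SYSCALL_DISPATCH_FILTER_BLOCK;
    syscall(SYS_getpid);            /* delivered to on_sigsys as SIGSYS */
    selector = SYSCALL_DISPATCH_FILTER_ALLOW;

    /* When disabling, the remaining fields must be zero. */
    prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_OFF, 0UL, 0UL, 0UL);

    return (intercepted == 1 && last_syscall == SYS_getpid) ? 1 : -1;
}
```

Calling ``sud_demo()`` from ``main`` is expected to return 1 where the
mechanism is available and 0 where the kernel rejects the prctl. The
value returned by the intercepted ``syscall()`` is ignored here, since
the handler does not emulate it. A real compatibility layer would more
likely pass the address range of its own native dispatcher code as the
allowed region rather than rely on the handler to flip the selector.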