unshare system call
===================

This document describes the new system call, u
provides an overview of the feature, why it is
be used, its interface specification, design,
how it can be tested.

Change Log
----------
version 0.1  Initial document, Janak Desai (ja

Contents
--------
        1) Overview
        2) Benefits
        3) Cost
        4) Requirements
        5) Functional Specification
        6) High Level Design
        7) Low Level Design
        8) Test Specification
        9) Future Work

1) Overview
-----------

Most legacy operating system kernels support a
as multiple execution contexts within a proces
special resources and mechanisms to maintain t
kernel, in a clever and simple manner, does no
between processes and "threads". The kernel al
resources and thus they can achieve legacy "th
requiring additional data structures and mecha
power of implementing threads in this manner c
its simplicity but also from allowing applicat
outside the confinement of all-or-nothing shar
threads. On Linux, at the time of thread creat
call, applications can selectively choose whic
between threads.

unshare() system call adds a primitive to the
allows threads to selectively 'unshare' any re
shared at the time of their creation. unshare(
Al Viro in August of 2000, on the Linux-Ke
of the discussion on POSIX threads on Linux.
usefulness of Linux threads for applications t
shared resources without creating a new proces
addition to the set of available primitives on
the concept of process/thread as a virtual mac

2) Benefits
-----------

unshare() would be useful to large application
where creating a new process to control sharin
resources is not possible. Since namespaces ar
when creating a new process using fork or clon
even non-threaded applications if they have a
from default shared namespace. The following l
where unshare() can be used.


2.1 Per-security context namespaces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

unshare() can be used to implement polyinstant
the kernel's per-process namespace mechanism.
such as per-user and/or per-security context i
per-security context instance of a user's home
processes when working with these directories.
module can easily set up a private namespace fo
Polyinstantiated directories are required for
with Labeled System Protection Profile, howeve
of shared-tree feature in the Linux kernel, ev
can benefit from setting up private namespaces
polyinstantiating /tmp, /var/tmp and other dir
appropriate by system administrators.

2.2 unsharing of virtual memory and/or open fi
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider a client/server application where the
client requests by creating processes that sha
virtual memory and open files. Without unshare
decide what needs to be shared at the time of
which services the request. unshare() allows t
disassociate parts of the context during the s
request. For large and complex middleware appl
ability to unshare() after the process was cre
useful.

3) Cost
-------

In order to not duplicate code and to handle t
works on an active task (as opposed to clone/f
allocated inactive task) unshare() had to make
changes to copy_* functions utilized by clone/
There is a cost associated with altering exist
stable code to implement a new feature that ma
extensively in the beginning. However, with pr
review of the changes and creation of an unsha
the benefits of this new feature can exceed it

4) Requirements
---------------

unshare() reverses sharing that was done using
so unshare() should have a similar interface a
since flags in clone(int flags, void \*stack)
be shared, similar flags in unshare(int flags)
what should be unshared.
Unfortunately, this m
the meaning of the flags from the way they are
However, there was no easy solution that was l
allowed incremental context unsharing in futur

unshare() interface should accommodate possibl
new context flags without requiring a rebuild
If and when new context flags are added, unsha
incremental unsharing of those resources on an

5) Functional Specification
---------------------------

NAME
        unshare - disassociate parts of the pr

SYNOPSIS
        #include <sched.h>

        int unshare(int flags);

DESCRIPTION
        unshare() allows a process to disassoc
        context that are currently being share
        of execution context, such as the name
        when a new process is created using fo
        such as the virtual memory, open file
        shared by explicit request to share th
        using clone(2).

        The main use of unshare() is to allow
        shared execution context without creat

        The flags argument specifies one or bi
        the following constants.

        CLONE_FS
                If CLONE_FS is set, file syste
                is disassociated from the shar

        CLONE_FILES
                If CLONE_FILES is set, the fil
                caller is disassociated from t
                table.

        CLONE_NEWNS
                If CLONE_NEWNS is set, the nam
                disassociated from the shared

        CLONE_VM
                If CLONE_VM is set, the virtua
                disassociated from the shared

RETURN VALUE
        On success, zero is returned. On failure,

ERRORS
        EPERM   CLONE_NEWNS was specified by a
                without CAP_SYS_ADMIN).

        ENOMEM  Cannot allocate sufficient mem
                context that need to be unshar

        EINVAL  Invalid flag was specified as

CONFORMING TO
        The unshare() call is Linux-specific a
        in programs intended to be portable.

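
The interface above can be exercised with a short C sketch. It is only an
illustration: the helper names try_unshare() and unshare_strerror() are
hypothetical and not part of the kernel or libc interface, and the error
texts simply paraphrase the ERRORS list above. It assumes a glibc system
where <sched.h> declares unshare() once _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or libc function): call unshare()
 * with the given CLONE_* flags and return 0 on success or a negative
 * errno value on failure.
 */
int try_unshare(int flags)
{
        if (unshare(flags) == 0)
                return 0;
        return -errno;
}

/*
 * Hypothetical helper: translate the error codes listed in the ERRORS
 * section into short messages; anything else falls back to strerror().
 */
const char *unshare_strerror(int err)
{
        switch (err) {
        case 0:
                return "success";
        case -EPERM:
                return "permission denied (CLONE_NEWNS needs CAP_SYS_ADMIN)";
        case -ENOMEM:
                return "out of memory while duplicating context";
        case -EINVAL:
                return "invalid flag";
        default:
                return strerror(-err);
        }
}
```

A caller that wants a private working directory and a private file
descriptor table might call try_unshare(CLONE_FS | CLONE_FILES) and print
unshare_strerror() of the result when it is non-zero. Whether the call
succeeds depends on the running kernel and on any sandboxing in effect.
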

SEE ALSO
        clone(2), fork(2)

6) High Level Design
--------------------

Depending on the flags argument, the unshare()
appropriate process context structures, popula
the current shared version, associates newly d
with the current task structure and releases c
versions. Helper functions of clone (copy_*) c
directly by unshare() because of the following

  1) clone operates on a newly allocated not-y
     structure, whereas unshare() operates on
     task. Therefore unshare() has to take app
     before associating newly duplicated conte

  2) unshare() has to allocate and duplicate a
     that are being unshared, before associati
     current task and releasing older shared s
     do so will create race conditions and/or
     to back out due to an error. Consider the
     both virtual memory and namespace. After
     vm, if the system call encounters an erro
     new namespace structure, the error return
     reverse the unsharing of vm. As part of t
     system call will have to go back to older
     structure, which may not exist anymore.

Therefore code from copy_* functions that allo
current context structure was moved into new d
copy_* functions call dup_* functions to alloc
appropriate context structures and then associ
task structure that is being constructed.
unsh
the other hand performs the following:

  1) Check flags to force missing, but implied

  2) For each context structure, call the corr
     helper function to allocate and duplicate
     structure, if the appropriate bit is set

  3) If there is no error in allocation and du
     are new context structures then lock the
     associate new context structures with the
     and release the lock on the current task

  4) Appropriately release older, shared, cont

7) Low Level Design
-------------------

Implementation of unshare() can be grouped in
items:

  a) Reorganization of existing copy_* functio

  b) unshare() system call service function

  c) unshare() helper functions for each diffe

  d) Registration of system call number for di

7.1) Reorganization of copy_* functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each copy function such as copy_mm, copy_names
etc, had roughly two components. The first com
and duplicated the appropriate structure and t
linked it to the task structure passed in as a
function. The first component was split into i
These dup_* functions allocated and duplicated
context structure. The reorganized copy_* func
their corresponding dup_* functions and then l
duplicated structures to the task structure wi
copy function was called.

7.2) unshare() system call service function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

       * Check flags
         Force implied flags. If CLONE_THREAD
         If CLONE_VM is set, force CLONE_SIGHA
         set and signals are also being shared
         CLONE_NEWNS is set, force CLONE_FS.

       * For each context flag, invoke the cor
         helper routine with flags passed into
         reference to pointer pointing the new

       * If any new structures are created by
         functions, take the task_lock() on th
         modify appropriate context pointers,
         task lock.


       * For all newly unshared structures, re
         older, shared, structures.

7.3) unshare_* helper functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For unshare_* helpers corresponding to CLONE_S
and CLONE_THREAD, return -EINVAL since they ar
For others, check the flag value to see if the
required for that structure. If it is, invoke
dup_* function to allocate and duplicate the s
a pointer to it.

7.4) Finally
~~~~~~~~~~~~

Appropriately modify architecture specific cod
new system call.

8) Test Specification
---------------------

The test for unshare() should test the followi

  1) Valid flags: Test to check that clone fla
     signal handlers, for which unsharing is n
     yet, return -EINVAL.

  2) Missing/implied flags: Test to make sure
     namespace without specifying unsharing of
     unshares both namespace and filesystem in

  3) For each of the four (namespace, filesyst
     supported unsharing, verify that the syst
     unshares the appropriate structure. Verif
     them individually as well as in combinati
     other works as expected.

  4) Concurrent execution: Use shared memory s
     an address in the shm segment to synchron
     about 10 threads. Have a couple of thread
     a couple _exit and the rest unshare with
     of flags. Verify that unsharing is perfor
     that there are no oops or hangs.

9) Future Work
--------------

The current implementation of unshare() does n
signals and signal handlers. Signals are compl
to unshare signals and/or signal handlers of a
process is even more complex. If in the future
need to allow unsharing of signals and/or sign
be incrementally added to unshare() without af
applications using unshare().

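
As an illustration of the kind of check described in item 3 of the Test
Specification, the following C sketch exercises only the filesystem
information case. It is a sketch under stated assumptions rather than the
actual test suite: check_unshare_fs() and child_fn() are hypothetical
names, the glibc clone() wrapper is assumed to be available, and the
child is created with CLONE_FS so that it initially shares its working
directory with the parent.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Child: unshare filesystem information, then change directory. */
int child_fn(void *arg)
{
        const char *target = arg;

        if (unshare(CLONE_FS) != 0)
                return 2;       /* unshare unavailable; leave cwd alone */
        if (chdir(target) != 0)
                return 3;
        return 0;
}

/*
 * Hypothetical check: returns 1 if the parent's working directory is
 * unaffected by the child's chdir() (unsharing worked), 0 if it
 * changed, and -1 if the check could not be carried out.
 */
int check_unshare_fs(void)
{
        char before[PATH_MAX], after[PATH_MAX];
        const char *target;
        char *stack;
        int status;
        pid_t pid;

        if (getcwd(before, sizeof(before)) == NULL)
                return -1;
        /* Pick a directory that differs from the current one. */
        target = strcmp(before, "/") == 0 ? "/tmp" : "/";

        stack = malloc(CHILD_STACK_SIZE);
        if (stack == NULL)
                return -1;

        /* CLONE_FS: the child shares cwd, root and umask with us. */
        pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                    CLONE_FS | SIGCHLD, (void *)target);
        if (pid < 0 || waitpid(pid, &status, 0) != pid) {
                free(stack);
                return -1;
        }
        free(stack);

        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                return -1;
        if (getcwd(after, sizeof(after)) == NULL)
                return -1;
        return strcmp(before, after) == 0;
}
```

A result of 1 means the child's chdir() stayed private after unshare(). A
result of -1 means the environment did not allow the check to run, for
example when unshare() is blocked by a sandbox. A result of 0 would point
to a failure to unshare.
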