.. SPDX-License-Identifier: GPL-2.0

====
FUSE
====

Definitions
===========

Userspace filesystem:
  A filesystem in which data and metadata are
  userspace process. The filesystem can be ac
  the kernel interface.

Filesystem daemon:
  The process(es) providing the data and metad

Non-privileged mount (or user mount):
  A userspace filesystem mounted by a non-priv
  The filesystem daemon is running with the pr
  user. NOTE: this is not the same as mounts
  option in /etc/fstab, which is not discussed

Filesystem connection:
  A connection between the filesystem daemon a
  connection exists until either the daemon di
  umounted. Note that detaching (or lazy umou
  does *not* break the connection, in this cas
  the last reference to the filesystem is rele

Mount owner:
  The user who does the mounting.

User:
  The user who is performing filesystem operat

What is FUSE?
=============

FUSE is a userspace filesystem framework. It
module (fuse.ko), a userspace library (libfuse
(fusermount).

One of the most important features of FUSE is
non-privileged mounts. This opens up new poss
filesystems. A good example is sshfs: a secur
using the sftp protocol.

The userspace library and utilities are availa
`FUSE homepage: <https://github.com/libfuse/>`

Filesystem type
===============

The filesystem type given to mount(2) can be o

    fuse
      This is the usual way to mount a FUSE fi
      argument of the mount system call may co
      which is not interpreted by the kernel.

    fuseblk
      The filesystem is block device based. T
      mount system call is interpreted as the

Mount options
=============

fd=N
  The file descriptor to use for communication
  filesystem and the kernel. The file descrip
  obtained by opening the FUSE device ('/dev/f

rootmode=M
  The file mode of the filesystem's root in oc

user_id=N
  The numeric user id of the mount owner.

group_id=N
  The numeric group id of the mount owner.

default_permissions
  By default FUSE doesn't check file access pe
  filesystem is free to implement its access p
  the underlying file access mechanism (e.g. i
  filesystems). This option enables permissio
  access based on file mode. It is usually us
  'allow_other' mount option.

allow_other
  This option overrides the security measure r
  to the user mounting the filesystem. This o
  allowed to root, but this restriction can be
  (userspace) configuration option.

max_read=N
  With this option the maximum size of read op
  The default is infinite. Note that the size
  limited anyway to 32 pages (which is 128kbyt

blksize=N
  Set the block size for the filesystem. The
  option is only valid for 'fuseblk' type moun

Control filesystem
==================

There's a control filesystem for FUSE, which c

  mount -t fusectl none /sys/fs/fuse/connectio

Mounting it under the '/sys/fs/fuse/connection
backwards compatible with earlier versions.

Under the fuse control filesystem each connect
named by a unique number.

For each connection the following files exist

        waiting
          The number of requests which are wai
          userspace or being processed by the
          no filesystem activity and 'waiting'
          filesystem is hung or deadlocked.

        abort
          Writing anything into this file will
          connection. This means that all wai
          error returned for all aborted and n

Only the owner of the mount may read or write

Interrupting filesystem operations
##################################

If a process issuing a FUSE filesystem request
following will happen:

  -  If the request is not yet sent to userspa
     fatal (SIGKILL or unhandled fatal signal)
     dequeued and returns immediately.

  -  If the request is not yet sent to userspa
     fatal, then an interrupted flag is set fo
     the request has been successfully transfe
     this flag is set, an INTERRUPT request is

  -  If the request is already sent to userspa
     request is queued.

INTERRUPT requests take precedence over other
userspace filesystem will receive queued INTER

The userspace filesystem may ignore the INTERR
or may honor them by sending a reply to the *o
the error set to EINTR.

It is also possible that there's a race betwee
original request and its INTERRUPT request. T

 1. The INTERRUPT request is processed before
    processed

 2. The INTERRUPT request is processed after
    been answered

If the filesystem cannot find the original req
some timeout and/or a number of new requests t
should reply to the INTERRUPT request with an
1) the INTERRUPT request will be requeued. In
reply will be ignored.

Aborting a filesystem connection
================================

It is possible to get into certain situations
not responding. Reasons for this may be:

  a) Broken userspace filesystem implementatio

  b) Network connection down

  c) Accidental deadlock

  d) Malicious deadlock

(For more on c) and d) see later sections)

In either of these cases it may be useful to a
the filesystem. There are several ways to do

  - Kill the filesystem daemon. Works in case

  - Kill the filesystem daemon and all users o
    in all cases except some malicious deadloc

  - Use forced umount (umount -f). Works in a
    filesystem is still attached (it hasn't be

  - Abort filesystem through the FUSE control
    powerful method, always works.

How do non-privileged mounts work?
==================================

Since the mount() system call is a privileged
program (fusermount) is needed, which is insta

The implication of providing non-privileged mo
owner must not be able to use this capability
system. Obvious requirements arising from thi

 A) mount owner should not be able to get elev
    help of the mounted filesystem

 B) mount owner should not get illegitimate ac
    other users' and the super user's processe

 C) mount owner should not be able to induce u
    other users' or the super user's processes

How are requirements fulfilled?
===============================

 A) The mount owner could gain elevated privil

    1. creating a filesystem containing a devi

    2. creating a filesystem containing a suid

    The solution is not to allow opening devic
    setuid and setgid bits when executing prog
    fusermount always adds "nosuid" and "nodev
    for non-privileged mounts.

 B) If another user is accessing files or dire
    filesystem, the filesystem daemon serving
    exact sequence and timing of operations pe
    information is otherwise inaccessible to t
    counts as an information leak.

    The solution to this problem will be prese

 C) There are several ways in which the mount
    undesired behavior in other users' process

     1) mounting a filesystem over a file or d
        owner could otherwise not be able to m
        make limited modifications).

        This is solved in fusermount, by check
        permissions on the mountpoint and only
        the mount owner can do unlimited modif
        access to the mountpoint, and mountpoi
        directory)

     2) Even if 1) is solved the mount owner c
        of other users' processes.

         i) It can slow down or indefinitely d
            filesystem operation creating a Do
            whole system. For example a suid
            system file, and then accessing a
            filesystem could be stopped, and t
            file to be locked forever.

         ii) It can present files or directori
             directory structures of unlimited
             system process to eat up diskspac
             resources, again causing *DoS*.

        The solution to this as well as B) is
        to access the filesystem, which could
        monitored or manipulated by the mount
        mount owner can ptrace a process, it c
        without using a FUSE mount, the same c
        ptrace can be used to check if a proce
        the filesystem or not.

        Note that the *ptrace* check is not st
        prevent C/2/i, it is enough to check i
        privilege to send signal to the proces
        filesystem, since *SIGSTOP* can be use

I think these limitations are unacceptable?
===========================================

If a sysadmin trusts the users enough, or can
measures, that system processes will never ent
mounts, it can relax the last limitation in se

  - With the 'user_allow_other' config option.
    set, the mounting user can add the 'allow_
    disables the check for other users' proces

    User namespaces have an unintuitive intera
    an unprivileged user - normally restricted
    'allow_other' - could do so in a user name
    privileged. If any process could access su
    this would give the mounting user the abil
    processes in user namespaces where they're
    reason 'allow_other' restricts access to u
    or a descendant.

  - With the 'allow_sys_admin_access' module o
    set, super user's processes have unrestric
    irrespective of allow_other setting or use
    mounting user.

Note that both of these relaxations expose the
information leak or *DoS* as described in poin
preceding section.

Kernel - userspace interface
============================

The following diagram shows how a filesystem o
example unlink) is performed in FUSE. ::

 |  "rm /mnt/fuse/file"               |  FUSE
 |                                    |
 |                                    |  >sys_
 |                                    |    >fu
 |                                    |      >
 |                                    |
 |                                    |
 |  >sys_unlink()                     |
 |    >fuse_unlink()                  |
 |      [get request from             |
 |       fc->unused_list]             |
 |      >request_send()               |
 |        [queue req on fc->pending]  |
 |        [wake up fc->waitq]         |
 |        >request_wait_answer()      |
 |          [sleep on req->waitq]     |
 |                                    |      <
 |                                    |      [
 |                                    |      [
 |                                    |      [
 |                                    |    <fu
 |                                    |  <sys_
 |                                    |
 |                                    |  [perf
 |                                    |
 |                                    |  >sys_
 |                                    |    >fu
 |                                    |      [
 |                                    |      [
 |                                    |      [
 |          [woken up]                |      [
 |                                    |    <fu
 |                                    |  <sys_
 |        <request_wait_answer()      |
 |      <request_send()               |
 |      [add request to               |
 |       fc->unused_list]             |
 |    <fuse_unlink()                  |
 |  <sys_unlink()                     |

.. note:: Everything in the description above

There are a couple of ways in which to deadloc
Since we are talking about unprivileged usersp
something must be done about these.

**Scenario 1 - Simple deadlock**::

 |  "rm /mnt/fuse/file"               |  FUSE
 |                                    |
 |  >sys_unlink("/mnt/fuse/file")     |
 |    [acquire inode semaphore        |
 |     for "file"]                    |
 |    >fuse_unlink()                  |
 |      [sleep on req->waitq]         |
 |                                    |  <sys_
 |                                    |  >sys_
 |                                    |    [ac
 |                                    |     fo
 |                                    |    *DE

The solution for this is to allow the filesyst

**Scenario 2 - Tricky deadlock**


This one needs a carefully crafted filesystem.
the above, only the call back to the filesyste
but is caused by a pagefault. ::

 |  Kamikaze filesystem thread 1      |  Kamik
 |                                    |
 |  [fd = open("/mnt/fuse/file")]     |  [requ
 |  [mmap fd to 'addr']               |
 |  [close fd]                        |  [FLUS
 |  [read a byte from addr]           |
 |    >do_page_fault()                |
 |      [find or create page]         |
 |      [lock page]                   |
 |      >fuse_readpage()              |
 |        [queue READ request]        |
 |        [sleep on req->waitq]       |
 |                                    |  [read
 |                                    |  [crea
 |                                    |  >sys_
 |                                    |    >fu
 |                                    |      [
 |                                    |      [
 |                                    |      [
 |                                    |
 |                                    |
 |                                    |
 |                                    |

The solution is basically the same as above.

An additional problem is that while the write
to the request, the request must not be interr
because the destination address of the copy ma
request has returned.

This is solved with doing the copy atomically,
while the page(s) belonging to the write buffe
get_user_pages(). The 'req->locked' flag indi
taking place, and abort is delayed until this