Linux/Documentation/filesystems/sharedsubtree.rst

.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
        1) Overview
        2) Features
        3) Setting mount states
        4) Use cases
        5) Detailed semantics
        6) Quiz
        7) FAQ
        8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish the above.

Shared subtrees also provide the building blocks for features such as
per-user namespaces and versioned filesystems.

2) Features
-----------

Shared subtrees provide four different flavors of mounts (struct vfsmount, to
be precise):

        a. shared mount
        b. slave mount
        c. private mount
        d. unbindable mount


2a) A shared mount can be replicated to as many mountpoints as needed, and all
the replicas remain exactly the same.

        Here is an example:

        Let's say /mnt has a mount that is shared::

            mount --make-shared /mnt

        Note: mount(8) command now supports the --make-shared flag,
        so the sample 'smount' program is no longer needed and has been
        removed.

        ::

            # mount --bind /mnt /tmp

        The above command replicates the mount at /mnt to the mountpoint /tmp
        and the contents of both the mounts remain identical.

        ::

            # ls /mnt
            a b c

            # ls /tmp
            a b c

        Now let's say we mount a device at /tmp/a::

            # mount /dev/sd0  /tmp/a

            # ls /tmp/a
            t1 t2 t3

            # ls /mnt/a
            t1 t2 t3

        Note that the mount has propagated to the mount at /mnt as well.

        The same is true when /dev/sd0 is mounted on /mnt/a instead: the
        contents will be visible under /tmp/a too.


2b) A slave mount is like a shared mount except that mount and umount events
        only propagate towards it.

        All slave mounts have a master mount, which is shared.

        Here is an example:

        Let's say /mnt has a mount which is shared::

            # mount --make-shared /mnt

        Let's bind mount /mnt to /tmp::

            # mount --bind /mnt /tmp

        The new mount at /tmp becomes a shared mount and it is a replica of
        the mount at /mnt.

        Now let's make the mount at /tmp a slave of /mnt::

            # mount --make-slave /tmp

        Let's mount /dev/sd0 on /mnt/a::

            # mount /dev/sd0 /mnt/a

            # ls /mnt/a
            t1 t2 t3

            # ls /tmp/a
            t1 t2 t3

        Note that the mount event has propagated to the mount at /tmp.

        However, let's see what happens if we mount something on the mount
        at /tmp::

            # mount /dev/sd1 /tmp/b

            # ls /tmp/b
            s1 s2 s3

            # ls /mnt/b

        Note how the mount event has not propagated to the mount at
        /mnt.


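The difference between shared and slave propagation in the two examples above
can be sketched with a small in-memory model. This is only an illustration in
Python: the ``Mount`` class and the ``bind``, ``make_slave`` and ``mount_at``
helpers are hypothetical names, not kernel or util-linux interfaces, and the
model tracks nothing except which mounts see a mount event.

```python
class Mount:
    """Toy stand-in for a vfsmount (assumption: not the kernel structure)."""

    def __init__(self, name):
        self.name = name
        self.group = {self}   # peer group; a singleton group propagates nowhere
        self.slaves = []      # mounts that receive events from this mount only
        self.children = {}    # dentry name -> device mounted there


def bind(src, name):
    """Replicate a shared mount: the copy joins the source's peer group."""
    copy = Mount(name)
    copy.children = dict(src.children)
    copy.group = src.group
    src.group.add(copy)
    return copy


def make_slave(mount):
    """Leave the peer group but keep receiving events from the former peers."""
    former_peers = mount.group
    former_peers.discard(mount)
    mount.group = {mount}
    for peer in former_peers:
        peer.slaves.append(mount)


def mount_at(target, dentry, device):
    """Mount `device` at `dentry` on `target`; return the mounts that saw it."""
    pending, seen = [target], set()
    while pending:
        m = pending.pop()
        if m in seen:
            continue
        seen.add(m)
        m.children[dentry] = device
        pending.extend(m.group)    # peers receive the event
        pending.extend(m.slaves)   # slaves receive it too, one way only
    return sorted(m.name for m in seen)
```

With ``mnt = Mount("/mnt")`` and ``tmp = bind(mnt, "/tmp")``, an event on either
mount reaches both. After ``make_slave(tmp)``, events on ``mnt`` still reach
``tmp``, but events on ``tmp`` stay local, which mirrors the empty ``ls /mnt/b``
above.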
2c) A private mount does not forward or receive propagation.

        This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is an unbindable private mount.

        Let's say we have a mount at /mnt and we make it unbindable::

            # mount --make-unbindable /mnt

        Let's try to bind mount this mount somewhere else::

            # mount --bind /mnt /tmp
            mount: wrong fs type, bad option, bad superblock on /mnt,
                    or too many mounted file systems

        Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

        The mount command (util-linux package) can be used to set mount
        states::

            mount --make-shared mountpoint
            mount --make-slave mountpoint
            mount --make-private mountpoint
            mount --make-unbindable mountpoint


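After setting a state, it can be useful to check what the kernel recorded. The
sketch below assumes the ``/proc/<pid>/mountinfo`` layout described in proc(5),
where the optional fields before the lone ``-`` separator carry the tags
``shared:N``, ``master:N`` and ``unbindable``. The function names
``propagation_of`` and ``current_propagation`` are hypothetical helpers, and the
mapping of tags to the state names used in this document is an interpretation,
not an official API.

```python
def propagation_of(mountinfo_line):
    """Classify one mountinfo line using the state names from this document."""
    fields = mountinfo_line.split()
    separator = fields.index("-")       # optional fields end at the lone "-"
    optional = fields[6:separator]      # tagged fields follow the mount options
    shared = any(f.startswith("shared:") for f in optional)
    slave = any(f.startswith("master:") for f in optional)
    if "unbindable" in optional:
        return "unbindable"
    if shared and slave:
        return "shared and slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"


def current_propagation(path="/proc/self/mountinfo"):
    """Map each mount point of the calling process to its propagation state."""
    with open(path) as mountinfo:
        return {line.split()[4]: propagation_of(line) for line in mountinfo}
```

The sample lines used to exercise this sketch are made up for illustration;
on a real system the input would come from ``/proc/self/mountinfo``.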
4) Use cases
------------

        A) A process wants to clone its own namespace, but still wants to
           access the CD that got mounted recently.

           Solution:

                The system administrator can make the mount at /cdrom shared::

                    mount --bind /cdrom /cdrom
                    mount --make-shared /cdrom

                Now any process that clones off a new namespace will have a
                mount at /cdrom which is a replica of the same mount in the
                parent namespace.

                So when a CD is inserted and mounted at /cdrom, that mount gets
                propagated to the other mount at /cdrom in all the other clone
                namespaces.

        B) A process wants its mounts invisible to any other process, but
           still wants to be able to see the other system mounts.

           Solution:

                To begin with, the administrator can mark the entire mount tree
                as shareable::

                    mount --make-rshared /

                A new process can clone off a new namespace and mark some part
                of its namespace as slave::

                    mount --make-rslave /myprivatetree

                Henceforth, any mounts within /myprivatetree done by the
                process will not show up in any other namespace. However, mounts
                done in the parent namespace under /myprivatetree still show
                up in the process's namespace.


        Apart from the above semantics, this feature provides the
        building blocks to solve the following problems:

        C)  Per-user namespace

                The above semantics allow a way to share mounts across
                namespaces.  But namespaces are associated with processes. If
                namespaces are made first class objects with a user API to
                associate/disassociate a namespace with a userid, then each user
                could have his/her own namespace and tailor it to his/her
                requirements. This needs to be supported in PAM.

        D)  Versioned files

                If the entire mount tree is visible at multiple locations, then
                an underlying versioning file system can return different
                versions of the file depending on the path used to access that
                file.

                An example is::

                    mount --make-shared /
                    mount --rbind / /view/v1
                    mount --rbind / /view/v2
                    mount --rbind / /view/v3
                    mount --rbind / /view/v4

                and if /usr has a versioning filesystem mounted, then that
                mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
                /view/v4/usr too.

                A user can request the v3 version of the file
                /usr/fs/namespace.c by accessing /view/v3/usr/fs/namespace.c.
                The underlying versioning filesystem can then decipher that the
                v3 version of the filesystem is being requested and return the
                corresponding inode.

5) Detailed semantics
---------------------
        The section below explains the detailed semantics of
        bind, rbind, move, mount, umount and clone-namespace operations.

        Note: the word 'vfsmount' and the noun 'mount' have been used
        to mean the same thing, throughout this document.

5a) Mount states

        A given mount can be in one of the following states:

        1) shared
        2) slave
        3) shared and slave
        4) private
        5) unbindable

        A 'propagation event' is defined as an event generated on a vfsmount
        that leads to mount or unmount actions in other vfsmounts.

        A 'peer group' is defined as a group of vfsmounts that propagate
        events to each other.

        (1) Shared mounts

                A 'shared mount' is defined as a vfsmount that belongs to a
                'peer group'.

                For example::

                        mount --make-shared /mnt
                        mount --bind /mnt /tmp

                The mount at /mnt and that at /tmp are both shared and belong
                to the same peer group. Anything mounted or unmounted under
                /mnt or /tmp is reflected in all the other mounts of its peer
                group.


        (2) Slave mounts

                A 'slave mount' is defined as a vfsmount that receives
                propagation events and does not forward propagation events.

                A slave mount, as the name implies, has a master mount from
                which mount/unmount events are received. Events do not
                propagate from the slave mount to the master.  Only a shared
                mount can be made a slave by executing the following command::

                        mount --make-slave mount

                A shared mount that is made a slave is no longer shared unless
                modified to become shared.

        (3) Shared and Slave

                A vfsmount can be both shared as well as slave.  This state
                indicates that the mount is a slave of some vfsmount, and
                has its own peer group too.  This vfsmount receives propagation
                events from its master vfsmount, and also forwards propagation
                events to its 'peer group' and to its slave vfsmounts.

                Strictly speaking, the vfsmount is shared having its own
                peer group, and this peer-group is a slave of some other
                peer group.

                Only a slave vfsmount can be made 'shared and slave', either
                by executing the following command::

                        mount --make-shared mount

                or by moving the slave vfsmount under a shared vfsmount.

        (4) Private mount

                A 'private mount' is defined as a vfsmount that does not
                receive or forward any propagation events.

        (5) Unbindable mount

                An 'unbindable mount' is defined as a vfsmount that does not
                receive or forward any propagation events and cannot
                be bind mounted.


        State diagram:

        The state diagram below explains the state transition of a mount,
        in response to various commands::

            ----------------------------------------------------------------------------
            |             |make-shared |  make-slave  | make-private | make-unbindable |
            |-------------|------------|--------------|--------------|-----------------|
            |shared       |shared      |*slave/private|   private    |   unbindable    |
            |             |            |              |              |                 |
            |-------------|------------|--------------|--------------|-----------------|
            |slave        |shared      | **slave      |   private    |   unbindable    |
            |             |and slave   |              |              |                 |
            |-------------|------------|--------------|--------------|-----------------|
            |shared       |shared      | slave        |   private    |   unbindable    |
            |and slave    |and slave   |              |              |                 |
            |-------------|------------|--------------|--------------|-----------------|
            |private      |shared      |  **private   |   private    |   unbindable    |
            |-------------|------------|--------------|--------------|-----------------|
            |unbindable   |shared      |**unbindable  |   private    |   unbindable    |
            ----------------------------------------------------------------------------

            * If the shared mount is the only mount in its peer group, making it
              a slave makes it private automatically, since there is no master
              to which it can be slaved.

            ** Slaving a non-shared mount has no effect on the mount.

        Apart from the commands listed above, the 'move' operation also changes
        the state of a mount depending on the type of the destination mount.
        It's explained in section 5d.

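The state diagram can be restated as a small lookup function, which makes the
two footnotes explicit. This is a sketch in Python under stated assumptions:
``next_state`` and its ``alone_in_peer_group`` flag are hypothetical names
introduced here, and the function only encodes the table above, not the kernel
implementation.

```python
def next_state(state, command, alone_in_peer_group=False):
    """Return the state a mount ends up in after one --make-* command."""
    shared = state in ("shared", "shared and slave")
    slave = state in ("slave", "shared and slave")
    if command == "make-shared":
        # A slave keeps its master and additionally gets a peer group.
        return "shared and slave" if slave else "shared"
    if command == "make-slave":
        if not shared:
            return state              # footnote **: no effect on non-shared mounts
        if slave:
            return "slave"            # 'shared and slave' just drops its peer group
        # footnote *: with no peers left there is no master to be slaved to
        return "private" if alone_in_peer_group else "slave"
    if command == "make-private":
        return "private"
    if command == "make-unbindable":
        return "unbindable"
    raise ValueError(f"unknown command: {command}")
```

For example, ``next_state("slave", "make-shared")`` gives ``"shared and slave"``,
matching the second row of the table.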
5b) Bind semantics

        Consider the following command::

            mount --bind A/a  B/b

        where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
        is the destination mount and 'b' is the dentry in the destination mount.

        The outcome depends on the type of mount of 'A' and 'B'. The table
        below is a quick reference::

            ---------------------------------------------------------------------------
            |         BIND MOUNT OPERATION                                            |
            |*************************************************************************|
            |source(A)->| shared       |    private     |     slave      | unbindable |
            | dest(B)   |              |                |                |            |
            |    |      |              |                |                |            |
            |    v      |              |                |                |            |
            |*************************************************************************|
            |  shared   | shared       |     shared     |shared and slave|  invalid   |
            |           |              |                |                |            |
            |non-shared | shared       |    private     |     slave      |  invalid   |
            ***************************************************************************

        Details:

    1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C',
        which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
        mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2', 'C3' ...
        are created and mounted at the dentry 'b' on all mounts to which 'B'
        propagates. A new propagation tree containing 'C1',..,'Cn' is
        created. This propagation tree is identical to the propagation tree of
        'B'.  And finally the peer-group of 'C' is merged with the peer group
        of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C',
        which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
        mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2', 'C3' ...
        are created and mounted at the dentry 'b' on all mounts to which 'B'
        propagates. A new propagation tree is set up containing all new mounts
        'C', 'C1', .., 'Cn' with exactly the same configuration as the
        propagation tree for 'B'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
        mount 'C', which is a clone of 'A', is created. Its root dentry is 'a'.
        'C' is mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2',
        'C3' ... are created and mounted at the dentry 'b' on all mounts to
        which 'B' propagates. A new propagation tree containing the new mounts
        'C','C1',..  'Cn' is created. This propagation tree is identical to the
        propagation tree for 'B'. And finally the mount 'C' and its peer group
        are made the slave of mount 'Z'.  In other words, mount 'C' is in the
        state 'shared and slave'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
        invalid operation.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
        unbindable) mount. A new mount 'C', which is a clone of 'A', is created.
        Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C',
        which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
        mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
        peer-group of 'A'.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
        new mount 'C', which is a clone of 'A', is created. Its root dentry is
        'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
        slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
        'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
        mount/unmount events on 'A' do not propagate anywhere else. Similarly,
        mount/unmount events on 'C' do not propagate anywhere else.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
        invalid operation. An unbindable mount cannot be bind mounted.

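The eight cases above collapse into a short decision function for the state of
the new mount 'C'. The following Python sketch only restates the bind table;
``bind_result`` is a hypothetical helper name, and source states the table does
not list are rejected rather than guessed.

```python
def bind_result(source_state, dest_is_shared):
    """State of the new mount 'C' created by `mount --bind A/a B/b`."""
    if source_state == "unbindable":
        # Cases 4 and 8: the operation is invalid for any destination.
        raise ValueError("an unbindable mount cannot be bind mounted")
    if source_state == "shared":
        return "shared"                       # cases 1 and 6: 'C' joins A's peer group
    if source_state == "private":
        return "shared" if dest_is_shared else "private"          # cases 2 and 5
    if source_state == "slave":
        return "shared and slave" if dest_is_shared else "slave"  # cases 3 and 7
    raise ValueError(f"state not covered by the bind table: {source_state}")
```

A call such as ``bind_result("slave", dest_is_shared=True)`` corresponds to case
3, where 'C' gets its own peer group and remains a slave of 'Z'.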
5c) Rbind semantics

        rbind is the same as bind, except that it is applied recursively. Bind
        replicates the specified mount. Rbind replicates all the mounts in the
        tree belonging to the specified mount; that is, an rbind mount is a
        bind mount applied to every mount in the tree.

        If the source tree being rbind-mounted contains unbindable mounts,
        then each unbindable mount and the subtree under it are pruned in the
        new location.

        For example, let's say we have the following mount tree::

                A
              /   \
              B   C
             / \ / \
             D E F G

        Let's say all the mounts in the tree except the mount C are
        of a type other than unbindable.

        If this tree is rbound to, say, Z, we will have the following tree at
        the new location::

                Z
                |
                A'
               /
              B'                Note how the tree under C is pruned
             / \                in the new location.
            D' E'



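The pruning rule can be sketched as a walk over the mount tree that skips
unbindable mounts together with everything below them. This Python sketch is
an illustration only: the mount tree is represented as a plain dictionary from
a mount name to its child mounts, and ``rbind`` is a hypothetical function
name, not a kernel or util-linux interface.

```python
def rbind(tree, root, unbindable=frozenset()):
    """Return the replicated tree, with a prime appended to every copy."""
    if root in unbindable:
        raise ValueError("an unbindable mount cannot be bind mounted")
    replica = {}
    pending = [root]
    while pending:
        node = pending.pop()
        # Unbindable children are dropped, so their subtrees are never visited.
        kept = [child for child in tree.get(node, []) if child not in unbindable]
        replica[node + "'"] = [child + "'" for child in kept]
        pending.extend(kept)
    return replica


tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
pruned = rbind(tree, "A", unbindable={"C"})
```

Here ``pruned`` contains A', B', D' and E' only, which is the tree drawn under Z
in the example above.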
5d) Move semantics

        Consider the following command::

            mount --move A  B/b

        where 'A' is the source mount, 'B' is the destination mount and 'b' is
        the dentry in the destination mount.

        The outcome depends on the type of the mount of 'A' and 'B'. The table
        below is a quick reference::

            ---------------------------------------------------------------------------
            |         MOVE MOUNT OPERATION                                            |
            |*************************************************************************|
            |source(A)->| shared       |    private     |     slave      | unbindable |
            | dest(B)   |              |                |                |            |
            |    |      |              |                |                |            |
            |    v      |              |                |                |            |
            |*************************************************************************|
            |  shared   | shared       |     shared     |shared and slave|  invalid   |
            |           |              |                |                |            |
            |non-shared | shared       |    private     |     slave      | unbindable |
            ***************************************************************************

        .. Note:: Moving a mount residing under a shared mount is invalid.

        Details follow:

    1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
        mounted on mount 'B' at dentry 'b'.  Also, new mounts 'A1', 'A2'...'An'
        are created and mounted at dentry 'b' on all mounts that receive
        propagation from mount 'B'. A new propagation tree is created in the
        exact same configuration as that of 'B'. This new propagation tree
        contains all the new mounts 'A1', 'A2'...  'An'.  And this new
        propagation tree is appended to the already existing propagation tree
        of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
        mounted on mount 'B' at dentry 'b'. Also, new mounts 'A1', 'A2'... 'An'
        are created and mounted at dentry 'b' on all mounts that receive
        propagation from mount 'B'. The mount 'A' becomes a shared mount and a
        propagation tree is created which is identical to that of
        'B'. This new propagation tree contains all the new mounts 'A1',
        'A2'...  'An'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
        mount 'A' is mounted on mount 'B' at dentry 'b'.  Also, new mounts 'A1',
        'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
        receive propagation from mount 'B'. A new propagation tree is created
        in the exact same configuration as that of 'B'. This new propagation
        tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
        propagation tree is appended to the already existing propagation tree of
        'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
        becomes 'shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
        is invalid, because mounting anything on the shared mount 'B' can
        create new mounts that get mounted on the mounts that receive
        propagation from 'B'.  Since the mount 'A' is unbindable, cloning
        it to mount at other mountpoints is not possible.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
        unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
        is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
        shared mount.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
        The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
        continues to be a slave mount of mount 'Z'.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
        'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
        unbindable mount.

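The move table differs from the bind table in one cell (an unbindable mount may
be moved onto a non-shared mount) and in the extra restriction from the note.
A minimal Python sketch of both follows; ``MOVE_TABLE``, ``move_result`` and the
``parent_is_shared`` flag are hypothetical names chosen for this illustration.

```python
# (source state, destination is shared) -> state of 'A' after the move
MOVE_TABLE = {
    ("shared", True): "shared",
    ("private", True): "shared",
    ("slave", True): "shared and slave",
    ("shared", False): "shared",
    ("private", False): "private",
    ("slave", False): "slave",
    ("unbindable", False): "unbindable",
}


def move_result(source_state, dest_is_shared, parent_is_shared=False):
    """State of mount 'A' after `mount --move A B/b`, per the table above."""
    if parent_is_shared:
        # The note above: a mount residing under a shared mount cannot be moved.
        raise ValueError("moving a mount residing under a shared mount is invalid")
    try:
        return MOVE_TABLE[(source_state, dest_is_shared)]
    except KeyError:
        raise ValueError(
            f"invalid or not covered by the move table: {source_state}"
        ) from None
```

The only missing key among the four listed source states is
``("unbindable", True)``, which is case 4: the move would require cloning 'A'
onto every mount that receives propagation from 'B'.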
5e) Mount semantics

        Consider the following command::

            mount device  B/b

        'B' is the destination mount and 'b' is the dentry in the destination
        mount.

        The above operation is the same as the bind operation, with the
        exception that the source mount is always a private mount.


5f) Unmount semantics

        Consider the following command::

            umount A

        where 'A' is a mount mounted on mount 'B' at dentry 'b'.

        If mount 'B' is shared, then all most-recently-mounted mounts at dentry
        'b' on mounts that receive propagation from mount 'B' and that do not
        have sub-mounts within them are unmounted.

        Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
        each other.

        Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
        'B1', 'B2' and 'B3' respectively.

        Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
        mount 'B1', 'B2' and 'B3' respectively.

        If 'C1' is unmounted, all the mounts that are most-recently-mounted on
        'B1' and on the mounts that 'B1' propagates to are unmounted.

        'B1' propagates to 'B2' and 'B3'. The most recently mounted mount
        on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

        So all of 'C1', 'C2' and 'C3' should be unmounted.

        If either 'C2' or 'C3' has child mounts, then that mount is not
        unmounted, but all other mounts are unmounted. However, if 'C1' is
        the mount being unmounted and 'C1' has sub-mounts, the umount
        operation fails entirely.

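The 'B1', 'B2', 'B3' example can be replayed with a small model in which each
destination mount keeps a stack of the mounts at dentry 'b', most recent last.
This is a sketch under stated assumptions: ``umount`` here is a hypothetical
Python function, ``propagates_to`` and ``has_submounts`` are illustrative
parameters, and only the rules stated in this section are encoded.

```python
def umount(target, parent, stacks, propagates_to, has_submounts=frozenset()):
    """Unmount `target` from `parent` and propagate; return what was removed."""
    if target in has_submounts:
        # The requested mount itself has sub-mounts: nothing is unmounted.
        raise RuntimeError(f"{target} has sub-mounts; umount fails entirely")
    if not stacks[parent] or stacks[parent][-1] != target:
        raise ValueError(f"{target} is not the most recent mount on {parent}")
    removed = []
    for mount in [parent, *propagates_to.get(parent, [])]:
        stack = stacks[mount]
        # Only the most recent mount is considered, and it is skipped if busy.
        if stack and stack[-1] not in has_submounts:
            removed.append(stack.pop())
    return removed


stacks = {"B1": ["A1", "C1"], "B2": ["A2", "C2"], "B3": ["A3", "C3"]}
removed = umount("C1", "B1", stacks, {"B1": ["B2", "B3"]})
```

After this call ``removed`` lists 'C1', 'C2' and 'C3', and only 'A1', 'A2' and
'A3' remain mounted, as in the example above.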
5g) Clone Namespace

        A cloned namespace contains all the mounts of the parent
        namespace.

        Let's say 'A' and 'B' are the corresponding mounts in the parent and the
        child namespace.

        If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
        each other.

        If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
        'Z'.

        If 'A' is a private mount, then 'B' is a private mount too.

        If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

        A. What is the result of the following command sequence?

                ::

                    mount --bind /mnt /mnt
                    mount --make-shared /mnt
                    mount --bind /mnt /tmp
                    mount --move /tmp /mnt/1

                What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
                Should they all be identical, or should only /mnt and /mnt/1
                be identical?


        B. What is the result of the following command sequence?

                ::

                    mount --make-rshared /
                    mkdir -p /v/1
                    mount --rbind / /v/1

                What should the content of /v/1/v/1 be?


        C. What is the result of the following command sequence?

                ::

                    mount --bind /mnt /mnt
                    mount --make-shared /mnt
                    mkdir -p /mnt/1/2/3 /mnt/1/test
                    mount --bind /mnt/1 /tmp
                    mount --make-slave /mnt
                    mount --make-shared /mnt
                    mount --bind /mnt/1/2 /tmp1
                    mount --make-slave /mnt

                At this point we have the first mount at /tmp and
                its root dentry is 1. Let's call this mount 'A'.
                And then we have a second mount at /tmp1 with root
                dentry 2. Let's call this mount 'B'.
                Next we have a third mount at /mnt with root dentry
                mnt. Let's call this mount 'C'.

                'B' is the slave of 'A' and 'C' is a slave of 'B'::

                    A -> B -> C

                At this point, if we execute the following command::

                    mount --bind /bin /tmp/test

                the mount is attempted on 'A'.

                Will the mount propagate to 'B' and 'C'?

                What would be the contents of /mnt/1/test?

677 7) FAQ
678 ------
679 
680         Q1. Why is bind mount needed? How is it different from symbolic links?
681                 symbolic links can get stale if the destination mount gets
682                 unmounted or moved. Bind mounts continue to exist even if the
683                 other mount is unmounted or moved.
684 
685         Q2. Why can't the shared subtree be implemented using exportfs?
686 
687                 exportfs is a heavyweight way of accomplishing part of what
688                 shared subtree can do. I cannot imagine a way to implement the
689                 semantics of slave mount using exportfs?
690 
691         Q3 Why is unbindable mount needed?
692 
693                 Let's say we want to replicate the mount tree at multiple
694                 locations within the same subtree.
695 
696                 if one rbind mounts a tree within the same subtree 'n' times
697                 the number of mounts created is an exponential function of 'n'.
698                 Having unbindable mount can help prune the unneeded bind
699                 mounts. Here is an example.
700 
701                 step 1:
702                    let's say the root tree has just two directories with
703                    one vfsmount::
704 
705                                     root
706                                    /    \
707                                   tmp    usr
708 
709                     And we want to replicate the tree at multiple
710                     mountpoints under /root/tmp
711 
712                 step 2:
713                       ::
714 
715 
716                         mount --make-shared /root
717 
718                         mkdir -p /tmp/m1
719 
720                         mount --rbind /root /tmp/m1
721 
722                       the new tree now looks like this::
723 
724                                     root
725                                    /    \
726                                  tmp    usr
727                                 /
728                                m1
729                               /  \
730                              tmp  usr
731                              /
732                             m1
733 
734                           it has two vfsmounts
735 
736                 step 3:
737                     ::
738 
739                             mkdir -p /tmp/m2
740                             mount --rbind /root /tmp/m2
741 
742                         the new tree now looks like this::
743 
744                                       root
745                                      /    \
746                                    tmp     usr
747                                   /    \
748                                 m1       m2
749                                / \       /  \
750                              tmp  usr   tmp  usr
751                              / \          /
752                             m1  m2      m1
753                                 / \     /  \
754                               tmp usr  tmp   usr
755                               /        / \
756                              m1       m1  m2
757                             /  \
758                           tmp   usr
759                           /  \
760                          m1   m2
761 
762                        it has 6 vfsmounts
763 
764                 step 4:
765                       ::
766                           mkdir -p /tmp/m3
767                           mount --rbind /root /tmp/m3
768 
769                           I won't draw the tree, but it has 24 vfsmounts
770 
771 
772                 At step i the number of vfsmounts is V[i] = i*V[i-1], i.e.
773                 V[i] = i!, which grows faster than any exponential. This tree
774                 has far more mounts than we really needed in the first place.
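
The recurrence above can be checked with a short sketch. This is only an
illustration of the arithmetic, not kernel code; the function name
``vfsmount_counts`` is made up for this example, and it assumes V[1] = 1
(the single vfsmount of step 1).

```python
def vfsmount_counts(steps):
    """Return [V[1], ..., V[steps]] for the recurrence V[i] = i * V[i-1].

    Assumes V[1] = 1, matching the single vfsmount present at step 1.
    """
    counts = [1]
    for i in range(2, steps + 1):
        counts.append(i * counts[-1])
    return counts


print(vfsmount_counts(4))  # [1, 2, 6, 24]
```

The values for steps 1 to 4 match the counts quoted in the example
(one, two, 6 and 24 vfsmounts).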
775 
776                 One could use a series of umount at each step to prune
777                 out the unneeded mounts. But there is a better solution.
778                 Unbindable (unclonable) mounts come in handy here.
779 
780                 step 1:
781                    let's say the root tree has just two directories with
782                    one vfsmount::
783 
784                                     root
785                                    /    \
786                                   tmp    usr
787 
788                     How do we set up the same tree at multiple locations under
789                     /root/tmp?
790 
791                 step 2:
792                       ::
793 
794 
795                         mount --bind /root/tmp /root/tmp
796 
797                         mount --make-rshared /root
798                         mount --make-unbindable /root/tmp
799 
800                         mkdir -p /tmp/m1
801 
802                         mount --rbind /root /tmp/m1
803 
804                       the new tree now looks like this::
805 
806                                     root
807                                    /    \
808                                  tmp    usr
809                                 /
810                                m1
811                               /  \
812                              tmp  usr
813 
814                 step 3:
815                       ::
816 
817                             mkdir -p /tmp/m2
818                             mount --rbind /root /tmp/m2
819 
820                       the new tree now looks like this::
821 
822                                     root
823                                    /    \
824                                  tmp    usr
825                                 /   \
826                                m1     m2
827                               /  \     / \
828                              tmp  usr tmp usr
829 
830                 step 4:
831                       ::
832 
833                             mkdir -p /tmp/m3
834                             mount --rbind /root /tmp/m3
835 
836                       the new tree now looks like this::
837 
838                                           root
839                                       /           \
840                                      tmp           usr
841                                  /    \    \
842                                m1     m2     m3
843                               /  \     / \    /  \
844                              tmp  usr tmp usr tmp usr
845 
846 8) Implementation
847 -----------------
848 
849 8A) Data structure
850 
851         4 new fields are introduced to struct vfsmount:
852 
853         *   ->mnt_share
854         *   ->mnt_slave_list
855         *   ->mnt_slave
856         *   ->mnt_master
857 
858         ->mnt_share
859                 links together all the mounts to/from which this vfsmount
860                 sends/receives propagation events.
861 
862         ->mnt_slave_list
863                 links all the mounts to which this vfsmount
864                 propagates.
865 
866         ->mnt_slave
867                 links together all the slaves that its master vfsmount
868                 propagates to.
869 
870         ->mnt_master
871                 points to the master vfsmount from which this vfsmount
872                 receives propagation.
873 
874         ->mnt_flags
875                 takes two more flags to indicate the propagation status of
876                 the vfsmount.  MNT_SHARE indicates that the vfsmount is a shared
877                 vfsmount.  MNT_UNCLONABLE indicates that the vfsmount cannot be
878                 replicated.
879 
880         All the shared vfsmounts in a peer group form a cyclic list through
881         ->mnt_share.
882 
883         All vfsmounts with the same ->mnt_master form a cyclic list anchored
884         in ->mnt_master->mnt_slave_list and going through ->mnt_slave.
885 
886          ->mnt_master can point to arbitrary (and possibly different) members
887          of master peer group.  To find all immediate slaves of a peer group
888          you need to go through _all_ ->mnt_slave_list of its members.
889          Conceptually it's just a single set - distribution among the
890          individual lists does not affect propagation or the way propagation
891          tree is modified by operations.
892 
893         All vfsmounts in a peer group have the same ->mnt_master.  If it is
894         non-NULL, they form a contiguous (ordered) segment of slave list.
895 
896         An example propagation tree looks as shown in the figure below.
897         [ NOTE: Though it looks like a forest, if we consider all the shared
898         mounts as a conceptual entity called 'pnode', it becomes a tree]::
899 
900 
901                         A <--> B <--> C <---> D
902                        /|\            /|      |\
903                       / F G          J K      H I
904                      /
905                     E<-->K
906                         /|\
907                        M L N
908 
909         In the above figure A, B, C and D are all shared and propagate to each
910         other.  'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
911         mounts 'J' and 'K'; and 'D' has 2 slave mounts 'H' and 'I'.
912         'E' is also shared with 'K' and they propagate to each other.  And
913         'K' has 3 slaves 'M', 'L' and 'N'.
914 
915         A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'
916 
917         A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'
918 
919         E's ->mnt_share links with ->mnt_share of K
920 
921         'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'
922 
923         'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
924 
925         K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'
926 
927         C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
928 
929         J and K's ->mnt_master points to struct vfsmount of C
930 
931         and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
932 
933         'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.
934 
935 
936         NOTE: The propagation tree is orthogonal to the mount tree.
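
The relationships between the fields can be modelled in a few lines of
userspace code. The sketch below is an illustration only and makes several
assumptions: the kernel uses cyclic linked lists, whereas plain Python lists
are used here; the helper names ``make_peers``, ``add_slave`` and
``immediate_slaves`` are invented for this example; and only the A, B, C, D,
F, G, J, H, I part of the figure is built. It shows why finding all
immediate slaves of a peer group means walking the ->mnt_slave_list of every
member.

```python
class Mount:
    """Toy model of the propagation fields (not the kernel structure)."""

    def __init__(self, name):
        self.name = name
        self.mnt_share = [self]    # peer group; one list object shared by all peers
        self.mnt_slave_list = []   # slaves whose ->mnt_master is this mount
        self.mnt_master = None     # mount this one receives propagation from


def make_peers(*mounts):
    """Put the given mounts into one peer group."""
    group = list(mounts)
    for m in mounts:
        m.mnt_share = group


def add_slave(master, slave):
    """Make `slave` receive propagation from `master`."""
    master.mnt_slave_list.append(slave)
    slave.mnt_master = master


def immediate_slaves(mount):
    """Collect slaves from the slave list of every member of the peer group."""
    return [s for peer in mount.mnt_share for s in peer.mnt_slave_list]


a, b, c, d, f, g, j, h, i = (Mount(n) for n in "ABCDFGJHI")
make_peers(a, b, c, d)
for master, slave in [(a, f), (a, g), (c, j), (d, h), (d, i)]:
    add_slave(master, slave)

print([s.name for s in immediate_slaves(b)])
```

Although 'B' has no slaves of its own in this model, asking for the
immediate slaves of its peer group returns the slaves attached to 'A', 'C'
and 'D', since the individual lists conceptually form a single set.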
937 
938 8B) Locking
939 
940         ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
941         by namespace_sem (exclusive for modifications, shared for reading).
942 
943         Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
944         There are two exceptions: do_add_mount() and clone_mnt().
945         The former modifies a vfsmount that has not been visible in any shared
946         data structures yet.
947         The latter holds namespace_sem and the only references to vfsmount
948         are in lists that can't be traversed without namespace_sem.
949 
950 8C) Algorithm
951 
952         The crux of the implementation resides in rbind/move operation.
953 
954         The overall algorithm breaks the operation into 3 phases (see
955         attach_recursive_mnt() and propagate_mnt()):
956 
957         1. Prepare phase.
958         2. Commit phase.
959         3. Abort phase.
960 
961         Prepare phase:
962 
963         For each mount in the source tree:
964 
965                    a) Create the necessary number of mount trees to
966                         be attached to each of the mounts that receive
967                         propagation from the destination mount.
968                    b) Do not attach any of the trees to its destination.
969                       However, note down its ->mnt_parent and ->mnt_mountpoint.
970                    c) Link all the new mounts to form a propagation tree that
971                       is identical to the propagation tree of the destination
972                       mount.
973 
974                    If this phase is successful, there should be 'n' new
975                    propagation trees, where 'n' is the number of mounts in the
976                    source tree.  Go to the commit phase.
977 
978                    Also there should be 'm' new mount trees, where 'm' is
979                    the number of mounts to which the destination mount
980                    propagates.
981 
982                    If any memory allocation fails, go to the abort phase.
983 
984         Commit phase
985                 attach each of the mount trees to their corresponding
986                 destination mounts.
987 
988         Abort phase
989                 delete all the newly created trees.
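
The prepare/commit/abort structure can be sketched in userspace as follows.
This is a simplified illustration, not the logic of attach_recursive_mnt()
or propagate_mnt(): the source tree is reduced to a single mount, the
``Mount`` class and ``attach_with_propagation`` are hypothetical names, and
the ``alloc_budget`` parameter is an assumption that stands in for memory
allocations that may fail.

```python
from dataclasses import dataclass, field


@dataclass
class Mount:
    name: str
    children: list = field(default_factory=list)  # trees attached to this mount


def attach_with_propagation(source, receivers, alloc_budget):
    """Attach a copy of `source` to every mount in `receivers`, or to none.

    `alloc_budget` is a hypothetical cap on how many copies can be created;
    exceeding it models a failed memory allocation.
    """
    staged = []  # (destination, new tree) pairs noted down but not attached

    # Prepare phase: create one copy per receiving mount without attaching it.
    for dest in receivers:
        if len(staged) >= alloc_budget:
            # Abort phase: delete all the newly created trees.
            staged.clear()
            return False
        staged.append((dest, Mount(f"{source.name}@{dest.name}")))

    # Commit phase: attach each new tree to its corresponding destination.
    for dest, tree in staged:
        dest.children.append(tree)
    return True
```

Because nothing is attached during the prepare phase, a failure part-way
through leaves every destination mount untouched; the operation either
takes effect on all receiving mounts or on none.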
990 
991         .. Note::
992            All the propagation-related functionality resides in the file pnode.c
993 
994 
995 ------------------------------------------------------------------------
996 
997 version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)
998 
999 version 0.2  (Incorporated comments from Al Viro)
