TOMOYO Linux Cross Reference
Linux/Documentation/locking/robust-futex-ABI.rst

Differences between /Documentation/locking/robust-futex-ABI.rst (Version linux-6.11.5) and /Documentation/locking/robust-futex-ABI.rst (Version linux-5.16.20)


====================
The robust futex ABI
====================

:Author: Started by Paul Jackson <pj@sgi.com>

Robust_futexes provide a mechanism that is used in addition to normal
futexes, so that the kernel can assist in cleaning up held locks on task
exit.

The interesting data as to what futexes a thread is holding is kept on a
linked list in user space, where it can be updated efficiently as locks
are taken and dropped, without kernel intervention.  The only additional
kernel intervention required for robust_futexes above and beyond what is
required for futexes is:

 1) a one-time call, per thread, to tell the kernel where its list of
    held robust_futexes begins, and
 2) internal kernel code at exit, to handle any listed locks held
    by the exiting thread.

The existing normal futexes already provide a "Fast Userspace Locking"
mechanism, which handles uncontested locking without needing a system
call, and handles contested locking by maintaining a list of waiting
threads in the kernel.  Options on the sys_futex(2) system call support
waiting on a particular futex, and waking up the next waiter on a
particular futex.

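As a rough illustration of the wait and wake options mentioned above,
the following sketch wraps the raw system call.  The wrapper names
futex_wait() and futex_wake() are hypothetical; the sketch assumes a
Linux target whose C library defines SYS_futex and whose kernel headers
provide FUTEX_WAIT and FUTEX_WAKE in <linux/futex.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>   /* FUTEX_WAIT, FUTEX_WAKE */
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <unistd.h>        /* syscall() */

/*
 * Hypothetical wrapper: sleep as long as *addr still holds 'expected'.
 * Returns -1 with errno set to EAGAIN if the value has already changed.
 */
static long futex_wait(uint32_t *addr, uint32_t expected)
{
    assert(addr != NULL);
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/*
 * Hypothetical wrapper: wake at most 'count' threads waiting on addr.
 * Returns the number of threads woken, or -1 on error.
 */
static long futex_wake(uint32_t *addr, int count)
{
    assert(addr != NULL);
    return syscall(SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

Only the contested path needs these calls.  An uncontested lock or
unlock is a plain atomic update of the 32 bit word in user space.
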
For robust_futexes to work, the user code (typically in a library such
as glibc linked with the application) has to manage and place the
necessary list elements exactly as the kernel expects them.  If it fails
to do so, then improperly listed locks will not be cleaned up on exit,
probably causing deadlock or other such failure of the other threads
waiting on the same locks.

A thread that anticipates possibly using robust_futexes should first
issue the system call::

    asmlinkage long
    sys_set_robust_list(struct robust_list_head __user *head, size_t len);

The pointer 'head' points to a structure in the thread's address space
consisting of three words.  Each word is 32 bits on 32 bit arch's, or 64
bits on 64 bit arch's, and is in local byte order.  Each thread should
have its own thread private 'head'.

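A minimal sketch of that registration step from user space follows.
The helper name register_robust_list() is hypothetical, and the sketch
assumes the kernel headers' <linux/futex.h> (for struct
robust_list_head) and a C library that defines SYS_set_robust_list.
The three fields it fills in correspond to the three words of 'head'.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>   /* struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_set_robust_list */
#include <unistd.h>        /* syscall() */

/*
 * Hypothetical helper: make 'head' an empty list and register it for
 * the calling thread.  'lock_word_offset' is the signed distance from a
 * lock entry to its lock word.  Returns the raw syscall result: 0 on
 * success, -1 on error.
 */
static long register_robust_list(struct robust_list_head *head,
                                 long lock_word_offset)
{
    assert(head != NULL);
    head->list.next = &head->list;        /* empty list points to itself */
    head->futex_offset = lock_word_offset;
    head->list_op_pending = NULL;         /* no operation in flight */
    return syscall(SYS_set_robust_list, head, sizeof(*head));
}
```

The structure has to stay valid for as long as the thread lives, since
the kernel reads it at exit.  A C library may already have registered
its own 'head' for the thread, in which case a call like this replaces
that registration, so application code would normally leave this step
to the library.
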
If a thread is running in 32 bit compatibility mode on a 64 bit native
arch kernel, then it can actually have two such structures - one using
32 bit words for 32 bit compatibility mode, and one using 64 bit words
for 64 bit native mode.  The kernel, if it is a 64 bit kernel supporting
32 bit compatibility mode, will attempt to process both lists on each
task exit, if the corresponding sys_set_robust_list() call has been made
to set up that list.

  The first word in the memory structure at 'head' contains a
  pointer to a singly linked list of 'lock entries', one per lock,
  as described below.  If the list is empty, the pointer will point
  to itself, 'head'.  The last 'lock entry' points back to the 'head'.

  The second word, called 'offset', specifies the signed offset (plus
  or minus) from the address of each 'lock entry' to what will be
  called its 'lock word'.  The 'lock word' is always a 32 bit word,
  unlike the other words above.  The 'lock word' holds 2 flag bits in
  the upper 2 bits, and the thread id (TID) of the thread holding the
  lock in the bottom 30 bits.  See further below for a description of
  the flag bits.

  The third word, called 'list_op_pending', contains a transient copy of
  the address of the 'lock entry', during list insertion and removal,
  and is needed to correctly resolve races should a thread exit while
  in the middle of a locking or unlocking operation.

Each 'lock entry' on the singly linked list starting at 'head' consists
of just a single word, pointing to the next 'lock entry', or back to
'head' if there are no more entries.  In addition, near each 'lock
entry', at an offset from the 'lock entry' specified by the 'offset'
word, is one 'lock word'.

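The layout just described can be pictured with the stand-in types
below.  The names sketch_entry, sketch_head and list_is_empty() are
illustrative only; the real definitions are struct robust_list and
struct robust_list_head in <linux/futex.h>.  The size check assumes
that 'long' and pointers each occupy one machine word, as they do on
Linux.

```c
#include <assert.h>
#include <stddef.h>

/* One 'lock entry': a single word linking to the next entry or to head. */
struct sketch_entry {
    struct sketch_entry *next;
};

/* Stand-in for the three-word structure that 'head' points to. */
struct sketch_head {
    struct sketch_entry list;              /* word 1: first entry, or itself */
    long offset;                           /* word 2: entry to lock word     */
    struct sketch_entry *list_op_pending;  /* word 3: entry in flight        */
};

/* Assumes 'long' and pointers are both one machine word wide. */
static_assert(sizeof(struct sketch_head) == 3 * sizeof(void *),
              "head must be exactly three words");

/* An empty list is one whose first word points back at the head itself. */
static int list_is_empty(const struct sketch_head *head)
{
    return head->list.next == &head->list;
}
```
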
The 'lock word' is always 32 bits, and is intended to be the same 32 bit
lock variable used by the futex mechanism, in conjunction with
robust_futexes.  The kernel will only be able to wake up the next thread
waiting for a lock on a thread's exit if that next thread used the futex
mechanism to register the address of that 'lock word' with the kernel.

For each futex lock currently held by a thread, if it wants this
robust_futex support for exit cleanup of that lock, it should have one
'lock entry' on this list, with its associated 'lock word' at the
specified 'offset'.  Should a thread die while holding any such locks,
the kernel will walk this list, mark any such locks with a bit
indicating their holder died, and wake up the next thread waiting for
that lock using the futex mechanism.

When a thread has invoked the above system call to indicate it
anticipates using robust_futexes, the kernel stores the passed in 'head'
pointer for that task.  The task may retrieve that value later on by
using the system call::

    asmlinkage long
    sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
                        size_t __user *len_ptr);

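From user space the query might look like the sketch below.  The helper
name query_robust_list() is hypothetical, and the sketch assumes
<linux/futex.h> and a C library that defines SYS_get_robust_list.
Passing 0 as the id asks about the calling thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>   /* struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_get_robust_list */
#include <unistd.h>        /* syscall() */

/*
 * Hypothetical helper: fetch the 'head' registered for thread 'tid'
 * (0 means the calling thread).  On success returns 0 and stores the
 * pointer and the structure size through the two output arguments.
 */
static long query_robust_list(int tid, struct robust_list_head **head_out,
                              size_t *len_out)
{
    assert(head_out != NULL && len_out != NULL);
    return syscall(SYS_get_robust_list, tid, head_out, len_out);
}
```

The pointer stored in *head_out may be NULL if nothing has been
registered for that thread.
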
It is anticipated that threads will use robust_futexes embedded in
larger, user level locking structures, one per lock.  The kernel
robust_futex mechanism doesn't care what else is in that structure, so
long as the 'offset' to the 'lock word' is the same for all
robust_futexes used by that thread.  The thread should link those locks
it currently holds using the 'lock entry' pointers.  It may also have
other links between the locks, such as the reverse side of a doubly
linked list, but that doesn't matter to the kernel.

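One way to picture such an embedding is sketched below.  The structure
user_lock, its private fields, the macro USER_LOCK_OFFSET and the
helper lock_word_of() are all made up for illustration.  The point is
only that the distance from the 'lock entry' to the 'lock word' is a
compile-time constant shared by every instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's one-word 'lock entry'. */
struct lock_entry {
    struct lock_entry *next;
};

/* Hypothetical user level lock; only 'entry' and 'lock_word' matter. */
struct user_lock {
    const char *name;          /* library-private data                 */
    struct lock_entry entry;   /* linked into the per-thread list      */
    unsigned int recursion;    /* more library-private data            */
    uint32_t lock_word;        /* the 32 bit word shared with futexes  */
};

/* Signed distance from 'entry' to 'lock_word', same for every instance. */
#define USER_LOCK_OFFSET \
    ((long)offsetof(struct user_lock, lock_word) - \
     (long)offsetof(struct user_lock, entry))

/* Locate a lock word from its entry address plus the shared offset. */
static uint32_t *lock_word_of(struct lock_entry *entry, long offset)
{
    assert(entry != NULL);
    return (uint32_t *)((char *)entry + offset);
}
```

A library laid out like this would store USER_LOCK_OFFSET in the
'offset' word of its 'head' once, before taking any lock.
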
By keeping its locks linked this way, on a list starting with a 'head'
pointer known to the kernel, the kernel can provide to a thread the
essential service available for robust_futexes, which is to help clean
up locks held at the time of a (perhaps unexpected) exit.

Actual locking and unlocking, during normal operations, is handled
entirely by user level code in the contending threads, and by the
existing futex mechanism to wait for, and wake up, locks.  The kernel's
only essential involvement in robust_futexes is to remember where the
list 'head' is, and to walk the list on thread exit, handling locks
still held by the departing thread, as described below.

There may exist thousands of futex lock structures in a thread's shared
memory, on various data structures, at a given point in time.  Only those
lock structures for locks currently held by that thread should be on
that thread's robust_futex linked lock list at a given time.

A given futex lock structure in a user shared memory region may be held
at different times by any of the threads with access to that region.  The
thread currently holding such a lock, if any, is marked with the thread's
TID in the lower 30 bits of the 'lock word'.

When adding or removing a lock from its list of held locks, in order for
the kernel to correctly handle lock cleanup regardless of when the task
exits (perhaps it gets an unexpected signal 9 in the middle of
manipulating this list), the user code must observe the following
protocol on 'lock entry' insertion and removal:

On insertion:

 1) set the 'list_op_pending' word to the address of the 'lock entry'
    to be inserted,
 2) acquire the futex lock,
 3) add the lock entry, with its thread id (TID) in the bottom 30 bits
    of the 'lock word', to the linked list starting at 'head', and
 4) clear the 'list_op_pending' word.

On removal:

 1) set the 'list_op_pending' word to the address of the 'lock entry'
    to be removed,
 2) remove the lock entry for this lock from the 'head' list,
 3) release the futex lock, and
 4) clear the 'list_op_pending' word.

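The two step lists above can be sketched in user level C roughly as
follows.  This is a simplified illustration and not how any particular
library implements it: sketch_lock, sketch_trylock() and
sketch_unlock() are hypothetical names, acquisition is a single
try-lock with no futex wait, and release does not wake waiters.  It
assumes struct robust_list_head from <linux/futex.h>, C11 atomics, and
that 'tid' is the caller's kernel thread id.

```c
#include <assert.h>
#include <linux/futex.h>   /* struct robust_list, struct robust_list_head */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock type: a lock entry followed by its lock word. */
struct sketch_lock {
    struct robust_list entry;
    _Atomic uint32_t word;
};

/* Compiler barrier so the protocol's stores stay in program order. */
#define ORDER() atomic_signal_fence(memory_order_seq_cst)

/* Insertion protocol; returns false if the lock is already held. */
static bool sketch_trylock(struct robust_list_head *head,
                           struct sketch_lock *lock, uint32_t tid)
{
    uint32_t unlocked = 0;
    bool got;

    head->list_op_pending = &lock->entry;          /* step 1 */
    ORDER();
    got = atomic_compare_exchange_strong(&lock->word, &unlocked, tid);
    if (got) {                                     /* step 2 succeeded */
        lock->entry.next = head->list.next;        /* step 3 */
        ORDER();
        head->list.next = &lock->entry;
    }
    ORDER();
    head->list_op_pending = NULL;                  /* step 4 */
    return got;
}

/* Removal protocol; the caller must hold 'lock' and have it listed. */
static void sketch_unlock(struct robust_list_head *head,
                          struct sketch_lock *lock)
{
    struct robust_list *prev = &head->list;

    assert(atomic_load(&lock->word) != 0);
    head->list_op_pending = &lock->entry;          /* step 1 */
    ORDER();
    while (prev->next != &lock->entry)             /* step 2 */
        prev = prev->next;
    prev->next = lock->entry.next;
    ORDER();
    atomic_store(&lock->word, 0);                  /* step 3 */
    ORDER();
    head->list_op_pending = NULL;                  /* step 4 */
}
```

Because 'list_op_pending' names the entry for the whole window between
taking the lock and linking it, and again between unlinking it and
releasing the lock, the kernel can still find the lock if the thread
dies inside either window.
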
On exit, the kernel will consider the address stored in
'list_op_pending' and the address of each 'lock entry' found by walking
the list starting at 'head'.  For each such address, if the bottom 30
bits of the 'lock word' at offset 'offset' from that address equal the
exiting thread's TID, then the kernel will do two things:

 1) if bit 31 (0x80000000) is set in that word, then attempt a futex
    wakeup on that address, which will waken the next thread that has
    used the futex mechanism to wait on that address, and
 2) atomically set bit 30 (0x40000000) in the 'lock word'.

In the above, bit 31 was set by futex waiters on that lock to indicate
they were waiting, and bit 30 is set by the kernel to indicate that the
lock owner died holding the lock.

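The bit layout of the 'lock word' can be made concrete with a few
helpers.  The LW_* names and the function names are local to this
sketch; the numeric values are the ones given in the text above and
correspond to FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK in
<linux/futex.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LW_WAITERS     0x80000000u   /* bit 31: set by futex waiters     */
#define LW_OWNER_DIED  0x40000000u   /* bit 30: set by the kernel        */
#define LW_TID_MASK    0x3fffffffu   /* bits 0..29: TID of the holder    */

/* The three fields together cover all 32 bits of the lock word. */
static_assert((LW_WAITERS | LW_OWNER_DIED | LW_TID_MASK) == 0xffffffffu,
              "lock word fields must cover 32 bits");

/* TID of the thread holding the lock, or 0 if it is not held. */
static uint32_t lock_word_tid(uint32_t word)
{
    return word & LW_TID_MASK;
}

/* True if some thread has marked itself as waiting on the lock. */
static bool lock_word_has_waiters(uint32_t word)
{
    return (word & LW_WAITERS) != 0;
}

/* True if the kernel flagged that the owner died holding the lock. */
static bool lock_word_owner_died(uint32_t word)
{
    return (word & LW_OWNER_DIED) != 0;
}
```
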
The kernel exit code will silently stop scanning the list further if at
any point:

 1) the 'head' pointer or a subsequent linked list pointer
    is not a valid address of a user space word,
 2) the calculated location of the 'lock word' (address plus
    'offset') is not the valid address of a 32 bit user space
    word, or
 3) the list contains more than 1 million (subject to
    future kernel configuration changes) elements.

When the kernel sees a list entry whose 'lock word' doesn't have the
current thread's TID in the lower 30 bits, it does nothing with that
entry, and goes on to the next entry.
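
The exit-time walk described above can be modelled in ordinary user
space C, which may help when checking that a list is laid out as the
kernel expects.  This is a single-threaded model, not kernel code: the
function names are hypothetical, the updates are plain (non-atomic)
stores, no pointer validation is done, and a wakeup is only counted
rather than issued.  The sketch takes its element cap from
ROBUST_LIST_LIMIT in <linux/futex.h>, whose value may differ from the
figure quoted in the text, and it skips the pending entry inside the
loop so that an entry that is both listed and pending is handled once.

```c
#include <assert.h>
#include <linux/futex.h>   /* struct robust_list_head, FUTEX_* masks */
#include <stddef.h>
#include <stdint.h>

/* Handle one entry; returns 1 if a futex wakeup would be attempted. */
static int sketch_handle_death(struct robust_list *entry, long offset,
                               uint32_t tid)
{
    uint32_t *word = (uint32_t *)((char *)entry + offset);
    int wake;

    assert(entry != NULL);
    if ((*word & FUTEX_TID_MASK) != tid)
        return 0;                          /* not held by us: skip it */
    wake = (*word & FUTEX_WAITERS) != 0;   /* bit 31: someone waits   */
    *word |= FUTEX_OWNER_DIED;             /* bit 30: owner died      */
    return wake;
}

/* Model of the walk; returns how many wakeups would be attempted. */
static int sketch_exit_walk(struct robust_list_head *head, uint32_t tid)
{
    struct robust_list *entry = head->list.next;
    int wakeups = 0;
    long seen = 0;

    while (entry != &head->list && seen++ < ROBUST_LIST_LIMIT) {
        if (entry != head->list_op_pending)
            wakeups += sketch_handle_death(entry, head->futex_offset, tid);
        entry = entry->next;
    }
    if (head->list_op_pending != NULL)
        wakeups += sketch_handle_death(head->list_op_pending,
                                       head->futex_offset, tid);
    return wakeups;
}
```
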
                                                      
