1 ==================== 2 The robust futex ABI 3 ==================== 4 5 :Author: Started by Paul Jackson <pj@sgi.com> 6 7 8 Robust_futexes provide a mechanism that is use 9 futexes, for kernel assist of cleanup of held 10 11 The interesting data as to what futexes a thre 12 linked list in user space, where it can be upd 13 are taken and dropped, without kernel interven 14 kernel intervention required for robust_futexe 15 required for futexes is: 16 17 1) a one time call, per thread, to tell the k 18 held robust_futexes begins, and 19 2) internal kernel code at exit, to handle an 20 by the exiting thread. 21 22 The existing normal futexes already provide a 23 mechanism, which handles uncontested locking w 24 call, and handles contested locking by maintai 25 threads in the kernel. Options on the sys_fut 26 waiting on a particular futex, and waking up t 27 particular futex. 28 29 For robust_futexes to work, the user code (typ 30 as glibc linked with the application) has to m 31 necessary list elements exactly as the kernel 32 to do so, then improperly listed locks will no 33 probably causing deadlock or other such failur 34 waiting on the same locks. 35 36 A thread that anticipates possibly using robus 37 issue the system call:: 38 39 asmlinkage long 40 sys_set_robust_list(struct robust_list_hea 41 42 The pointer 'head' points to a structure in th 43 consisting of three words. Each word is 32 bi 44 bits on 64 bit arch's, and local byte order. 45 its own thread private 'head'. 46 47 If a thread is running in 32 bit compatibility 48 kernel, then it can actually have two such str 49 words for 32 bit compatibility mode, and one u 50 bit native mode. The kernel, if it is a 64 bi 51 compatibility mode, will attempt to process bo 52 exit, if the corresponding sys_set_robust_list 53 setup that list. 54 55 The first word in the memory structure at 'h 56 pointer to a single linked list of 'lock ent 57 as described below. If the list is empty, t 58 to itself, 'head'. The last 'lock entry' po 59 60 The second word, called 'offset', specifies 61 address of the associated 'lock entry', plus 62 be called the 'lock word', from that 'lock e 63 is always a 32 bit word, unlike the other wo 64 word' holds 2 flag bits in the upper 2 bits, 65 of the thread holding the lock in the bottom 66 below for a description of the flag bits. 67 68 The third word, called 'list_op_pending', co 69 the address of the 'lock entry', during list 70 and is needed to correctly resolve races sho 71 in the middle of a locking or unlocking oper 72 73 Each 'lock entry' on the single linked list st 74 of just a single word, pointing to the next 'l 75 'head' if there are no more entries. In addit 76 entry', at an offset from the 'lock entry' spe 77 word, is one 'lock word'. 78 79 The 'lock word' is always 32 bits, and is inte 80 lock variable used by the futex mechanism, in 81 robust_futexes. The kernel will only be able 82 waiting for a lock on a threads exit if that n 83 mechanism to register the address of that 'loc 84 85 For each futex lock currently held by a thread 86 robust_futex support for exit cleanup of that 87 'lock entry' on this list, with its associated 88 specified 'offset'. Should a thread die while 89 the kernel will walk this list, mark any such 90 indicating their holder died, and wakeup the n 91 that lock using the futex mechanism. 92 93 When a thread has invoked the above system cal 94 anticipates using robust_futexes, the kernel s 95 pointer for that task. The task may retrieve 96 using the system call:: 97 98 asmlinkage long 99 sys_get_robust_list(int pid, struct robust 100 size_t __user *len_ptr 101 102 It is anticipated that threads will use robust 103 larger, user level locking structures, one per 104 robust_futex mechanism doesn't care what else 105 long as the 'offset' to the 'lock word' is the 106 robust_futexes used by that thread. The threa 107 it currently holds using the 'lock entry' poin 108 other links between the locks, such as the rev 109 linked list, but that doesn't matter to the ke 110 111 By keeping its locks linked this way, on a lis 112 pointer known to the kernel, the kernel can pr 113 essential service available for robust_futexes 114 up locks held at the time of (a perhaps unexpe 115 116 Actual locking and unlocking, during normal op 117 entirely by user level code in the contending 118 existing futex mechanism to wait for, and wake 119 only essential involvement in robust_futexes i 120 list 'head' is, and to walk the list on thread 121 still held by the departing thread, as describ 122 123 There may exist thousands of futex lock struct 124 memory, on various data structures, at a given 125 lock structures for locks currently held by th 126 that thread's robust_futex linked lock list a 127 128 A given futex lock structure in a user shared 129 at different times by any of the threads with 130 thread currently holding such a lock, if any, 131 TID in the lower 30 bits of the 'lock word'. 132 133 When adding or removing a lock from its list o 134 the kernel to correctly handle lock cleanup re 135 exits (perhaps it gets an unexpected signal 9 136 manipulating this list), the user code must ob 137 protocol on 'lock entry' insertion and removal 138 139 On insertion: 140 141 1) set the 'list_op_pending' word to the addr 142 to be inserted, 143 2) acquire the futex lock, 144 3) add the lock entry, with its thread id (TI 145 of the 'lock word', to the linked list sta 146 4) clear the 'list_op_pending' word. 147 148 On removal: 149 150 1) set the 'list_op_pending' word to the addr 151 to be removed, 152 2) remove the lock entry for this lock from t 153 3) release the futex lock, and 154 4) clear the 'lock_op_pending' word. 155 156 On exit, the kernel will consider the address 157 'list_op_pending' and the address of each 'loc 158 the list starting at 'head'. For each such ad 159 bits of the 'lock word' at offset 'offset' fro 160 exiting threads TID, then the kernel will do t 161 162 1) if bit 31 (0x80000000) is set in that word 163 wakeup on that address, which will waken t 164 used to the futex mechanism to wait on tha 165 2) atomically set bit 30 (0x40000000) in the 166 167 In the above, bit 31 was set by futex waiters 168 they were waiting, and bit 30 is set by the ke 169 lock owner died holding the lock. 170 171 The kernel exit code will silently stop scanni 172 any point: 173 174 1) the 'head' pointer or an subsequent linked 175 is not a valid address of a user space wor 176 2) the calculated location of the 'lock word' 177 'offset') is not the valid address of a 32 178 word 179 3) if the list contains more than 1 million ( 180 future kernel configuration changes) eleme 181 182 When the kernel sees a list entry whose 'lock 183 current threads TID in the lower 30 bits, it d 184 entry, and goes on to the next entry.
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.