==================================
RT-mutex subsystem with PI support
==================================

RT-mutexes with priority inheritance are used to support PI-futexes,
which enable pthread_mutex_t priority inheritance attributes
(PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details
about PI-futexes.]

This technology was developed in the -rt tree and streamlined for
pthread_mutex support.

Basic principles:
-----------------

RT-mutexes extend the semantics of simple mutexes by the priority
inheritance protocol.

A low priority owner of an rt-mutex inherits the priority of a higher
priority waiter until the rt-mutex is released. If the temporarily
boosted owner itself blocks on an rt-mutex, it propagates the priority
boosting to the owner of the rt-mutex it is blocked on. The priority
boosting is removed immediately once the rt-mutex has been unlocked.

This approach allows us to shorten the time high-prio tasks spend
blocked on mutexes which protect shared resources. Priority inheritance
is not a magic bullet for poorly designed applications, but it allows
well-designed applications to use userspace locks in critical parts of
a high priority thread, without losing determinism.

The enqueueing of the waiters into the rtmutex waiter tree is done in
priority order. For same priorities FIFO order is chosen. For each
rtmutex, only the top priority waiter is enqueued into the owner's
priority waiters tree. This tree also queues in priority order. Whenever
the top priority waiter of a task changes (for example it timed out or
got a signal), the priority of the owner task is readjusted. The
priority enqueueing is handled by "pi_waiters".

RT-mutexes are optimized for fastpath operations and have no internal
locking overhead when locking an uncontended mutex or unlocking a mutex
without waiters. The optimized fastpath operations require cmpxchg
support. [If that is not available then the rt-mutex internal spinlock
is used.]

The state of the rt-mutex is tracked via the owner field of the rt-mutex
structure:

lock->owner holds the task_struct pointer of the owner. Bit 0 is used to
keep track of the "lock has waiters" state:

 ============ ======= ================================================
 owner        bit0    Notes
 ============ ======= ================================================
 NULL         0       lock is free (fast acquire possible)
 NULL         1       lock is free and has waiters and the top waiter
                      is going to take the lock [1]_
 taskpointer  0       lock is held (fast release possible)
 taskpointer  1       lock is held and has waiters [2]_
 ============ ======= ================================================

The fast atomic compare exchange based acquire and release is only
possible when bit 0 of lock->owner is 0.

.. [1] It can also be a transitional state when grabbing the lock
       while ->wait_lock is held. To prevent any fast path cmpxchg on
       the lock, we need to set bit 0 before looking at the lock, and
       the owner may be NULL during this small window, hence this can
       be a transitional state.

.. [2] There is a small window when bit 0 is set but there are no
       waiters. This can happen when grabbing the lock in the slow path.
       To prevent a cmpxchg by the owner releasing the lock, we need to
       set this bit before looking at the lock.

BTW, there is still technically a "Pending Owner", it's just not called
that anymore. The pending owner happens to be the top_waiter of a lock
that has no owner and has been woken up to grab the lock.
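
The owner-field encoding and the cmpxchg fastpath described above can be
illustrated with a small userspace sketch. This is not the kernel
implementation: it uses C11 atomics instead of the kernel's cmpxchg
primitives, and every name in it (struct sketch_rtmutex,
struct sketch_task, the sketch_*() helpers and SKETCH_HAS_WAITERS) is
hypothetical and chosen only for illustration. It assumes task pointers
are at least 2-byte aligned, so that bit 0 is free to carry the "lock
has waiters" state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's struct rt_mutex. */
#define SKETCH_HAS_WAITERS ((uintptr_t)1)   /* bit 0 of the owner word */

struct sketch_task {
	int prio;
};

struct sketch_rtmutex {
	/* Task pointer of the owner; bit 0 is the "lock has waiters" flag. */
	_Atomic uintptr_t owner;
};

static inline void sketch_init(struct sketch_rtmutex *lock)
{
	atomic_init(&lock->owner, (uintptr_t)0);   /* NULL, bit 0 clear: free */
}

/* Fast acquire: succeeds only if the owner is NULL and bit 0 is 0. */
static inline bool sketch_fast_lock(struct sketch_rtmutex *lock,
				    struct sketch_task *task)
{
	uintptr_t expected = 0;
	uintptr_t desired = (uintptr_t)task;

	/* Assumption: the pointer is aligned, so bit 0 is not in use. */
	assert((desired & SKETCH_HAS_WAITERS) == 0);
	return atomic_compare_exchange_strong(&lock->owner, &expected, desired);
}

/* Fast release: succeeds only if @task owns the lock and bit 0 is 0. */
static inline bool sketch_fast_unlock(struct sketch_rtmutex *lock,
				      struct sketch_task *task)
{
	uintptr_t expected = (uintptr_t)task;

	return atomic_compare_exchange_strong(&lock->owner, &expected,
					      (uintptr_t)0);
}

/* Slow-path entry: set bit 0 first so that no fastpath cmpxchg can win. */
static inline void sketch_mark_waiters(struct sketch_rtmutex *lock)
{
	atomic_fetch_or(&lock->owner, SKETCH_HAS_WAITERS);
}

/* Decode the owner pointer by masking out bit 0. */
static inline struct sketch_task *sketch_owner(struct sketch_rtmutex *lock)
{
	return (struct sketch_task *)(atomic_load(&lock->owner) &
				      ~SKETCH_HAS_WAITERS);
}

static inline bool sketch_has_waiters(struct sketch_rtmutex *lock)
{
	return (atomic_load(&lock->owner) & SKETCH_HAS_WAITERS) != 0;
}
```

In this model, once sketch_mark_waiters() has set bit 0, both
sketch_fast_lock() and sketch_fast_unlock() fail, because the value they
compare against no longer matches the owner word. That corresponds to
the rule that the fast acquire and release are only possible when bit 0
of lock->owner is 0; a real implementation would fall back to a slow
path under its internal lock at that point, which this sketch leaves
out.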