.. SPDX-License-Identifier: GPL-2.0

.. _kernel_hacking_locktypes:

==========================
Lock types and their rules
==========================

Introduction
============

The kernel provides a variety of locking primitives which can be divided
into three categories:

 - Sleeping locks
 - CPU local locks
 - Spinning locks

This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.


Lock categories
===============

Sleeping locks
--------------

Sleeping locks can only be acquired in preemptible task context.

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock().  Furthermore, it is also necessary to evaluate the debugging
versions of these primitives.  In short, don't acquire sleeping locks from
other contexts unless there is no other option.
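
For illustration, a minimal sketch of typical sleeping lock usage, assuming
a hypothetical foo structure serialized by a mutex and accessed only from
preemptible task context, for example a syscall or ioctl path::

  static DEFINE_MUTEX(foo_mutex);       /* hypothetical */

  int foo_update(struct foo *foo, int val)
  {
        /* May sleep, therefore only valid in preemptible task context. */
        mutex_lock(&foo_mutex);
        foo->val = val;
        mutex_unlock(&foo_mutex);
        return 0;
  }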

Sleeping lock types:

 - mutex
 - rt_mutex
 - semaphore
 - rw_semaphore
 - ww_mutex
 - percpu_rw_semaphore

On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

 - local_lock
 - spinlock_t
 - rwlock_t


CPU local locks
---------------

 - local_lock

On non-PREEMPT_RT kernels, local_lock functions are wrappers around
preemption and interrupt disabling primitives. Contrary to other locking
mechanisms, disabling preemption or interrupts is a purely CPU-local
concurrency control mechanism and is not suited for inter-CPU concurrency
control.


Spinning locks
--------------

 - raw_spinlock_t
 - bit spinlocks

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t

Spinning locks implicitly disable preemption and the lock / unlock functions
can have suffixes which apply further protections:

 ===================  ====================================================
 _bh()                Disable / enable bottom halves (soft interrupts)
 _irq()               Disable / enable interrupts
 _irqsave/restore()   Save and disable / restore interrupt disabled state
 ===================  ====================================================

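For illustration, a minimal sketch (hypothetical lock and counter) of the
_irqsave suffix on a non-PREEMPT_RT kernel, protecting data that is also
modified from an interrupt handler::

  static DEFINE_SPINLOCK(foo_lock);     /* hypothetical */
  static unsigned long foo_count;       /* hypothetical, also updated in irq context */
  unsigned long flags;

  spin_lock_irqsave(&foo_lock, flags);  /* save and disable interrupts, then lock */
  foo_count++;
  spin_unlock_irqrestore(&foo_lock, flags);  /* unlock, restore interrupt state */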

Owner semantics
===============

The aforementioned lock types except semaphores have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.


rtmutex
=======

RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.

PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts.  This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.


semaphore
=========

semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate serialization and wait mechanisms, such
as mutexes and completions.
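
As a rough sketch of that recommendation (hypothetical names), a mutex
covers the serialization part and a completion covers the waiting part
which a semaphore would otherwise have provided::

  static DEFINE_MUTEX(foo_mutex);          /* serialization with owner semantics */
  static DECLARE_COMPLETION(foo_done);     /* waiting for an event */

  /* serialization */
  mutex_lock(&foo_mutex);
  update_foo_state();                      /* hypothetical */
  mutex_unlock(&foo_mutex);

  /* waiting side */
  wait_for_completion(&foo_done);

  /* signalling side */
  complete(&foo_done);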

semaphores and PREEMPT_RT
-------------------------

PREEMPT_RT does not change the semaphore implementation because counting
semaphores have no concept of owners, thus preventing PREEMPT_RT from
providing priority inheritance for semaphores.  After all, an unknown
owner cannot be boosted. As a consequence, blocking on semaphores can
result in priority inversion.


rw_semaphore
============

rw_semaphore is a multiple readers and single writer lock mechanism.

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independently of the kernel configuration.
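
For illustration, a minimal reader/writer sketch (hypothetical semaphore
and data)::

  static DECLARE_RWSEM(foo_rwsem);      /* hypothetical */

  /* multiple readers may hold the lock concurrently */
  down_read(&foo_rwsem);
  val = foo_data;
  up_read(&foo_rwsem);

  /* a writer excludes readers and other writers */
  down_write(&foo_rwsem);
  foo_data = new_val;
  up_write(&foo_rwsem);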

rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers.  In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.


local_lock
==========

local_lock provides a named scope to critical sections which are protected
by disabling preemption or interrupts.

On non-PREEMPT_RT kernels local_lock operations map to the preemption and
interrupt disabling and enabling primitives:

 ===============================  ======================
 local_lock(&llock)               preempt_disable()
 local_unlock(&llock)             preempt_enable()
 local_lock_irq(&llock)           local_irq_disable()
 local_unlock_irq(&llock)         local_irq_enable()
 local_lock_irqsave(&llock)       local_irq_save()
 local_unlock_irqrestore(&llock)  local_irq_restore()
 ===============================  ======================

The named scope of local_lock has two advantages over the regular
primitives:

  - The lock name allows static analysis and also clearly documents the
    protection scope, while the regular primitives are scopeless and
    opaque.

  - If lockdep is enabled, the local_lock gains a lockmap which allows
    validating the correctness of the protection. This can detect cases
    where, e.g., a function using preempt_disable() as the protection
    mechanism is invoked from interrupt or soft-interrupt context. Apart
    from that, lockdep_assert_held(&llock) works as with any other locking
    primitive.

local_lock and PREEMPT_RT
-------------------------

PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing
semantics:

  - All spinlock_t changes also apply to local_lock.

local_lock usage
----------------

local_lock should be used in situations where disabling preemption or
interrupts is the appropriate form of concurrency control to protect
per-CPU data structures on a non-PREEMPT_RT kernel.

local_lock is not suitable to protect against preemption or interrupts on a
PREEMPT_RT kernel due to the PREEMPT_RT-specific spinlock_t semantics.
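
A minimal sketch of the intended usage pattern (hypothetical names, see
linux/local_lock.h). It works on both kernel configurations because the
serialization comes from the local_lock itself rather than from implicitly
disabled preemption::

  struct foo_pcpu {
        local_lock_t lock;
        unsigned int count;
  };

  static DEFINE_PER_CPU(struct foo_pcpu, foo_pcpu) = {
        .lock = INIT_LOCAL_LOCK(lock),
  };

  void foo_count_inc(void)
  {
        struct foo_pcpu *p;

        /* non-PREEMPT_RT: preempt_disable(); PREEMPT_RT: per-CPU spinlock_t */
        local_lock(&foo_pcpu.lock);
        p = this_cpu_ptr(&foo_pcpu);
        p->count++;
        local_unlock(&foo_pcpu.lock);
  }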


raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t
--------------

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
core code, low-level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state.  raw_spinlock_t can sometimes also be used when the
critical section is tiny, thus avoiding RT-mutex overhead.
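
For illustration, a minimal sketch (hypothetical lock and device register)
of a tiny, truly atomic critical section around hardware access::

  static DEFINE_RAW_SPINLOCK(hw_lock);  /* hypothetical */
  unsigned long flags;

  raw_spin_lock_irqsave(&hw_lock, flags);  /* really disables interrupts, even on PREEMPT_RT */
  writel(val, hw_reg);                     /* hypothetical register access */
  raw_spin_unlock_irqrestore(&hw_lock, flags);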

spinlock_t
----------

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.

spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
based on rt_mutex which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.

   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
   preemption enabled. The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.

PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
   kernels leave task state untouched.  However, PREEMPT_RT must change
   task state if the task blocks during acquisition.  Therefore, it saves
   the current task state before blocking and the corresponding lock wakeup
   restores it, as shown below::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        lock wakeup
                                          task->state = task->saved_state

   Other types of wakeups would normally unconditionally set the task state
   to RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available.  Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING.  Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case setting it to RUNNING::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        non lock wakeup
                                          task->saved_state = TASK_RUNNING

                                        lock wakeup
                                          task->state = task->saved_state

   This ensures that the real wakeup cannot be lost.


rwlock_t
========

rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.
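
For illustration, a minimal reader/writer sketch (hypothetical lock and
data); the suffix rules above apply to the read_lock / write_lock variants
as well::

  static DEFINE_RWLOCK(foo_rwlock);     /* hypothetical */

  /* readers may run concurrently */
  read_lock(&foo_rwlock);
  val = foo_data;
  read_unlock(&foo_rwlock);

  /* a writer excludes readers and other writers */
  write_lock(&foo_rwlock);
  foo_data = new_val;
  write_unlock(&foo_rwlock);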

rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its lock,
   thus starving even high-priority writers.  In contrast, because readers
   can grant their priority to a writer, a preempted low-priority writer
   will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.


PREEMPT_RT caveats
==================

local_lock on RT
----------------

The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
implications. For example, on a non-PREEMPT_RT kernel the following code
sequence works as expected::

  local_lock_irq(&local_lock);
  raw_spin_lock(&lock);

and is fully equivalent to::

   raw_spin_lock_irq(&lock);

On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq()
is mapped to a per-CPU spinlock_t which disables neither interrupts nor
preemption. The following code sequence works correctly on both
PREEMPT_RT and non-PREEMPT_RT kernels::

  local_lock_irq(&local_lock);
  spin_lock(&lock);

Another caveat with local locks is that each local_lock has a specific
protection scope. So the following substitution is wrong::

  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_1, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_1, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_2, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_2, flags);
  }

  func3()
  {
    lockdep_assert_irqs_disabled();
    access_protected_data();
  }

On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel
local_lock_1 and local_lock_2 are distinct and cannot serialize the callers
of func3(). The lockdep assert will also trigger on a PREEMPT_RT kernel
because local_lock_irqsave() does not disable interrupts due to the
PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::

  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func3()
  {
    lockdep_assert_held(&local_lock);
    access_protected_data();
  }


spinlock_t and rwlock_t
-----------------------

The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications.  For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

   local_irq_disable();
   spin_lock(&lock);

and is fully equivalent to::

   spin_lock_irq(&lock);

The same applies to rwlock_t and the _irqsave() suffix variants.

On a PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
a fully preemptible context.  Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts.  In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers a
local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.

A typical scenario is protection of per-CPU variables in thread context::

  struct foo *p = get_cpu_ptr(&var1);

  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);

This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel
this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does
not allow acquiring p->lock because get_cpu_ptr() implicitly disables
preemption. The following substitution works on both kernels::

  struct foo *p;

  migrate_disable();
  p = this_cpu_ptr(&var1);
  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);

migrate_disable() ensures that the task is pinned to the current CPU, which
in turn guarantees that the per-CPU accesses to var1 and var2 stay on the
same CPU while the task remains preemptible.

The migrate_disable() substitution is not valid for the following
scenario::

  func()
  {
    struct foo *p;

    migrate_disable();
    p = this_cpu_ptr(&var1);
    p->val = func2();
    migrate_enable();
  }

This breaks because migrate_disable() does not protect against reentrancy from
a preempting task. A correct substitution for this case is::

  func()
  {
    struct foo *p;

    local_lock(&foo_lock);
    p = this_cpu_ptr(&var1);
    p->val = func2();
    local_unlock(&foo_lock);
  }

On a non-PREEMPT_RT kernel this protects against reentrancy by disabling
preemption. On a PREEMPT_RT kernel this is achieved by acquiring the
underlying per-CPU spinlock.


raw_spinlock_t on RT
--------------------

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t; for example, the critical section must not
allocate memory.  Thus, on a non-PREEMPT_RT kernel the following code
works perfectly::

  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts.  However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption on PREEMPT_RT kernels::

  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

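If the code must keep using a raw_spinlock_t, one possible pattern (a
sketch with hypothetical list and lock names, assuming the surrounding
context may sleep) is to perform the allocation before entering the raw
critical section::

  p = kmalloc(sizeof(*p), GFP_KERNEL);  /* allocate outside the atomic section */
  if (!p)
        return -ENOMEM;

  raw_spin_lock(&lock);
  list_add(&p->node, &head);            /* tiny, allocation-free critical section */
  raw_spin_unlock(&lock);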

bit spinlocks
-------------

PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
small to accommodate an RT-mutex.  Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.
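
For illustration, a minimal sketch of the bit spinlock interface from
linux/bit_spinlock.h (hypothetical flags word, with bit 0 acting as the
lock bit)::

  unsigned long foo_flags;              /* hypothetical */

  bit_spin_lock(0, &foo_flags);         /* spins with preemption disabled */
  /* tiny critical section: no sleeping locks, no memory allocation */
  bit_spin_unlock(0, &foo_flags);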

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
usage-site changes are not needed for the spinlock_t substitution.
Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.


Lock type nesting rules
=======================

The most basic rules are:

  - Lock types of the same lock category (sleeping, CPU local, spinning)
    can nest arbitrarily as long as they respect the general lock ordering
    rules to prevent deadlocks.

  - Sleeping lock types cannot nest inside CPU local and spinning lock types.

  - CPU local and spinning lock types can nest inside sleeping lock types.

  - Spinning lock types can nest inside all lock types.

These constraints apply both in PREEMPT_RT and otherwise.

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping and substitutes local_lock with a
per-CPU spinlock_t means that they cannot be acquired while holding a raw
spinlock.  This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t, rwlock_t, local_lock
  3) raw_spinlock_t and bit spinlocks

Lockdep will complain if these constraints are violated, both in
PREEMPT_RT and otherwise.
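
For illustration, a minimal sketch (hypothetical locks) which respects this
nesting ordering::

  mutex_lock(&m);          /* 1) sleeping lock */
  spin_lock(&s);           /* 2) spinlock_t nests inside sleeping locks */
  raw_spin_lock(&r);       /* 3) raw_spinlock_t nests inside everything above */

  raw_spin_unlock(&r);
  spin_unlock(&s);
  mutex_unlock(&m);

Reversing the order, for example acquiring a spinlock_t while holding a
raw_spinlock_t, violates these rules and will be reported by lockdep.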
