.. _whatisrcu_doc:

What is RCU?  --  "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1.	What is RCU, Fundamentally?  https://lwn.net/Articles/262464/
| 2.	What is RCU? Part 2: Usage   https://lwn.net/Articles/263130/
| 3.	RCU part 3: the RCU API      https://lwn.net/Articles/264090/
| 4.	The RCU API, 2010 Edition    https://lwn.net/Articles/418853/
|	2010 Big API Table           https://lwn.net/Articles/419086/
| 5.	The RCU API, 2014 Edition    https://lwn.net/Articles/609904/
|	2014 Big API Table           https://lwn.net/Articles/609973/
| 6.	The RCU API, 2019 Edition    https://lwn.net/Articles/777036/
|	2019 Big API Table           https://lwn.net/Articles/777165/

For those preferring video:

| 1.	Unraveling RCU Mysteries: Fundamentals
| 2.	Unraveling RCU Mysteries: Additional Use Cases


What is RCU?

RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort that is optimized for read-mostly
situations.  Although RCU is actually quite simple, making effective use
of it requires you to think differently about your code.  Another part
of the problem is the mistaken assumption that there is "one true way" to
describe and to use RCU.  Instead, the experience has been that different
people must take different paths to arrive at an understanding of RCU,
depending on their experiences and use cases.  This document provides
several different paths, as follows:

:ref:`1.	RCU OVERVIEW <1_whatisRCU>`

:ref:`2.	WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3.	WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4.	WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5.	WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6.	ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7.	ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`

:ref:`8.	FULL LIST OF RCU APIs <8_whatisRCU>`

:ref:`9.	ANSWERS TO QUICK QUIZZES <9_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Sections 6 and 7.
Section 8 serves as an index to the
API documentation, and Section 9 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to understand everything,
feel free to read the whole thing -- but if you are really that type
of person, you have perused the source code and will never need this
document anyway.  ;-)

.. _1_whatisRCU:

1.  RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers will see
either the old or the new version of the data structure rather than a
partially updated reference.  The reclamation phase does the reclaiming
(e.g., freeing) of the data items removed from the data structure during the
removal phase.  Because reclaiming data items can disrupt any readers
concurrently referencing those data items, the reclamation phase must
not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a.	Remove pointers to a data structure, so that subsequent
	readers cannot gain a reference to it.

b.	Wait for all previous readers to complete their RCU read-side
	critical sections.

c.	At this point, there cannot be any readers who hold references
	to the data structure, so it now may safely be reclaimed
	(e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases absolutely no
synchronization at all.
In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

.. _2_whatisRCU:

2.  WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a.	rcu_read_lock()
b.	rcu_read_unlock()
c.	synchronize_rcu() / call_rcu()
d.	rcu_assign_pointer()
e.	rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

The five core RCU APIs are described below, the others will be listed
later.  See the kernel docbook documentation for more info, or look directly
at the function header comments.

rcu_read_lock()
^^^^^^^^^^^^^^^
	void rcu_read_lock(void);

	This temporal primitive is used by a reader to inform the
	reclaimer that the reader is entering an RCU read-side critical
	section.
	It is illegal to block while in an RCU read-side
	critical section, though kernels built with CONFIG_PREEMPT_RCU
	can preempt RCU read-side critical sections.  Any RCU-protected
	data structure accessed during an RCU read-side critical section
	is guaranteed to remain unreclaimed for the full duration of that
	critical section.  Reference counts may be used in conjunction
	with RCU to maintain longer-term references to data structures.

	Note that anything that disables bottom halves, preemption,
	or interrupts also enters an RCU read-side critical section.
	Acquiring a spinlock also enters an RCU read-side critical
	section, even for spinlocks that do not disable preemption,
	as is the case in kernels built with CONFIG_PREEMPT_RT=y.
	Sleeplocks do *not* enter RCU read-side critical sections.

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
	void rcu_read_unlock(void);

	This temporal primitive is used by a reader to inform the
	reclaimer that the reader is exiting an RCU read-side critical
	section.  Anything that enables bottom halves, preemption,
	or interrupts also exits an RCU read-side critical section.
	Releasing a spinlock also exits an RCU read-side critical section.

	Note that RCU read-side critical sections may be nested and/or
	overlapping.

synchronize_rcu()
^^^^^^^^^^^^^^^^^
	void synchronize_rcu(void);

	This temporal primitive marks the end of updater code and the
	beginning of reclaimer code.  It does this by blocking until
	all pre-existing RCU read-side critical sections on all CPUs
	have completed.  Note that synchronize_rcu() will **not**
	necessarily wait for any subsequent RCU read-side critical
	sections to complete.  For example, consider the following
	sequence of events::

	         CPU 0                  CPU 1                 CPU 2
	     ----------------- ------------------------- ---------------
	 1.  rcu_read_lock()
	 2.                    enters synchronize_rcu()
	 3.                                               rcu_read_lock()
	 4.  rcu_read_unlock()
	 5.                     exits synchronize_rcu()
	 6.                                              rcu_read_unlock()

	To reiterate, synchronize_rcu() waits only for ongoing RCU
	read-side critical sections to complete, not necessarily for
	any that begin after synchronize_rcu() is invoked.

	Of course, synchronize_rcu() does not necessarily return
	**immediately** after the last pre-existing RCU read-side critical
	section completes.  For one thing, there may be scheduling
	delays.  For another thing, many RCU implementations process
	requests in batches in order to improve efficiencies, which can
	further delay synchronize_rcu().
	Since synchronize_rcu() is the API that must figure out when
	readers are done, its implementation is key to RCU.  For RCU
	to be useful in all but the most read-intensive situations,
	synchronize_rcu()'s overhead must also be quite small.

	The call_rcu() API is an asynchronous callback form of
	synchronize_rcu(), and is described in more detail in a later
	section.  Instead of blocking, it registers a function and
	argument which are invoked after all ongoing RCU read-side
	critical sections have completed.  This callback variant is
	particularly useful in situations where it is illegal to block
	or where update-side performance is critically important.

	However, the call_rcu() API should not be used lightly, as use
	of the synchronize_rcu() API generally results in simpler code.
	In addition, the synchronize_rcu() API has the nice property
	of automatically limiting update rate should grace periods
	be delayed.  This property results in system resilience in face
	of denial-of-service attacks.  Code using call_rcu() should limit
	update rate in order to gain this same sort of resilience.  See
	checklist.rst for some approaches to limiting the update rate.

rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
	void rcu_assign_pointer(p, typeof(p) v);

	Yes, rcu_assign_pointer() **is** implemented as a macro, though
	it would be cool to be able to declare a function in this manner.
	(And there has been some discussion of adding overloaded functions
	to the C language, so who knows?)

	The updater uses this spatial macro to assign a new value to an
	RCU-protected pointer, in order to safely communicate the change
	in value from the updater to the reader.  This is a spatial (as
	opposed to temporal) macro.  It does not evaluate to an rvalue,
	but it does provide any compiler directives and memory-barrier
	instructions required for a given compiler or CPU architecture.
	Its ordering properties are that of a store-release operation,
	that is, any prior loads and stores required to initialize the
	structure are ordered before the store that publishes a pointer
	to that structure.

	Perhaps just as important, rcu_assign_pointer() serves to document
	(1) which pointers are protected by RCU and (2) the point at which
	a given structure becomes accessible to other CPUs.  That said,
	rcu_assign_pointer() is most frequently used indirectly, via
	the _rcu list-manipulation primitives such as list_add_rcu().

rcu_dereference()
^^^^^^^^^^^^^^^^^
	typeof(p) rcu_dereference(p);

	Like rcu_assign_pointer(), rcu_dereference() must be implemented
	as a macro.

	The reader uses the spatial rcu_dereference() macro to fetch
	an RCU-protected pointer, which returns a value that may
	then be safely dereferenced.
	Note that rcu_dereference()
	does not actually dereference the pointer, instead, it
	protects the pointer for later dereferencing.  It also
	executes any needed memory-barrier instructions for a given
	CPU architecture.  Currently, only Alpha needs memory barriers
	within rcu_dereference() -- on other CPUs, it compiles to a
	volatile load.  However, no mainstream C compilers respect
	address dependencies, so rcu_dereference() uses volatile casts,
	which, in combination with the coding guidelines listed in
	rcu_dereference.rst, prevent current compilers from breaking
	these dependencies.

	Common coding practice uses rcu_dereference() to copy an
	RCU-protected pointer to a local variable, then dereferences
	this local variable, for example as follows::

		p = rcu_dereference(head.next);
		return p->data;

	However, in this case, one could just as easily combine these
	into one statement::

		return rcu_dereference(head.next)->data;

	If you are going to be fetching multiple fields from the
	RCU-protected structure, using the local variable is of
	course preferred.  Repeated rcu_dereference() calls look
	ugly, do not guarantee that the same pointer will be returned
	if an update happened while in the critical section, and incur
	unnecessary overhead on Alpha CPUs.

	Note that the value returned by rcu_dereference() is valid
	only within the enclosing RCU read-side critical section.
	For example, the following is **not** legal::

		rcu_read_lock();
		p = rcu_dereference(head.next);
		rcu_read_unlock();
		x = p->address;	/* BUG!!! */
		rcu_read_lock();
		y = p->data;	/* BUG!!! */
		rcu_read_unlock();

	Holding a reference from one RCU read-side critical section
	to another is just as illegal as holding a reference from
	one lock-based critical section to another!  Similarly,
	using a reference outside of the critical section in which
	it was acquired is just as illegal as doing so with normal
	locking.

	As with rcu_assign_pointer(), an important function of
	rcu_dereference() is to document which pointers are protected by
	RCU, in particular, flagging a pointer that is subject to changing
	at any time, including immediately after the rcu_dereference().
	And, again like rcu_assign_pointer(), rcu_dereference() is
	typically used indirectly, via the _rcu list-manipulation
	primitives, such as list_for_each_entry_rcu().
.. [1] The variant rcu_dereference_protected() can be used outside
	of an RCU read-side critical section as long as the usage is
	protected by locks acquired by the update-side code.  This variant
	avoids the lockdep warning that would happen when using (for
	example) rcu_dereference() without rcu_read_lock() protection.
	Using rcu_dereference_protected() also has the advantage
	of permitting compiler optimizations that rcu_dereference()
	must prohibit.  The rcu_dereference_protected() variant takes
	a lockdep expression to indicate which locks must be acquired
	by the caller.  If the indicated protection is not provided,
	a lockdep splat is emitted.  See Design/Requirements/Requirements.rst
	and the API's code comments for more details and example usage.

.. [2] If the list_for_each_entry_rcu() instance might be used by
	update-side code as well as by RCU readers, then an additional
	lockdep expression can be added to its list of arguments.
	For example, given an additional "lock" argument, the RCU lockdep
	code would complain only if this instance was invoked outside of an
	RCU read-side critical section and without the protection of mylock.

The following diagram shows how each API communicates between the
reader, updater, and reclaimer.
::


	rcu_assign_pointer()
	                        +--------+
	+---------------------->| reader |---------+
	|                       +--------+         |
	|                           |              |
	|                           |              | Protect:
	|                           |              | rcu_read_lock()
	|                           |              | rcu_read_unlock()
	|        rcu_dereference()  |              |
	+---------+                 |              |
	| updater |<----------------+              |
	+---------+                                V
	|                                    +-----------+
	+----------------------------------->| reclaimer |
	                                     +-----------+
	  Defer:
	  synchronize_rcu() & call_rcu()


The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.
The rcu_assign_pointer() and rcu_dereference() invocations communicate
spatial changes via stores to and loads from the RCU-protected pointer in
question.

There are at least three flavors of RCU usage in the Linux kernel.  The diagram
above shows the most common one.  On the updater side, the rcu_assign_pointer(),
synchronize_rcu() and call_rcu() primitives used are the same for all three
flavors.  However for protection (on the reader side), the primitives used vary
depending on the flavor:

a.	rcu_read_lock() / rcu_read_unlock()
	rcu_dereference()
b.	rcu_read_lock_bh() / rcu_read_unlock_bh()
	local_bh_disable() / local_bh_enable()
	rcu_dereference_bh()

c.	rcu_read_lock_sched() / rcu_read_unlock_sched()
	preempt_disable() / preempt_enable()
	local_irq_save() / local_irq_restore()
	hardirq enter / hardirq exit
	NMI enter / NMI exit
	rcu_dereference_sched()

These three flavors are used as follows:

a.	RCU applied to normal data structures.

b.	RCU applied to networking data structures that may be subjected
	to remote denial-of-service attacks.

c.	RCU applied to scheduler and interrupt/NMI-handler tasks.

Again, most uses will be of (a).  The (b) and (c) cases are important
for specialized uses, but are relatively uncommon.  The SRCU, RCU-Tasks,
RCU-Tasks-Rude, and RCU-Tasks-Trace have similar relationships among
their assorted primitives.

.. _3_whatisRCU:

3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-----------------------------------------------

This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure.  More-typical
uses of RCU may be found in listRCU.rst and NMI-RCU.rst.
::

	struct foo {
		int a;
		char b;
		long c;
	};
	DEFINE_SPINLOCK(foo_mutex);

	struct foo __rcu *gbl_foo;

	/*
	 * Create a new struct foo that is the same as the one currently
	 * pointed to by gbl_foo, except that field "a" is replaced
	 * with "new_a".  Points gbl_foo to the new structure, and then
	 * frees up the old structure after a grace period.
	 *
	 * Uses rcu_assign_pointer() to ensure that concurrent readers
	 * see the initialized version of the new structure.
	 *
	 * Uses synchronize_rcu() to ensure that any readers that might
	 * have references to the old structure complete before freeing
	 * the old structure.
	 */
	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		synchronize_rcu();
		kfree(old_fp);
	}

	/*
	 * Return the value of field "a" of the current gbl_foo
	 * structure.
	 * Use rcu_read_lock() and rcu_read_unlock()
	 * to ensure that the structure does not get deleted out
	 * from under us, and use rcu_dereference() to ensure that
	 * we see the initialized version of the structure (important
	 * for DEC Alpha and for people reading the code).
	 */
	int foo_get_a(void)
	{
		int retval;

		rcu_read_lock();
		retval = rcu_dereference(gbl_foo)->a;
		rcu_read_unlock();
		return retval;
	}

So, to sum up:

-	Use rcu_read_lock() and rcu_read_unlock() to guard RCU
	read-side critical sections.

-	Within an RCU read-side critical section, use rcu_dereference()
	to dereference RCU-protected pointers.

-	Use some solid design (such as locks or semaphores) to
	keep concurrent updates from interfering with each other.

-	Use rcu_assign_pointer() to update an RCU-protected pointer.
	This primitive protects concurrent readers from the updater,
	**not** concurrent updates from each other!  You therefore still
	need to use locking (or something similar) to keep concurrent
	rcu_assign_pointer() primitives from interfering with each other.

-	Use synchronize_rcu() **after** removing a data element from an
	RCU-protected data structure, but **before** reclaiming/freeing
	the data element, in order to wait for the completion of all
	RCU read-side critical sections that might be referencing that
	data item.

See checklist.rst for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.rst
and NMI-RCU.rst.

.. _4_whatisRCU:

4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
--------------------------------------------

In the example above, foo_update_a() blocks until a grace period elapses.
This is quite simple, but in some cases one cannot afford to wait so
long -- there might be other high-priority work to be done.

In such cases, one uses call_rcu() rather than synchronize_rcu().
The call_rcu() API is as follows::

	void call_rcu(struct rcu_head *head, rcu_callback_t func);

This function invokes func(head) after a grace period has elapsed.
This invocation might happen from either softirq or process context,
so the function is not permitted to block.
The foo struct needs to
have an rcu_head structure added, perhaps as follows::

	struct foo {
		int a;
		char b;
		long c;
		struct rcu_head rcu;
	};

The foo_update_a() function might then be written as follows::

	/*
	 * Create a new struct foo that is the same as the one currently
	 * pointed to by gbl_foo, except that field "a" is replaced
	 * with "new_a".  Points gbl_foo to the new structure, and then
	 * frees up the old structure after a grace period.
	 *
	 * Uses rcu_assign_pointer() to ensure that concurrent readers
	 * see the initialized version of the new structure.
	 *
	 * Uses call_rcu() to ensure that any readers that might have
	 * references to the old structure complete before freeing the
	 * old structure.
	 */
	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		call_rcu(&old_fp->rcu, foo_reclaim);
	}

The foo_reclaim() function might appear as follows::

	void foo_reclaim(struct rcu_head *rp)
	{
		struct foo *fp = container_of(rp, struct foo, rcu);

		foo_cleanup(fp->a);

		kfree(fp);
	}

The container_of() primitive is a macro that, given a pointer into a
struct, the type of the struct, and the pointed-to field within the
struct, returns a pointer to the beginning of the struct.

The use of call_rcu() permits the caller of foo_update_a() to
immediately regain control, without needing to worry further about the
old version of the newly updated element.  It also clearly shows the
RCU distinction between updater, namely foo_update_a(), and reclaimer,
namely foo_reclaim().

The summary of advice is the same as for the previous section, except
that we are now using call_rcu() rather than synchronize_rcu():

-	Use call_rcu() **after** removing a data element from an
	RCU-protected data structure in order to register a callback
	function that will be invoked after the completion of all RCU
	read-side critical sections that might be referencing that
	data item.
If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback::

	kfree_rcu(old_fp, rcu);

If the occasional sleep is permitted, the single-argument form may
be used, omitting the rcu_head structure from struct foo::

	kfree_rcu_mightsleep(old_fp);

This variant almost never blocks, but might do so by invoking
synchronize_rcu() in response to memory-allocation failure.

Again, see checklist.rst for additional rules governing the use of RCU.

.. _5_whatisRCU:

5.  WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
------------------------------------------------

One of the nice things about RCU is that it has extremely simple "toy"
implementations that are a good first step towards understanding the
production-quality implementations in the Linux kernel.  This section
presents two such "toy" implementations of RCU, one that is implemented
in terms of familiar locking primitives, and another that more closely
resembles "classic" RCU.  Both are way too simple for real-world use,
lacking both functionality and performance.  However, they are useful
in getting a feel for how RCU works.  See kernel/rcu for production-quality
implementations, and see:

	https://docs.google.com/document/d/1X0

for papers describing the Linux kernel RCU implementation.  The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
more details on the current implementation as of early 2004.


5A.  "TOY" IMPLEMENTATION #1: LOCKING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
familiar locking primitives.  Its overhead makes it a non-starter for
real-life use, as does its lack of scalability.  It is also unsuitable
for realtime use, since it allows scheduling latency to "bleed" from
one read-side critical section to another.  It also assumes recursive
reader-writer locks:  If you try this with non-recursive locks, and
you allow nested rcu_read_lock() calls, you can deadlock.

However, it is probably the easiest implementation to relate to, so is
a good starting point.
It is extremely simple::

	static DEFINE_RWLOCK(rcu_gp_mutex);

	void rcu_read_lock(void)
	{
		read_lock(&rcu_gp_mutex);
	}

	void rcu_read_unlock(void)
	{
		read_unlock(&rcu_gp_mutex);
	}

	void synchronize_rcu(void)
	{
		write_lock(&rcu_gp_mutex);
		smp_mb__after_spinlock();
		write_unlock(&rcu_gp_mutex);
	}

[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
much.  But here are simplified versions anyway.  And whatever you do,
don't forget about them when submitting patches making use of RCU!]::

	#define rcu_assign_pointer(p, v) \
	({ \
		smp_store_release(&(p), (v)); \
	})

	#define rcu_dereference(p) \
	({ \
		typeof(p) _________p1 = READ_ONCE(p); \
		(_________p1); \
	})


The rcu_read_lock() and rcu_read_unlock() primitives read-acquire
and release a global reader-writer lock.  The synchronize_rcu()
primitive write-acquires this same lock, then releases it.  This means
that once synchronize_rcu() exits, all RCU read-side critical sections
that were in progress before synchronize_rcu() was called are guaranteed
to have completed -- there is no way that synchronize_rcu() would have
been able to write-acquire the lock otherwise.  The smp_mb__after_spinlock()
promotes synchronize_rcu() to a full memory barrier in compliance with
the "Memory-Barrier Guarantees" listed in:

	Design/Requirements/Requirements.rst

It is possible to nest rcu_read_lock(), since reader-writer locks may
be recursively acquired.  Note also that rcu_read_lock() is immune
from deadlock (an important property of RCU).  The reason for this is
that the only thing that can block rcu_read_lock() is a synchronize_rcu().
But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
so there can be no deadlock cycle.

.. _quiz_1:

Quick Quiz #1:
		Why is this argument naive?  How could a deadlock
		occur when using this algorithm in a real-world Linux
		kernel?  How could this deadlock be avoided?

:ref:`Answers to Quick Quiz <9_whatisRCU>`

5B.  "TOY" EXAMPLE #2: CLASSIC RCU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
"classic RCU".  It is also short on performance (but only for updates) and
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
kernels.
The definitions of rcu_dereference() and rcu_assign_pointer()
are the same as those shown in the preceding section.
::

	void rcu_read_lock(void) { }

	void rcu_read_unlock(void) { }

	void synchronize_rcu(void)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			run_on(cpu);
	}

Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
This is the great strength of classic RCU in a non-preemptive kernel:
read-side overhead is precisely zero, at least on non-Alpha CPUs.
And there is absolutely no way that rcu_read_lock() can possibly
participate in a deadlock cycle!

The implementation of synchronize_rcu() simply schedules itself on each
CPU in turn.  The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive.  Of course, a somewhat less
"toy" implementation would restore the affinity upon completion rather
than just leaving all tasks running on the last CPU, but when I said
"toy", I meant **toy**!

So how the heck is this supposed to work???

Remember that it is illegal to block while in an RCU read-side critical
section.  Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical sections.
Once **all** CPUs have executed a context switch, then **all** preceding
RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then invoke
synchronize_rcu().  Once synchronize_rcu() returns, we are guaranteed
that there are no RCU read-side critical sections holding a reference
to that data item, so we can safely reclaim it.

.. _quiz_2:

Quick Quiz #2:
		Give an example where Classic RCU's read-side
		overhead is **negative**.

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _quiz_3:

Quick Quiz #3:
		If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		CONFIG_PREEMPT_RT, where normal spinlocks can block???

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _6_whatisRCU:

6.  ANALOGY WITH READER-WRITER LOCKING
--------------------------------------

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking.
The following unified
diff shows how closely related RCU and reader-writer locking can be.
::

	@@ -5,5 +5,5 @@ struct el {
	 	int data;
	 	/* Other data fields */
	 };
	-rwlock_t listmutex;
	+spinlock_t listmutex;
	 struct el head;

	@@ -13,15 +14,15 @@
		struct list_head *lp;
		struct el *p;

	-	read_lock(&listmutex);
	-	list_for_each_entry(p, head, lp) {
	+	rcu_read_lock();
	+	list_for_each_entry_rcu(p, head, lp) {
			if (p->key == key) {
				*result = p->data;
	-			read_unlock(&listmutex);
	+			rcu_read_unlock();
				return 1;
			}
		}
	-	read_unlock(&listmutex);
	+	rcu_read_unlock();
		return 0;
	}

	@@ -29,15 +30,16 @@
	{
		struct el *p;

	-	write_lock(&listmutex);
	+	spin_lock(&listmutex);
		list_for_each_entry(p, head, lp) {
			if (p->key == key) {
	-			list_del(&p->list);
	-			write_unlock(&listmutex);
	+			list_del_rcu(&p->list);
	+			spin_unlock(&listmutex);
	+			synchronize_rcu();
				kfree(p);
				return 1;
			}
		}
	-	write_unlock(&listmutex);
	+	spin_unlock(&listmutex);
		return 0;
	}

Or, for those who prefer a side-by-side listing::

 1 struct el {                           1 struct el {
 2   struct list_head list;              2   struct list_head list;
 3   long key;                           3   long key;
 4   spinlock_t mutex;                   4   spinlock_t mutex;
 5   int data;                           5   int data;
 6   /* Other data fields */             6   /* Other data fields */
 7 };                                    7 };
 8 rwlock_t listmutex;                   8 spinlock_t listmutex;
 9 struct el head;                       9 struct el head;

::

  1 int search(long key, int *result)    1 int search(long key, int *result)
  2 {                                    2 {
  3   struct list_head *lp;              3   struct list_head *lp;
  4   struct el *p;                      4   struct el *p;
  5                                      5
  6   read_lock(&listmutex);             6   rcu_read_lock();
  7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
  8     if (p->key == key) {             8     if (p->key == key) {
  9       *result = p->data;             9       *result = p->data;
 10       read_unlock(&listmutex);      10       rcu_read_unlock();
 11       return 1;                     11       return 1;
 12     }                               12     }
 13   }                                 13   }
 14   read_unlock(&listmutex);          14   rcu_read_unlock();
 15   return 0;                         15   return 0;
 16 }                                   16 }

::

  1 int delete(long key)                 1 int delete(long key)
  2 {                                    2 {
  3   struct el *p;                      3   struct el *p;
  4                                      4
  5   write_lock(&listmutex);            5   spin_lock(&listmutex);
  6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
  7     if (p->key == key) {             7     if (p->key == key) {
  8       list_del(&p->list);            8       list_del_rcu(&p->list);
  9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
                                       10       synchronize_rcu();
 10       kfree(p);                    11       kfree(p);
 11       return 1;                    12       return 1;
 12     }                              13     }
 13   }                                14   }
 14   write_unlock(&listmutex);        15   spin_unlock(&listmutex);
 15   return 0;                        16   return 0;
 16 }                                  17 }

Either way, the differences are quite small.  Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().

However, there is one potential catch: the read-side and update-side
critical sections can now run concurrently.  In many cases, this will
not be a problem, but it is necessary to check carefully regardless.
For example, if multiple independent list updates must be seen as
a single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block.  If this is a problem, there is a callback-based
mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
be used in place of synchronize_rcu().

.. _7_whatisRCU:

7.  ANALOGY WITH REFERENCE COUNTING
-----------------------------------

The reader-writer analogy (illustrated by the previous section) is not
always the best way to think about using RCU.  Another helpful analogy
considers RCU an effective reference count on everything which is
protected by RCU.

A reference count typically does not prevent the referenced object's
values from changing, but does prevent changes to type -- that is, the
gross change of type that happens when that object's memory is freed and
re-allocated for some other purpose.  Once a type-safe reference to the
object is obtained, some other mechanism is needed to ensure consistent
access to the data in the object.  This could involve a per-object lock,
but with RCU the typical approach is to perform reads with SMP-aware
operations such as smp_load_acquire(), to perform updates with atomic
read-modify-write operations, and to provide the necessary ordering.
RCU provides a number of support functions that embed the required
operations and ordering, such as the list_for_each_entry_rcu() macro
used in the previous section.

A more focused view of the reference counting behavior is that,
between rcu_read_lock() and rcu_read_unlock(), any code that calls
rcu_dereference() on a pointer marked as ``__rcu`` can be viewed as
though a reference-count on that object has been temporarily increased.
This prevents the object from changing type.
Exactly what this means
will depend on normal expectations of objects of that type.  For example, it
typically includes that spinlocks can still be safely locked, normal
reference counters can be safely manipulated, and ``__rcu`` pointers
can be safely dereferenced.

Some operations that one might expect to see on an object on
which an RCU reference is held include:

 - Copying out data that is guaranteed to be stable by the object's type.
 - Using kref_get_unless_zero() or similar to get a longer-term
   reference.  This may fail of course.
 - Acquiring a spinlock in the object, and checking if the object still
   is the expected object and if so, manipulating it freely.

The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``.  RCU operations may yield a
reference to an object from such a cache that has been concurrently freed
and the memory reallocated to a completely different object, though of
the same type.  In this case RCU doesn't even protect the identity of the
object from changing, only its type.  So the object found may not be the
one expected, but it will be one where it is safe to take a reference
(and then potentially acquiring a spinlock), allowing subsequent code
to check whether the identity matches expectations.  It is tempting
to simply acquire the spinlock without first taking the reference, but
unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe.  Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
(Those willing to initialize their locks in a kmem_cache constructor
may also use locking, including cache-friendly sequence locking.)

With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
reference to an object is dropped.  With kref, it is the function
passed to kref_put().
When RCU is being used, such finalization code
must not be run until all ``__rcu`` pointers referencing the object have
been updated, and then a grace period has passed.  Every remaining
globally visible pointer to the object must be considered to be a
potential counted reference, and the finalization code is typically run
using call_rcu() only after all those pointers have been changed.

To see how to choose between these two analogies -- of RCU as a
reader-writer lock and RCU as a reference counting system -- it is useful
to reflect on the scale of the thing being protected.  The reader-writer
lock analogy looks at larger multi-part objects such as a linked list
and shows how RCU can facilitate concurrency while elements are added
to, and removed from, the list.  The reference-count analogy looks at
the individual objects and looks at how they can be accessed safely
within whatever whole they are a part of.

.. _8_whatisRCU:

8.  FULL LIST OF RCU APIs
-------------------------

The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook.  Here is the list, by category.

RCU list traversal::

	list_entry_rcu
	list_entry_lockless
	list_first_entry_rcu
	list_next_rcu
	list_for_each_entry_rcu
	list_for_each_entry_continue_rcu
	list_for_each_entry_from_rcu
	list_first_or_null_rcu
	list_next_or_null_rcu
	hlist_first_rcu
	hlist_next_rcu
	hlist_pprev_rcu
	hlist_for_each_entry_rcu
	hlist_for_each_entry_rcu_bh
	hlist_for_each_entry_from_rcu
	hlist_for_each_entry_continue_rcu
	hlist_for_each_entry_continue_rcu_bh
	hlist_nulls_first_rcu
	hlist_nulls_for_each_entry_rcu
	hlist_bl_first_rcu
	hlist_bl_for_each_entry_rcu

RCU pointer/list update::

	rcu_assign_pointer
	list_add_rcu
	list_add_tail_rcu
	list_del_rcu
	list_replace_rcu
	hlist_add_behind_rcu
	hlist_add_before_rcu
	hlist_add_head_rcu
	hlist_add_tail_rcu
	hlist_del_rcu
	hlist_del_init_rcu
	hlist_replace_rcu
	list_splice_init_rcu
	list_splice_tail_init_rcu
        hlist_nulls_del_init_rcu
        hlist_nulls_del_rcu
        hlist_nulls_add_head_rcu
        hlist_bl_add_head_rcu
        hlist_bl_del_init_rcu
        hlist_bl_del_rcu
        hlist_bl_set_first_rcu

RCU::

        Critical sections               Grace period                    Barrier

        rcu_read_lock                   synchronize_net                 rcu_barrier
        rcu_read_unlock                 synchronize_rcu
        rcu_dereference                 synchronize_rcu_expedited
        rcu_read_lock_held              call_rcu
        rcu_dereference_check           kfree_rcu
        rcu_dereference_protected

bh::

        Critical sections               Grace period                    Barrier

        rcu_read_lock_bh                call_rcu                        rcu_barrier
        rcu_read_unlock_bh              synchronize_rcu
        [local_bh_disable]              synchronize_rcu_expedited
        [and friends]
        rcu_dereference_bh
        rcu_dereference_bh_check
        rcu_dereference_bh_protected
        rcu_read_lock_bh_held

sched::

        Critical sections               Grace period                    Barrier

        rcu_read_lock_sched             call_rcu                        rcu_barrier
        rcu_read_unlock_sched           synchronize_rcu
        [preempt_disable]               synchronize_rcu_expedited
        [and friends]
        rcu_read_lock_sched_notrace
        rcu_read_unlock_sched_notrace
        rcu_dereference_sched
        rcu_dereference_sched_check
        rcu_dereference_sched_protected
        rcu_read_lock_sched_held


RCU-Tasks::

        Critical sections               Grace period                    Barrier

        N/A                             call_rcu_tasks                  rcu_barrier_tasks
                                        synchronize_rcu_tasks


RCU-Tasks-Rude::

        Critical sections               Grace period                    Barrier

        N/A                             call_rcu_tasks_rude             rcu_barrier_tasks_rude
                                        synchronize_rcu_tasks_rude


RCU-Tasks-Trace::

        Critical sections               Grace period                    Barrier

        rcu_read_lock_trace             call_rcu_tasks_trace            rcu_barrier_tasks_trace
        rcu_read_unlock_trace           synchronize_rcu_tasks_trace


SRCU::

        Critical sections               Grace period                    Barrier

        srcu_read_lock                  call_srcu                       srcu_barrier
        srcu_read_unlock                synchronize_srcu
        srcu_dereference                synchronize_srcu_expedited
        srcu_dereference_check
        srcu_read_lock_held

SRCU: Initialization/cleanup::

        DEFINE_SRCU
        DEFINE_STATIC_SRCU
        init_srcu_struct
        cleanup_srcu_struct

All: lockdep-checked RCU utility APIs::

        RCU_LOCKDEP_WARN
        rcu_sleep_check
All: Unchecked RCU-protected pointer access::

        rcu_dereference_raw

All: Unchecked RCU-protected pointer access with dereference prohibited::

        rcu_access_pointer

See the comment headers in the source code (or the docbook generated
from them) for more information.

However, given that there are no fewer than four families of RCU APIs
in the Linux kernel, how do you choose which one to use?  The following
list can be helpful:

a.      Will readers need to block?  If so, you need SRCU.

b.      Will readers need to block and are you doing tracing, for
        example, ftrace or BPF?  If so, you need RCU-tasks,
        RCU-tasks-rude, and/or RCU-tasks-trace.

c.      What about the -rt patchset?  If readers would need to block in
        a non-rt kernel, you need SRCU.  If readers would block when
        acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
        SRCU is not necessary.  (The -rt patchset turns spinlocks into
        sleeplocks, hence this distinction.)

d.      Do you need to treat NMI handlers, hardirq handlers,
        and code segments with preemption disabled (whether
        via preempt_disable(), local_irq_save(), local_bh_disable(),
        or some other mechanism) as if they were explicit RCU readers?
        If so, RCU-sched readers are the only choice that will work
        for you, but since about v4.20 you can use the vanilla RCU
        update primitives.

e.      Do you need RCU grace periods to complete even in the face of
        softirq monopolization of one or more of the CPUs?  For example,
        is your code subject to network-based denial-of-service attacks?
        If so, you should disable softirq across your readers, for
        example, by using rcu_read_lock_bh().  Since v4.20 you can
        use the vanilla RCU update primitives.

f.      Is your workload too update-intensive for normal use of
        RCU, but inappropriate for other synchronization mechanisms?
        If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
        named SLAB_DESTROY_BY_RCU).  But please be careful!

g.      Do you need read-side critical sections that are respected even
        on CPUs that are deep in the idle loop, during entry to or exit
        from user-mode execution, or on an offlined CPU?  If so, SRCU
        and RCU Tasks Trace are the only choices that will work for you,
        with SRCU being strongly preferred in almost all cases.

h.      Otherwise, use RCU.

Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.

.. _9_whatisRCU:

9.  ANSWERS TO QUICK QUIZZES
----------------------------

Quick Quiz #1:
        Why is this argument naive?  How could a deadlock
        occur when using this algorithm in a real-world Linux
        kernel?  [Referring to the lock-based "toy" RCU
        algorithm.]

Answer:
        Consider the following sequence of events:

        1.      CPU 0 acquires some unrelated lock, call it
                "problematic_lock", disabling irq via
                spin_lock_irqsave().

        2.      CPU 1 enters synchronize_rcu(), write-acquiring
                rcu_gp_mutex.

        3.      CPU 0 enters rcu_read_lock(), but must wait
                because CPU 1 holds rcu_gp_mutex.

        4.      CPU 1 is interrupted, and the irq handler
                attempts to acquire problematic_lock.

        The system is now deadlocked.

        One way to avoid this deadlock is to use an approach like
        that of CONFIG_PREEMPT_RT, where all normal spinlocks
        become blocking locks, and all irq handlers execute in
        the context of special tasks.  In this case, in step 4
        above, the irq handler would block, allowing CPU 1 to
        release rcu_gp_mutex, avoiding the deadlock.

        Even in the absence of deadlock, this RCU implementation
        allows latency to "bleed" from readers to other
        readers through synchronize_rcu().  To see this,
        consider task A in an RCU read-side critical section
        (thus read-holding rcu_gp_mutex), task B blocked
        attempting to write-acquire rcu_gp_mutex, and
        task C blocked in rcu_read_lock() attempting to
        read_acquire rcu_gp_mutex.  Task A's RCU read-side
        latency is holding up task C, albeit indirectly via
        task B.

        Realtime RCU implementations therefore use a counter-based
        approach where tasks in RCU read-side critical sections
        cannot be blocked by tasks executing synchronize_rcu().

:ref:`Back to Quick Quiz #1 <quiz_1>`

Quick Quiz #2:
        Give an example where Classic RCU's read-side
        overhead is **negative**.

Answer:
        Imagine a single-CPU system with a non-CONFIG_PREEMPTION
        kernel where a routing table is used by process-context
        code, but can be updated by irq-context code (for example,
        by an "ICMP REDIRECT" packet).  The usual way of handling
        this would be to have the process-context code disable
        interrupts while searching the routing table.  Use of RCU
        allows such interrupt-disabling to be dispensed with.
        Thus, without RCU, you pay the cost of disabling interrupts,
        and with RCU you don't.

        One can argue that the overhead of RCU in this
        case is negative with respect to the single-CPU
        interrupt-disabling approach.  Others might argue that
        the overhead of RCU is merely zero, and that replacing
        the positive overhead of the interrupt-disabling scheme
        with the zero-overhead RCU scheme does constitute
        negative overhead.
        In real life, of course, things are more complex.  But
        even the theoretical possibility of negative overhead for
        a synchronization primitive is a bit unexpected.  ;-)

:ref:`Back to Quick Quiz #2 <quiz_2>`

Quick Quiz #3:
        If it is illegal to block in an RCU read-side
        critical section, what the heck do you do in
        CONFIG_PREEMPT_RT, where normal spinlocks can block???

Answer:
        Just as CONFIG_PREEMPT_RT permits preemption of spinlock
        critical sections, it permits preemption of RCU
        read-side critical sections.  It also permits
        spinlocks blocking while in RCU read-side critical
        sections.

        Why the apparent inconsistency?  Because it is
        possible to use priority boosting to keep the RCU
        grace periods short if need be (for example, if running
        short of memory).  In contrast, if blocking waiting
        for (say) network reception, there is no way to know
        what should be boosted.  Especially given that the
        process we need to boost might well be a human being
        who just went out for a pizza or something.  And although
        a computer-operated cattle prod might arouse serious
        interest, it might also provoke serious objections.
        Besides, how does the computer know what pizza parlor
        the human being went to???

:ref:`Back to Quick Quiz #3 <quiz_3>`

ACKNOWLEDGEMENTS

My thanks to the people who helped make this document human-readable,
including Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and
Alan Stern.


For more information, see http://www.rdrop.com/users/paulmck/RCU.