
TOMOYO Linux Cross Reference
Linux/Documentation/RCU/whatisRCU.rst

Version: linux-6.11.5



.. _whatisrcu_doc:

What is RCU?  --  "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1.    What is RCU, Fundamentally?  https://lwn.net/Articles/262464/
| 2.    What is RCU? Part 2: Usage   https://lwn.net/Articles/263130/
| 3.    RCU part 3: the RCU API      https://lwn.net/Articles/264090/
| 4.    The RCU API, 2010 Edition    https://lwn.net/Articles/418853/
|       2010 Big API Table           https://lwn.net/Articles/419086/
| 5.    The RCU API, 2014 Edition    https://lwn.net/Articles/609904/
|       2014 Big API Table           https://lwn.net/Articles/609973/
| 6.    The RCU API, 2019 Edition    https://lwn.net/Articles/777036/
|       2019 Big API Table           https://lwn.net/Articles/777165/

For those preferring video:

| 1.    Unraveling RCU Mysteries: Fundamentals          https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
| 2.    Unraveling RCU Mysteries: Additional Use Cases  https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases


What is RCU?

RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort that is optimized for read-mostly
situations.  Although RCU is actually quite simple, making effective use
of it requires you to think differently about your code.  Another part
of the problem is the mistaken assumption that there is "one true way" to
describe and to use RCU.  Instead, the experience has been that different
people must take different paths to arrive at an understanding of RCU,
depending on their experiences and use cases.  This document provides
several different paths, as follows:

:ref:`1.        RCU OVERVIEW <1_whatisRCU>`

:ref:`2.        WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3.        WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4.        WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5.        WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6.        ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7.        ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`

:ref:`8.        FULL LIST OF RCU APIs <8_whatisRCU>`

:ref:`9.        ANSWERS TO QUICK QUIZZES <9_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Sections 6 and 7.  Section 8 serves as an index to the docbook
API documentation, and Section 9 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway.  ;-)

.. _1_whatisRCU:

1.  RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is the semantics of modern CPUs guarantee that readers will see
either the old or the new version of the data structure rather than a
partially updated reference.  The reclamation phase does the work of reclaiming
(e.g., freeing) the data items removed from the data structure during the
removal phase.  Because reclaiming data items can disrupt any readers
concurrently referencing those data items, the reclamation phase must
not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a.      Remove pointers to a data structure, so that subsequent
        readers cannot gain a reference to it.

b.      Wait for all previous readers to complete their RCU read-side
        critical sections.

c.      At this point, there cannot be any readers who hold references
        to the data structure, so it now may safely be reclaimed
        (e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all.  In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.
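
For example, an updater removing one element from an RCU-protected linked
list might implement the three steps roughly as follows.  This is only a
sketch using primitives introduced in the next section; the foo structure,
foo_lock, and victim pointer are hypothetical, while list_del_rcu(),
synchronize_rcu(), and kfree() are the real kernel primitives::

        struct foo {
                struct list_head list;
                int key;
        };

        /* Step (a): unlink the element so that new readers cannot find it. */
        spin_lock(&foo_lock);
        list_del_rcu(&victim->list);
        spin_unlock(&foo_lock);

        /* Step (b): wait for all pre-existing readers to finish. */
        synchronize_rcu();

        /* Step (c): no reader can still hold a reference, so reclaim it. */
        kfree(victim);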

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

.. _2_whatisRCU:

2.  WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a.      rcu_read_lock()
b.      rcu_read_unlock()
c.      synchronize_rcu() / call_rcu()
d.      rcu_assign_pointer()
e.      rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.
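
As a rough illustration of that last point, a grace-period wait can be
built from call_rcu() along the following lines.  This is only a sketch,
not the kernel's actual implementation; the my_* names are invented,
while call_rcu(), struct rcu_head, struct completion, and container_of()
are real kernel facilities::

        struct my_rcu_waiter {
                struct rcu_head head;
                struct completion done;
        };

        static void my_rcu_waiter_cb(struct rcu_head *head)
        {
                struct my_rcu_waiter *w = container_of(head, struct my_rcu_waiter, head);

                complete(&w->done);             /* Grace period over: wake the waiter. */
        }

        static void my_synchronize_rcu(void)
        {
                struct my_rcu_waiter w;

                init_completion(&w.done);
                call_rcu(&w.head, my_rcu_waiter_cb);    /* Invoked after a grace period. */
                wait_for_completion(&w.done);           /* Block until it has run. */
        }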

The five core RCU APIs are described below; the other 18 will be enumerated
later.  See the kernel docbook documentation for more info, or look directly
at the function header comments.

rcu_read_lock()
^^^^^^^^^^^^^^^
        void rcu_read_lock(void);

        This temporal primitive is used by a reader to inform the
        reclaimer that the reader is entering an RCU read-side critical
        section.  It is illegal to block while in an RCU read-side
        critical section, though kernels built with CONFIG_PREEMPT_RCU
        can preempt RCU read-side critical sections.  Any RCU-protected
        data structure accessed during an RCU read-side critical section
        is guaranteed to remain unreclaimed for the full duration of that
        critical section.  Reference counts may be used in conjunction
        with RCU to maintain longer-term references to data structures.

        Note that anything that disables bottom halves, preemption,
        or interrupts also enters an RCU read-side critical section.
        Acquiring a spinlock also enters an RCU read-side critical
        section, even for spinlocks that do not disable preemption,
        as is the case in kernels built with CONFIG_PREEMPT_RT=y.
        Sleeplocks do *not* enter RCU read-side critical sections.

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
        void rcu_read_unlock(void);

        This temporal primitive is used by a reader to inform the
        reclaimer that the reader is exiting an RCU read-side critical
        section.  Anything that enables bottom halves, preemption,
        or interrupts also exits an RCU read-side critical section.
        Releasing a spinlock also exits an RCU read-side critical section.

        Note that RCU read-side critical sections may be nested and/or
        overlapping.
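
        For example, a reader-side critical section might look like the
        following sketch, in which the gbl_foo pointer and struct foo are
        hypothetical stand-ins (a fuller version appears in Section 3)::

                struct foo *p;
                int a;

                rcu_read_lock();                /* Begin read-side critical section. */
                p = rcu_dereference(gbl_foo);   /* Fetch the protected pointer. */
                a = p ? p->a : -1;              /* p may be used only up to... */
                rcu_read_unlock();              /* ...this point. */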

synchronize_rcu()
^^^^^^^^^^^^^^^^^
        void synchronize_rcu(void);

        This temporal primitive marks the end of updater code and the
        beginning of reclaimer code.  It does this by blocking until
        all pre-existing RCU read-side critical sections on all CPUs
        have completed.  Note that synchronize_rcu() will **not**
        necessarily wait for any subsequent RCU read-side critical
        sections to complete.  For example, consider the following
        sequence of events::

                 CPU 0                  CPU 1                 CPU 2
             ----------------- ------------------------- ---------------
         1.  rcu_read_lock()
         2.                    enters synchronize_rcu()
         3.                                               rcu_read_lock()
         4.  rcu_read_unlock()
         5.                     exits synchronize_rcu()
         6.                                              rcu_read_unlock()

        To reiterate, synchronize_rcu() waits only for ongoing RCU
        read-side critical sections to complete, not necessarily for
        any that begin after synchronize_rcu() is invoked.

        Of course, synchronize_rcu() does not necessarily return
        **immediately** after the last pre-existing RCU read-side critical
        section completes.  For one thing, there might well be scheduling
        delays.  For another thing, many RCU implementations process
        requests in batches in order to improve efficiencies, which can
        further delay synchronize_rcu().

        Since synchronize_rcu() is the API that must figure out when
        readers are done, its implementation is key to RCU.  For RCU
        to be useful in all but the most read-intensive situations,
        synchronize_rcu()'s overhead must also be quite small.

        The call_rcu() API is an asynchronous counterpart to
        synchronize_rcu(), and is described in more detail in a later
        section.  Instead of blocking, it registers a function and
        argument which are invoked after all ongoing RCU read-side
        critical sections have completed.  This callback variant is
        particularly useful in situations where it is illegal to block
        or where update-side performance is critically important.

        However, the call_rcu() API should not be used lightly, as use
        of the synchronize_rcu() API generally results in simpler code.
        In addition, the synchronize_rcu() API has the nice property
        of automatically limiting update rate should grace periods
        be delayed.  This property results in system resilience in the face
        of denial-of-service attacks.  Code using call_rcu() should limit
        update rate in order to gain this same sort of resilience.  See
        checklist.rst for some approaches to limiting the update rate.
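
        As a preview of the difference, once an updater has unlinked an
        element (here a hypothetical old_fp with an embedded rcu_head
        named rcu, as in the Section 4 example), it can reclaim it in one
        of the following mutually exclusive ways; kfree_rcu() covers the
        common case in which the callback would do nothing but call
        kfree()::

                /* Blocking form: simple, but the caller must be able to sleep. */
                synchronize_rcu();
                kfree(old_fp);

                /* Callback form: returns immediately, and foo_reclaim()
                 * (see Section 4) runs after a grace period. */
                call_rcu(&old_fp->rcu, foo_reclaim);

                /* Shorthand when the callback would only call kfree(). */
                kfree_rcu(old_fp, rcu);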

rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
        void rcu_assign_pointer(p, typeof(p) v);

        Yes, rcu_assign_pointer() **is** implemented as a macro, though
        it would be cool to be able to declare a function in this manner.
        (And there has been some discussion of adding overloaded functions
        to the C language, so who knows?)

        The updater uses this macro to assign a new value to an
        RCU-protected pointer, in order to safely communicate the change
        in value from the updater to the reader.  This is a spatial (as
        opposed to temporal) macro.  It does not evaluate to an rvalue,
        but it does provide any compiler directives and memory-barrier
        instructions required for a given compiler or CPU architecture.
        Its ordering properties are that of a store-release operation,
        that is, any prior loads and stores required to initialize the
        structure are ordered before the store that publishes the pointer
        to that structure.

        Perhaps just as important, rcu_assign_pointer() serves to document
        (1) which pointers are protected by RCU and (2) the point at which
        a given structure becomes accessible to other CPUs.  That said,
        rcu_assign_pointer() is most frequently used indirectly, via
        the _rcu list-manipulation primitives such as list_add_rcu().
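
        For example, publishing a fully initialized structure might look
        like the following sketch, which borrows the struct foo and
        gbl_foo pointer from the example in Section 3::

                struct foo *p;

                p = kzalloc(sizeof(*p), GFP_KERNEL);
                if (!p)
                        return -ENOMEM;
                p->a = 42;                      /* Initialization happens first... */
                p->b = 'x';
                rcu_assign_pointer(gbl_foo, p); /* ...then the pointer is published. */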

rcu_dereference()
^^^^^^^^^^^^^^^^^
        typeof(p) rcu_dereference(p);

        Like rcu_assign_pointer(), rcu_dereference() must be implemented
        as a macro.

        The reader uses the spatial rcu_dereference() macro to fetch
        an RCU-protected pointer, which returns a value that may
        then be safely dereferenced.  Note that rcu_dereference()
        does not actually dereference the pointer, instead, it
        protects the pointer for later dereferencing.  It also
        executes any needed memory-barrier instructions for a given
        CPU architecture.  Currently, only Alpha needs memory barriers
        within rcu_dereference() -- on other CPUs, it compiles to a
        volatile load.  However, no mainstream C compilers respect
        address dependencies, so rcu_dereference() uses volatile casts,
        which, in combination with the coding guidelines listed in
        rcu_dereference.rst, prevent current compilers from breaking
        these dependencies.

        Common coding practice uses rcu_dereference() to copy an
        RCU-protected pointer to a local variable, then dereferences
        this local variable, for example as follows::

                p = rcu_dereference(head.next);
                return p->data;

        However, in this case, one could just as easily combine these
        into one statement::

                return rcu_dereference(head.next)->data;

        If you are going to be fetching multiple fields from the
        RCU-protected structure, using the local variable is of
        course preferred.  Repeated rcu_dereference() calls look
        ugly, do not guarantee that the same pointer will be returned
        if an update happened while in the critical section, and incur
        unnecessary overhead on Alpha CPUs.

        Note that the value returned by rcu_dereference() is valid
        only within the enclosing RCU read-side critical section [1]_.
        For example, the following is **not** legal::

                rcu_read_lock();
                p = rcu_dereference(head.next);
                rcu_read_unlock();
                x = p->address; /* BUG!!! */
                rcu_read_lock();
                y = p->data;    /* BUG!!! */
                rcu_read_unlock();

        Holding a reference from one RCU read-side critical section
        to another is just as illegal as holding a reference from
        one lock-based critical section to another!  Similarly,
        using a reference outside of the critical section in which
        it was acquired is just as illegal as doing so with normal
        locking.

        As with rcu_assign_pointer(), an important function of
        rcu_dereference() is to document which pointers are protected by
        RCU, in particular, flagging a pointer that is subject to changing
        at any time, including immediately after the rcu_dereference().
        And, again like rcu_assign_pointer(), rcu_dereference() is
        typically used indirectly, via the _rcu list-manipulation
        primitives, such as list_for_each_entry_rcu() [2]_.

..      [1] The variant rcu_dereference_protected() can be used outside
        of an RCU read-side critical section as long as the usage is
        protected by locks acquired by the update-side code.  This variant
        avoids the lockdep warning that would happen when using (for
        example) rcu_dereference() without rcu_read_lock() protection.
        Using rcu_dereference_protected() also has the advantage
        of permitting compiler optimizations that rcu_dereference()
        must prohibit.  The rcu_dereference_protected() variant takes
        a lockdep expression to indicate which locks must be acquired
        by the caller. If the indicated protection is not provided,
        a lockdep splat is emitted.  See Design/Requirements/Requirements.rst
        and the API's code comments for more details and example usage.

..      [2] If the list_for_each_entry_rcu() instance might be used by
        update-side code as well as by RCU readers, then an additional
        lockdep expression can be added to its list of arguments.
        For example, given an additional "lock_is_held(&mylock)" argument,
        the RCU lockdep code would complain only if this instance was
        invoked outside of an RCU read-side critical section and without
        the protection of mylock.
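
Putting the two footnotes together, update-side code that already holds the
lock guarding an RCU-protected pointer and list might look roughly like the
following sketch.  The mylock, mylist, gbl_foo, and struct foo names are
hypothetical; rcu_dereference_protected(), lockdep_is_held() (a wrapper
around the lock_is_held() form mentioned above), and the optional lockdep
argument to list_for_each_entry_rcu() are the real APIs::

        struct foo *p, *q;
        int n = 0;

        spin_lock(&mylock);

        /* No rcu_read_lock() here: mylock provides the protection, and the
         * lockdep expression tells RCU's debugging checks exactly that. */
        q = rcu_dereference_protected(gbl_foo, lockdep_is_held(&mylock));
        if (q && q->a)
                n++;

        /* The optional fourth argument plays the same role for list
         * traversals performed by update-side code. */
        list_for_each_entry_rcu(p, &mylist, list, lockdep_is_held(&mylock))
                n++;

        spin_unlock(&mylock);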

The following diagram shows how each API communicates among the
reader, updater, and reclaimer.
::


            rcu_assign_pointer()
                                    +--------+
            +---------------------->| reader |---------+
            |                       +--------+         |
            |                           |              |
            |                           |              | Protect:
            |                           |              | rcu_read_lock()
            |                           |              | rcu_read_unlock()
            |        rcu_dereference()  |              |
            +---------+                 |              |
            | updater |<----------------+              |
            +---------+                                V
            |                                    +-----------+
            +----------------------------------->| reclaimer |
                                                 +-----------+
              Defer:
              synchronize_rcu() & call_rcu()


The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.
The rcu_assign_pointer() and rcu_dereference() invocations communicate
spatial changes via stores to and loads from the RCU-protected pointer in
question.

There are at least three flavors of RCU usage in the Linux kernel. The diagram
above shows the most common one. On the updater side, the rcu_assign_pointer(),
synchronize_rcu() and call_rcu() primitives used are the same for all three
flavors. However for protection (on the reader side), the primitives used vary
depending on the flavor:

a.      rcu_read_lock() / rcu_read_unlock()
        rcu_dereference()

b.      rcu_read_lock_bh() / rcu_read_unlock_bh()
        local_bh_disable() / local_bh_enable()
        rcu_dereference_bh()

c.      rcu_read_lock_sched() / rcu_read_unlock_sched()
        preempt_disable() / preempt_enable()
        local_irq_save() / local_irq_restore()
        hardirq enter / hardirq exit
        NMI enter / NMI exit
        rcu_dereference_sched()

These three flavors are used as follows:

a.      RCU applied to normal data structures.

b.      RCU applied to networking data structures that may be subjected
        to remote denial-of-service attacks.

c.      RCU applied to scheduler and interrupt/NMI-handler tasks.

Again, most uses will be of (a).  The (b) and (c) cases are important
for specialized uses, but are relatively uncommon.  The SRCU, RCU-Tasks,
RCU-Tasks-Rude, and RCU-Tasks-Trace have similar relationships among
their assorted primitives.
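
For instance, a flavor (b) reader in packet-processing code might look like
the following sketch, where gbl_cfg, struct cfg, and use_cfg() are
hypothetical and the _bh primitives are the real ones::

        struct cfg *c;

        rcu_read_lock_bh();             /* Also disables softirq processing. */
        c = rcu_dereference_bh(gbl_cfg);
        if (c)
                use_cfg(c);             /* Hypothetical use of the snapshot. */
        rcu_read_unlock_bh();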

.. _3_whatisRCU:

3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-----------------------------------------------

This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure.  More-typical
uses of RCU may be found in listRCU.rst and NMI-RCU.rst.
::

        struct foo {
                int a;
                char b;
                long c;
        };
        DEFINE_SPINLOCK(foo_mutex);

        struct foo __rcu *gbl_foo;

        /*
         * Create a new struct foo that is the same as the one currently
         * pointed to by gbl_foo, except that field "a" is replaced
         * with "new_a".  Points gbl_foo to the new structure, and
         * frees up the old structure after a grace period.
         *
         * Uses rcu_assign_pointer() to ensure that concurrent readers
         * see the initialized version of the new structure.
         *
         * Uses synchronize_rcu() to ensure that any readers that might
         * have references to the old structure complete before freeing
         * the old structure.
         */
        void foo_update_a(int new_a)
        {
                struct foo *new_fp;
                struct foo *old_fp;

                new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
                spin_lock(&foo_mutex);
                old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
                *new_fp = *old_fp;
                new_fp->a = new_a;
                rcu_assign_pointer(gbl_foo, new_fp);
                spin_unlock(&foo_mutex);
                synchronize_rcu();
                kfree(old_fp);
        }

        /*
         * Return the value of field "a" of the current gbl_foo
         * structure.  Use rcu_read_lock() and rcu_read_unlock()
         * to ensure that the structure does not get deleted out
         * from under us, and use rcu_dereference() to ensure that
         * we see the initialized version of the structure (important
         * for DEC Alpha and for people reading the code).
         */
        int foo_get_a(void)
        {
                int retval;

                rcu_read_lock();
                retval = rcu_dereference(gbl_foo)->a;
                rcu_read_unlock();
                return retval;
        }

So, to sum up:

-       Use rcu_read_lock() and rcu_read_unlock() to guard RCU
        read-side critical sections.

-       Within an RCU read-side critical section, use rcu_dereference()
        to dereference RCU-protected pointers.

-       Use some solid design (such as locks or semaphores) to
        keep concurrent updates from interfering with each other.

-       Use rcu_assign_pointer() to update an RCU-protected pointer.
        This primitive protects concurrent readers from the updater,
        **not** concurrent updates from each other!  You therefore still
        need to use locking (or something similar) to keep concurrent
        rcu_assign_pointer() primitives from interfering with each other.

-       Use synchronize_rcu() **after** removing a data element from an
        RCU-protected data structure, but **before** reclaiming/freeing
        the data element, in order to wait for the completion of all
        RCU read-side critical sections that might be referencing that
        data item.

See checklist.rst for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.rst
and NMI-RCU.rst.
523                                                   487 
524 .. _4_whatisRCU:                                  488 .. _4_whatisRCU:
525                                                   489 
526 4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?      490 4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
527 --------------------------------------------      491 --------------------------------------------
528                                                   492 
In the example above, foo_update_a() blocks until a grace period elapses.
This is quite simple, but in some cases one cannot afford to wait so
long -- there might be other high-priority work to be done.

In such cases, one uses call_rcu() rather than synchronize_rcu().
The call_rcu() API is as follows::

        void call_rcu(struct rcu_head *head, rcu_callback_t func);

This function invokes func(head) after a grace period has elapsed.
This invocation might happen from either softirq or process context,
so the function is not permitted to block.  The foo struct needs to
have an rcu_head structure added, perhaps as follows::

        struct foo {
                int a;
                char b;
                long c;
                struct rcu_head rcu;
        };

The foo_update_a() function might then be written as follows::

        /*
         * Create a new struct foo that is the same as the one currently
         * pointed to by gbl_foo, except that field "a" is replaced
         * with "new_a".  Points gbl_foo to the new structure, and
         * frees up the old structure after a grace period.
         *
         * Uses rcu_assign_pointer() to ensure that concurrent readers
         * see the initialized version of the new structure.
         *
         * Uses call_rcu() to ensure that any readers that might have
         * references to the old structure complete before freeing the
         * old structure.
         */
        void foo_update_a(int new_a)
        {
                struct foo *new_fp;
                struct foo *old_fp;

                new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
                spin_lock(&foo_mutex);
                old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
                *new_fp = *old_fp;
                new_fp->a = new_a;
                rcu_assign_pointer(gbl_foo, new_fp);
                spin_unlock(&foo_mutex);
                call_rcu(&old_fp->rcu, foo_reclaim);
        }

The foo_reclaim() function might appear as follows::

        void foo_reclaim(struct rcu_head *rp)
        {
                struct foo *fp = container_of(rp, struct foo, rcu);

                foo_cleanup(fp->a);

                kfree(fp);
        }

The container_of() primitive is a macro that, given a pointer into a
struct, the type of the struct, and the pointed-to field within the
struct, returns a pointer to the beginning of the struct.

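For instance, a purely illustrative helper (not part of the kernel's API)
that recovers the enclosing struct foo from a pointer to its embedded
rcu_head might look like this::

        static struct foo *foo_from_rcu_head(struct rcu_head *rp)
        {
                /* Map the embedded rcu_head back to its enclosing struct foo. */
                return container_of(rp, struct foo, rcu);
        }

This is exactly what the first line of foo_reclaim() above does.
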
The use of call_rcu() permits the caller of foo_update_a() to
immediately regain control, without needing to worry further about the
old version of the newly updated element.  It also clearly shows the
RCU distinction between updater, namely foo_update_a(), and reclaimer,
namely foo_reclaim().

The summary of advice is the same as for the previous section, except
that we are now using call_rcu() rather than synchronize_rcu():

-       Use call_rcu() **after** removing a data element from an
        RCU-protected data structure in order to register a callback
        function that will be invoked after the completion of all RCU
        read-side critical sections that might be referencing that
        data item.

If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback::

        kfree_rcu(old_fp, rcu);

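For example, assuming that the foo_cleanup() call in foo_reclaim() is not
actually needed, the earlier foo_update_a() might be sketched as follows,
with kfree_rcu() taking the place of both call_rcu() and foo_reclaim()::

        void foo_update_a(int new_a)
        {
                struct foo *new_fp;
                struct foo *old_fp;

                new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
                spin_lock(&foo_mutex);
                old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
                *new_fp = *old_fp;
                new_fp->a = new_a;
                rcu_assign_pointer(gbl_foo, new_fp);
                spin_unlock(&foo_mutex);
                kfree_rcu(old_fp, rcu); /* frees old_fp after a grace period */
        }

Note that the rcu_head field in struct foo is still required; only the
separate reclaim callback goes away.
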
If the occasional sleep is permitted, the single-argument form may
be used, omitting the rcu_head structure from struct foo::

        kfree_rcu_mightsleep(old_fp);

This variant almost never blocks, but might do so by invoking
synchronize_rcu() in response to memory-allocation failure.

Again, see checklist.rst for additional rules governing the use of RCU.

.. _5_whatisRCU:

5.  WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
------------------------------------------------

One of the nice things about RCU is that it has extremely simple "toy"
implementations that are a good first step towards understanding the
production-quality implementations in the Linux kernel.  This section
presents two such "toy" implementations of RCU, one that is implemented
in terms of familiar locking primitives, and another that more closely
resembles "classic" RCU.  Both are way too simple for real-world use,
lacking both functionality and performance.  However, they are useful
in getting a feel for how RCU works.  See kernel/rcu/update.c for a
production-quality implementation, and see:

        https://docs.google.com/document/d/1X0

for papers describing the Linux kernel RCU implementation.  The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
more details on the current implementation as of early 2004.


5A.  "TOY" IMPLEMENTATION #1: LOCKING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
familiar locking primitives.  Its overhead makes it a non-starter for
real-life use, as does its lack of scalability.  It is also unsuitable
for realtime use, since it allows scheduling latency to "bleed" from
one read-side critical section to another.  It also assumes recursive
reader-writer locks:  If you try this with non-recursive locks, and
you allow nested rcu_read_lock() calls, you can deadlock.

However, it is probably the easiest implementation to relate to, so is
a good starting point.

It is extremely simple::

        static DEFINE_RWLOCK(rcu_gp_mutex);

        void rcu_read_lock(void)
        {
                read_lock(&rcu_gp_mutex);
        }

        void rcu_read_unlock(void)
        {
                read_unlock(&rcu_gp_mutex);
        }

        void synchronize_rcu(void)
        {
                write_lock(&rcu_gp_mutex);
                smp_mb__after_spinlock();
                write_unlock(&rcu_gp_mutex);
        }

[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
much.  But here are simplified versions anyway.  And whatever you do,
don't forget about them when submitting patches making use of RCU!]::

        #define rcu_assign_pointer(p, v) \
        ({ \
                smp_store_release(&(p), (v)); \
        })

        #define rcu_dereference(p) \
        ({ \
                typeof(p) _________p1 = READ_ONCE(p); \
                (_________p1); \
        })


The rcu_read_lock() and rcu_read_unlock() primitives read-acquire
and release a global reader-writer lock.  The synchronize_rcu()
primitive write-acquires this same lock, then releases it.  This means
that once synchronize_rcu() exits, all RCU read-side critical sections
that were in progress before synchronize_rcu() was called are guaranteed
to have completed -- there is no way that synchronize_rcu() would have
been able to write-acquire the lock otherwise.  The smp_mb__after_spinlock()
promotes synchronize_rcu() to a full memory barrier in compliance with
the "Memory-Barrier Guarantees" listed in:

        Design/Requirements/Requirements.rst

It is possible to nest rcu_read_lock(), since reader-writer locks may
be recursively acquired.  Note also that rcu_read_lock() is immune
from deadlock (an important property of RCU).  The reason for this is
that the only thing that can block rcu_read_lock() is a synchronize_rcu().
But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
so there can be no deadlock cycle.

.. _quiz_1:

Quick Quiz #1:
                Why is this argument naive?  How could a deadlock
                occur when using this algorithm in a real-world Linux
                kernel?  How could this deadlock be avoided?

:ref:`Answers to Quick Quiz <9_whatisRCU>`

5B.  "TOY" EXAMPLE #2: CLASSIC RCU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
"classic RCU".  It is also short on performance (but only for updates) and
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
kernels.  The definitions of rcu_dereference() and rcu_assign_pointer()
are the same as those shown in the preceding section, so they are omitted.
::

        void rcu_read_lock(void) { }

        void rcu_read_unlock(void) { }

        void synchronize_rcu(void)
        {
                int cpu;

                for_each_possible_cpu(cpu)
                        run_on(cpu);
        }

Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
This is the great strength of classic RCU in a non-preemptive kernel:
read-side overhead is precisely zero, at least on non-Alpha CPUs.
And there is absolutely no way that rcu_read_lock() can possibly
participate in a deadlock cycle!

The implementation of synchronize_rcu() simply schedules itself on each
CPU in turn.  The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive, as sketched below.  Of
course, a somewhat less "toy" implementation would restore the affinity
upon completion rather than just leaving all tasks running on the last
CPU, but when I said "toy", I meant **toy**!
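
One possible sketch of run_on() -- assuming process context, ignoring
error handling, and sidestepping the large-NR_CPUS case that would call
for cpumask_var_t -- uses the in-kernel sched_setaffinity() like this::

        static void run_on(int cpu)
        {
                struct cpumask mask;

                cpumask_clear(&mask);
                cpumask_set_cpu(cpu, &mask);
                sched_setaffinity(0, &mask);    /* pid 0: migrate the current task to "cpu" */
        }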

So how the heck is this supposed to work???

Remember that it is illegal to block while in an RCU read-side critical
section.  Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical sections.
Once **all** CPUs have executed a context switch, then **all** preceding
RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then invoke
synchronize_rcu().  Once synchronize_rcu() returns, we are guaranteed
that there are no RCU read-side critical sections holding a reference
to that data item, so we can safely reclaim it.

.. _quiz_2:

Quick Quiz #2:
                Give an example where Classic RCU's read-side
                overhead is **negative**.

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _quiz_3:

Quick Quiz #3:
                If it is illegal to block in an RCU read-side
                critical section, what the heck do you do in
                CONFIG_PREEMPT_RT, where normal spinlocks can block???

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _6_whatisRCU:

6.  ANALOGY WITH READER-WRITER LOCKING
--------------------------------------

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking.  The following unified
diff shows how closely related RCU and reader-writer locking can be.
::

        @@ -5,5 +5,5 @@ struct el {
                int data;
                /* Other data fields */
         };
        -rwlock_t listmutex;
        +spinlock_t listmutex;
         struct el head;

        @@ -13,15 +14,15 @@
                struct list_head *lp;
                struct el *p;

        -       read_lock(&listmutex);
        -       list_for_each_entry(p, head, lp) {
        +       rcu_read_lock();
        +       list_for_each_entry_rcu(p, head, lp) {
                        if (p->key == key) {
                                *result = p->data;
        -                       read_unlock(&listmutex);
        +                       rcu_read_unlock();
                                return 1;
                        }
                }
        -       read_unlock(&listmutex);
        +       rcu_read_unlock();
                return 0;
         }

        @@ -29,15 +30,16 @@
         {
                struct el *p;

        -       write_lock(&listmutex);
        +       spin_lock(&listmutex);
                list_for_each_entry(p, head, lp) {
                        if (p->key == key) {
        -                       list_del(&p->list);
        -                       write_unlock(&listmutex);
        +                       list_del_rcu(&p->list);
        +                       spin_unlock(&listmutex);
        +                       synchronize_rcu();
                                kfree(p);
                                return 1;
                        }
                }
        -       write_unlock(&listmutex);
        +       spin_unlock(&listmutex);
                return 0;
         }

Or, for those who prefer a side-by-side listing::

 1 struct el {                          1 struct el {
 2   struct list_head list;             2   struct list_head list;
 3   long key;                          3   long key;
 4   spinlock_t mutex;                  4   spinlock_t mutex;
 5   int data;                          5   int data;
 6   /* Other data fields */            6   /* Other data fields */
 7 };                                   7 };
 8 rwlock_t listmutex;                  8 spinlock_t listmutex;
 9 struct el head;                      9 struct el head;

::

  1 int search(long key, int *result)    1 int search(long key, int *result)
  2 {                                    2 {
  3   struct list_head *lp;              3   struct list_head *lp;
  4   struct el *p;                      4   struct el *p;
  5                                      5
  6   read_lock(&listmutex);             6   rcu_read_lock();
  7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
  8     if (p->key == key) {             8     if (p->key == key) {
  9       *result = p->data;             9       *result = p->data;
 10       read_unlock(&listmutex);      10       rcu_read_unlock();
 11       return 1;                     11       return 1;
 12     }                               12     }
 13   }                                 13   }
 14   read_unlock(&listmutex);          14   rcu_read_unlock();
 15   return 0;                         15   return 0;
 16 }                                   16 }

::

  1 int delete(long key)                 1 int delete(long key)
  2 {                                    2 {
  3   struct el *p;                      3   struct el *p;
  4                                      4
  5   write_lock(&listmutex);            5   spin_lock(&listmutex);
  6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
  7     if (p->key == key) {             7     if (p->key == key) {
  8       list_del(&p->list);            8       list_del_rcu(&p->list);
  9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
                                        10       synchronize_rcu();
 10       kfree(p);                     11       kfree(p);
 11       return 1;                     12       return 1;
 12     }                               13     }
 13   }                                 14   }
 14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
 15   return 0;                         16   return 0;
 16 }                                   17 }

Either way, the differences are quite small.  Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().

However, there is one potential catch: the read-side and update-side
critical sections can now run concurrently.  In many cases, this will
not be a problem, but it is necessary to check carefully regardless.
For example, if multiple independent list updates must be seen as
a single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block.  If this is a problem, there is a callback-based
mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
be used in place of synchronize_rcu(), as shown below.
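
For example, assuming that struct el gains a struct rcu_head field named
"rcu", the RCU version of delete() shown above might be reworked along
these lines (a sketch, mirroring the listing above) so that it never
blocks::

        int delete(long key)
        {
                struct el *p;

                spin_lock(&listmutex);
                list_for_each_entry(p, head, lp) {
                        if (p->key == key) {
                                list_del_rcu(&p->list);
                                spin_unlock(&listmutex);
                                kfree_rcu(p, rcu);  /* frees p after a grace period */
                                return 1;
                        }
                }
                spin_unlock(&listmutex);
                return 0;
        }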

.. _7_whatisRCU:

7.  ANALOGY WITH REFERENCE COUNTING
-----------------------------------

The reader-writer analogy (illustrated by the previous section) is not
always the best way to think about using RCU.  Another helpful analogy
considers RCU an effective reference count on everything which is
protected by RCU.

A reference count typically does not prevent the referenced object's
values from changing, but does prevent changes to type -- that is, the
gross change of type that happens when that object's memory is freed and
re-allocated for some other purpose.  Once a type-safe reference to the
object is obtained, some other mechanism is needed to ensure consistent
access to the data in the object.  This could involve taking a spinlock,
but with RCU the typical approach is to perform reads with SMP-aware
operations such as smp_load_acquire(), to perform updates with atomic
read-modify-write operations, and to provide the necessary ordering.
RCU provides a number of support functions that embed the required
operations and ordering, such as the list_for_each_entry_rcu() macro
used in the previous section.
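
As a sketch of this view, here is a hypothetical reader (reusing gbl_foo
and struct foo from the earlier examples), with rcu_read_lock() playing
the role of obtaining the type-safe reference and smp_load_acquire()
providing consistent access to the field being read::

        int foo_read_a(void)
        {
                struct foo *fp;
                int retval;

                rcu_read_lock();                        /* "obtain the reference" */
                fp = rcu_dereference(gbl_foo);
                retval = smp_load_acquire(&fp->a);      /* SMP-aware read of the field */
                rcu_read_unlock();                      /* "drop the reference" */
                return retval;
        }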

A more focused view of the reference counting behavior is that,
between rcu_read_lock() and rcu_read_unlock(), any code that performs an
rcu_dereference() on a pointer marked as ``__rcu`` can behave as
though a reference-count on that object has been obtained.
This prevents the object from changing type.  Exactly what this means
will depend on normal expectations of objects of that type, but it
typically includes that spinlocks can still be safely locked, normal
reference counters can be safely manipulated, and ``__rcu`` pointers
can be safely dereferenced.

Some operations that one might expect to see on an object for
which an RCU reference is held include:

 - Copying out data that is guaranteed to be stable by the object's type.
 - Using kref_get_unless_zero() or similar to get a longer-term
   reference.  This may fail of course.  (See the sketch following
   this list.)
 - Acquiring a spinlock in the object, and checking if the object still
   is the expected object and if so, manipulating it freely.
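
Here is a sketch of the second of those operations.  The ``__rcu`` pointer
foo_root, the ref field, and foo_release() are hypothetical names used
only for illustration::

        struct foo *foo_get(void)
        {
                struct foo *fp;

                rcu_read_lock();
                fp = rcu_dereference(foo_root);
                if (fp && !kref_get_unless_zero(&fp->ref))
                        fp = NULL;      /* object is on its way out; caller must cope */
                rcu_read_unlock();
                return fp;              /* caller later drops it via kref_put() */
        }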

The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``.  RCU operations may yield a
reference to an object from such a cache that has been concurrently freed
and the memory reallocated to a completely different object, though of
the same type.  In this case RCU doesn't even protect the identity of the
object from changing, only its type.  So the object found may not be the
one expected, but it will be one where it is safe to take a reference
(and then potentially acquire a spinlock), allowing subsequent code
to check whether the identity matches expectations.  It is tempting
to simply acquire the spinlock without first taking the reference, but
unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe.  Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
(Those willing to initialize their locks in a kmem_cache constructor
may also use locking, including cache-friendly sequence locking.)
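
A sketch of the resulting lookup pattern follows.  The foo_lookup_table()
helper, the key and ref fields, and foo_release() are again hypothetical,
and a production version would need to handle retries and richer identity
checks::

        struct foo *foo_find(int key)
        {
                struct foo *fp;

                rcu_read_lock();
                fp = foo_lookup_table(key);     /* may return recycled memory */
                if (fp && !kref_get_unless_zero(&fp->ref))
                        fp = NULL;
                rcu_read_unlock();

                if (fp && fp->key != key) {     /* identity check after the reference */
                        kref_put(&fp->ref, foo_release);
                        fp = NULL;
                }
                return fp;
        }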

With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
reference to an object is dropped.  With kref, that is the function
passed to kref_put().  When RCU is being used, that finalization code
must not be run until all ``__rcu`` pointers referencing the object have
been updated, and then a grace period has passed.  Every remaining
globally visible pointer to the object must be considered to be a
potential counted reference, and the finalization code is typically run
using call_rcu() only after all those pointers have been changed.
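
For example, assuming a variant of struct foo that also contains a struct
kref named "ref", a kref release function for such an object might defer
the actual freeing by a grace period, relying on the caller having already
removed all globally visible ``__rcu`` pointers to the object before the
final kref_put()::

        static void foo_release(struct kref *ref)
        {
                struct foo *fp = container_of(ref, struct foo, ref);

                /*
                 * All __rcu pointers to fp are already gone, so all that
                 * remains is to wait for pre-existing readers to finish.
                 */
                kfree_rcu(fp, rcu);
        }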

To see how to choose between these two analogies -- of RCU as a
reader-writer lock and RCU as a reference counting system -- it is useful
to reflect on the scale of the thing being protected.  The reader-writer
lock analogy looks at larger multi-part objects such as a linked list
and shows how RCU can facilitate concurrency while elements are added
to, and removed from, the list.  The reference-count analogy looks at
the individual objects and looks at how they can be accessed safely
within whatever whole they are a part of.

.. _8_whatisRCU:

8.  FULL LIST OF RCU APIs
-------------------------

The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook.  Here is the list, by category.

RCU list traversal::

        list_entry_rcu
        list_entry_lockless
        list_first_entry_rcu
        list_next_rcu
        list_for_each_entry_rcu
        list_for_each_entry_continue_rcu
        list_for_each_entry_from_rcu
        list_first_or_null_rcu
        list_next_or_null_rcu
        hlist_first_rcu
        hlist_next_rcu
        hlist_pprev_rcu
        hlist_for_each_entry_rcu
        hlist_for_each_entry_rcu_bh
        hlist_for_each_entry_from_rcu
        hlist_for_each_entry_continue_rcu
        hlist_for_each_entry_continue_rcu_bh
        hlist_nulls_first_rcu
        hlist_nulls_for_each_entry_rcu
        hlist_bl_first_rcu
        hlist_bl_for_each_entry_rcu

RCU pointer/list update::

        rcu_assign_pointer
        list_add_rcu
        list_add_tail_rcu
        list_del_rcu
        list_replace_rcu
        hlist_add_behind_rcu
        hlist_add_before_rcu
        hlist_add_head_rcu
        hlist_add_tail_rcu
        hlist_del_rcu
        hlist_del_init_rcu
        hlist_replace_rcu
        list_splice_init_rcu
        list_splice_tail_init_rcu
        hlist_nulls_del_init_rcu
        hlist_nulls_del_rcu
        hlist_nulls_add_head_rcu
        hlist_bl_add_head_rcu
        hlist_bl_del_init_rcu
        hlist_bl_del_rcu
        hlist_bl_set_first_rcu

RCU::

        Critical sections       Grace period            Barrier

        rcu_read_lock           synchronize_net         rcu_barrier
        rcu_read_unlock         synchronize_rcu
        rcu_dereference         synchronize_rcu_expedited
        rcu_read_lock_held      call_rcu
        rcu_dereference_check   kfree_rcu
        rcu_dereference_protected

bh::

        Critical sections       Grace period            Barrier

        rcu_read_lock_bh        call_rcu                rcu_barrier
        rcu_read_unlock_bh      synchronize_rcu
        [local_bh_disable]      synchronize_rcu_expedited
        [and friends]
        rcu_dereference_bh
        rcu_dereference_bh_check
        rcu_dereference_bh_protected
        rcu_read_lock_bh_held

sched::

        Critical sections       Grace period            Barrier

        rcu_read_lock_sched     call_rcu                rcu_barrier
        rcu_read_unlock_sched   synchronize_rcu
        [preempt_disable]       synchronize_rcu_expedited
        [and friends]
        rcu_read_lock_sched_notrace
        rcu_read_unlock_sched_notrace
        rcu_dereference_sched
        rcu_dereference_sched_check
        rcu_dereference_sched_protected
        rcu_read_lock_sched_held


RCU-Tasks::

        Critical sections       Grace period            Barrier

        N/A                     call_rcu_tasks          rcu_barrier_tasks
                                synchronize_rcu_tasks


RCU-Tasks-Rude::

        Critical sections       Grace period            Barrier

        N/A                     call_rcu_tasks_rude     rcu_barrier_tasks_rude
                                synchronize_rcu_tasks_rude


RCU-Tasks-Trace::

        Critical sections       Grace period            Barrier

        rcu_read_lock_trace     call_rcu_tasks_trace    rcu_barrier_tasks_trace
        rcu_read_unlock_trace   synchronize_rcu_tasks_trace


SRCU::

        Critical sections       Grace period            Barrier

        srcu_read_lock          call_srcu               srcu_barrier
        srcu_read_unlock        synchronize_srcu
        srcu_dereference        synchronize_srcu_expedited
        srcu_dereference_check
        srcu_read_lock_held

SRCU: Initialization/cleanup::

        DEFINE_SRCU
        DEFINE_STATIC_SRCU
        init_srcu_struct
        cleanup_srcu_struct

All: lockdep-checked RCU utility APIs::

        RCU_LOCKDEP_WARN
        rcu_sleep_check

All: Unchecked RCU-protected pointer access::

        rcu_dereference_raw

All: Unchecked RCU-protected pointer access with dereferencing prohibited::

        rcu_access_pointer

See the comment headers in the source code (or the docbook generated
from them) for more information.

However, given that there are no fewer than four families of RCU APIs
in the Linux kernel, how do you choose which one to use?  The following
list can be helpful:

a.      Will readers need to block?  If so, you need SRCU.

b.      Will readers need to block and are you tracing the kernel, for
        example, with ftrace or BPF?  If so, you need RCU-tasks,
        RCU-tasks-rude, and/or RCU-tasks-trace.

c.      What about the -rt patchset?  If readers would need to block in
        a non-rt kernel, you need SRCU.  If readers would block when
        acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
        SRCU is not necessary.  (The -rt patchset turns spinlocks into
        sleeplocks, hence this distinction.)

d.      Do you need to treat NMI handlers, hardirq handlers,
        and code segments with preemption disabled (whether
        via preempt_disable(), local_irq_save(), local_bh_disable(),
        or some other mechanism) as if they were explicit RCU readers?
        If so, RCU-sched readers are the only choice that will work
        for you, but since about v4.20 you can use the vanilla RCU
        update primitives.

e.      Do you need RCU grace periods to complete even in the face of
        softirq monopolization of one or more of the CPUs?  For example,
        is your code subject to network-based denial-of-service attacks?
        If so, you should disable softirq across your readers, for
        example, by using rcu_read_lock_bh().  Since about v4.20 you
        can use the vanilla RCU update primitives.

f.      Is your workload too update-intensive for normal use of
        RCU, but inappropriate for other synchronization mechanisms?
        If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
        named SLAB_DESTROY_BY_RCU).  But please be careful!

g.      Do you need read-side critical sections that are respected even
        on CPUs that are deep in the idle loop, during entry to or exit
        from user-mode execution, or on an offlined CPU?  If so, SRCU
        and RCU Tasks Trace are the only choices that will work for you,
        with SRCU being strongly preferred in almost all cases.

h.      Otherwise, use RCU.

Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.
1197                                                  1037 
1198 .. _9_whatisRCU:                              !! 1038 .. _8_whatisRCU:
1199                                                  1039 
1200 9.  ANSWERS TO QUICK QUIZZES                  !! 1040 8.  ANSWERS TO QUICK QUIZZES
1201 ----------------------------                     1041 ----------------------------
1202                                                  1042 
Quick Quiz #1:
                Why is this argument naive?  How could a deadlock
                occur when using this algorithm in a real-world Linux
                kernel?  [Referring to the lock-based "toy" RCU
                algorithm.]

Answer:
                Consider the following sequence of events:

                1.      CPU 0 acquires some unrelated lock, call it
                        "problematic_lock", disabling irq via
                        spin_lock_irqsave().

                2.      CPU 1 enters synchronize_rcu(), write-acquiring
                        rcu_gp_mutex.

                3.      CPU 0 enters rcu_read_lock(), but must wait
                        because CPU 1 holds rcu_gp_mutex.

                4.      CPU 1 is interrupted, and the irq handler
                        attempts to acquire problematic_lock.

                The system is now deadlocked.

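                For illustration only, here is a minimal sketch of the
                two code paths in terms of the lock-based toy
                implementation; problematic_lock, flags, and the irq
                handler are hypothetical::

                        static DEFINE_SPINLOCK(problematic_lock);
                        unsigned long flags;

                        /* CPU 0, process context. */
                        spin_lock_irqsave(&problematic_lock, flags); /* step 1 */
                        rcu_read_lock();   /* step 3: waits, rcu_gp_mutex is write-held */

                        /* CPU 1, process context. */
                        synchronize_rcu(); /* step 2: write-acquires rcu_gp_mutex */

                        /* CPU 1, irq handler interrupting synchronize_rcu(). */
                        spin_lock(&problematic_lock); /* step 4: spins on CPU 0's lock */
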
                One way to avoid this deadlock is to use an approach like
                that of CONFIG_PREEMPT_RT, where all normal spinlocks
                become blocking locks, and all irq handlers execute in
                the context of special tasks.  In this case, in step 4
                above, the irq handler would block, allowing CPU 1 to
                release rcu_gp_mutex, avoiding the deadlock.

                Even in the absence of deadlock, this RCU implementation
                allows latency to "bleed" from readers to other
                readers through synchronize_rcu().  To see this,
                consider task A in an RCU read-side critical section
                (thus read-holding rcu_gp_mutex), task B blocked
                attempting to write-acquire rcu_gp_mutex, and
                task C blocked in rcu_read_lock() attempting to
                read-acquire rcu_gp_mutex.  Task A's RCU read-side
                latency is holding up task C, albeit indirectly via
                task B.

                Realtime RCU implementations therefore use a counter-based
                approach where tasks in RCU read-side critical sections
                cannot be blocked by tasks executing synchronize_rcu().

:ref:`Back to Quick Quiz #1 <quiz_1>`

Quick Quiz #2:
                Give an example where Classic RCU's read-side
                overhead is **negative**.

Answer:
                Imagine a single-CPU system with a non-CONFIG_PREEMPTION
                kernel where a routing table is used by process-context
                code, but can be updated by irq-context code (for example,
                by an "ICMP REDIRECT" packet).  The usual way of handling
                this would be to have the process-context code disable
                interrupts while searching the routing table.  Use of
                RCU allows such interrupt-disabling to be dispensed with.
                Thus, without RCU, you pay the cost of disabling interrupts,
                and with RCU you don't.

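                As a minimal sketch (the rt and dst variables and the
                route_lookup() helper are hypothetical), the two reader
                styles compare as follows; note that on a single-CPU
                non-CONFIG_PREEMPTION kernel, rcu_read_lock() and
                rcu_read_unlock() generate essentially no code::

                        /* Interrupt-disabling approach: the reader
                         * excludes irq-context updaters by disabling
                         * interrupts. */
                        local_irq_save(flags);
                        rt = route_lookup(dst);
                        /* ... use rt ... */
                        local_irq_restore(flags);

                        /* RCU approach: the irq-context updater publishes
                         * changes with rcu_assign_pointer() and defers
                         * freeing with call_rcu(), so the reader disables
                         * nothing. */
                        rcu_read_lock();
                        rt = route_lookup(dst); /* uses rcu_dereference() internally */
                        /* ... use rt ... */
                        rcu_read_unlock();
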
                One can argue that the overhead of RCU in this
                case is negative with respect to the single-CPU
                interrupt-disabling approach.  Others might argue that
                the overhead of RCU is merely zero, and that replacing
                the positive overhead of the interrupt-disabling scheme
                with the zero-overhead RCU scheme does not constitute
                negative overhead.

                In real life, of course, things are more complex.  But
                even the theoretical possibility of negative overhead for
                a synchronization primitive is a bit unexpected.  ;-)

:ref:`Back to Quick Quiz #2 <quiz_2>`

Quick Quiz #3:
                If it is illegal to block in an RCU read-side
                critical section, what the heck do you do in
                CONFIG_PREEMPT_RT, where normal spinlocks can block???

Answer:
                Just as CONFIG_PREEMPT_RT permits preemption of spinlock
                critical sections, it permits preemption of RCU
                read-side critical sections.  It also permits
                spinlocks blocking while in RCU read-side critical
                sections.

                Why the apparent inconsistency?  Because it is
                possible to use priority boosting to keep the RCU
                grace periods short if need be (for example, if running
                short of memory).  In contrast, if blocking waiting
                for (say) network reception, there is no way to know
                what should be boosted.  Especially given that the
                process we need to boost might well be a human being
                who just went out for a pizza or something.  And although
                a computer-operated cattle prod might arouse serious
                interest, it might also provoke serious objections.
                Besides, how does the computer know what pizza parlor
                the human being went to???

:ref:`Back to Quick Quiz #3 <quiz_3>`

ACKNOWLEDGEMENTS

My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.


For more information, see http://www.rdrop.com/users/paulmck/RCU.
                                                      
