.. _rcu_barrier:

RCU and Unloadable Modules
==========================

[Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/]

RCU (read-copy update) is a synchronization mechanism that can be thought
of as a replacement for reader-writer locking (among other things), but with
very low-overhead readers that are immune to deadlock, priority inversion,
and unbounded latency. RCU read-side critical sections are delimited
by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPTION
kernels, generate no code whatsoever.

This means that RCU writers are unaware of the presence of concurrent
readers, so that RCU updates to shared data must be undertaken quite
carefully, leaving an old version of the data structure in place until all
pre-existing readers have finished. These old versions are needed because
such readers might hold a reference to them. RCU updates can therefore be
rather expensive, and RCU is thus best suited for read-mostly situations.

How can an RCU writer possibly determine when all readers are finished,
given that readers might well leave absolutely no trace of their
presence? There is a synchronize_rcu() primitive that blocks until all
pre-existing readers have completed.
An updater wishing to delete an
element p from a linked list might do the following, while holding an
appropriate lock, of course::

        list_del_rcu(p);
        synchronize_rcu();
        kfree(p);

But the above code cannot be used in IRQ context; the call_rcu()
primitive must be used instead. This primitive takes a pointer to an
rcu_head struct placed within the RCU-protected data structure and
another pointer to a function that may be invoked later to free that
structure. Code to delete an element p from the linked list from IRQ
context might then be as follows::

        list_del_rcu(p);
        call_rcu(&p->rcu, p_callback);

Since call_rcu() never blocks, this code can safely be used from within
IRQ context. The function p_callback() might be defined as follows::

        static void p_callback(struct rcu_head *rp)
        {
                struct pstruct *p = container_of(rp, struct pstruct, rcu);

                kfree(p);
        }


Unloading Modules That Use call_rcu()
-------------------------------------

But what if the p_callback() function is defined in an unloadable module?

If we unload the module while some RCU callbacks are pending,
the CPUs executing these callbacks are going to be severely
disappointed when they are later invoked, as fancifully depicted at
http://lwn.net/images/ns/kernel/rcu-drop.jpg.
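
To see why pending callbacks are a hazard, it can help to model call_rcu()
in user space. The following is a toy sketch under stated assumptions, not
kernel code: struct toy_head, toy_call_rcu(), toy_run_callbacks(), and
toy_p_callback() are hypothetical names, and a simple FIFO stands in for
the grace-period machinery. It illustrates that the callback's function
pointer is recorded when the callback is posted and is only dereferenced
later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head: a link plus the deferred function. */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
};

static struct toy_head *toy_queue_head;
static struct toy_head **toy_queue_tail = &toy_queue_head;
static int toy_pending;		/* callbacks queued but not yet invoked */

/* Like call_rcu(): record the callback and return immediately, never blocking. */
static void toy_call_rcu(struct toy_head *head, void (*func)(struct toy_head *))
{
	assert(func != NULL);
	head->func = func;
	head->next = NULL;
	*toy_queue_tail = head;
	toy_queue_tail = &head->next;
	toy_pending++;
}

/* Stand-in for "a grace period has elapsed": invoke everything queued so far. */
static int toy_run_callbacks(void)
{
	int invoked = 0;

	while (toy_queue_head) {
		struct toy_head *head = toy_queue_head;

		toy_queue_head = head->next;
		if (!toy_queue_head)
			toy_queue_tail = &toy_queue_head;
		toy_pending--;
		head->func(head);	/* jumps into the callback's code */
		invoked++;
	}
	return invoked;
}

/* Hypothetical RCU-protected structure with an embedded toy_head. */
struct toy_pstruct {
	int value;
	struct toy_head rcu;
};

static int toy_freed;		/* counts invocations of toy_p_callback() */

static void toy_p_callback(struct toy_head *rp)
{
	/* Open-coded container_of(): recover the enclosing structure. */
	struct toy_pstruct *p = (struct toy_pstruct *)
		((char *)rp - offsetof(struct toy_pstruct, rcu));

	free(p);
	toy_freed++;
}
```

In this model, anything that removes the code behind head->func between
toy_call_rcu() and toy_run_callbacks() leaves the queue holding a dangling
function pointer, which is the situation an unloaded module creates.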

We could try placing a synchronize_rcu() in the module-exit code path,
but this is not sufficient. Although synchronize_rcu() does wait for a
grace period to elapse, it does not wait for the callbacks to complete.

One might be tempted to try several back-to-back synchronize_rcu()
calls, but this is still not guaranteed to work. If there is a very
heavy RCU-callback load, then some of the callbacks might be deferred
in order to allow other processing to proceed. Such deferral is required
in realtime kernels in order to avoid excessive scheduling latencies.


rcu_barrier()
-------------

We instead need the rcu_barrier() primitive. Rather than waiting for
a grace period to elapse, rcu_barrier() waits for all outstanding RCU
callbacks to complete. Please note that rcu_barrier() does **not** imply
synchronize_rcu(). In particular, if there are no RCU callbacks queued
anywhere, rcu_barrier() is within its rights to return immediately,
without waiting for a grace period to elapse.

Pseudo-code using rcu_barrier() is as follows:

   1. Prevent any new RCU callbacks from being posted.
   2. Execute rcu_barrier().
   3. Allow the module to be unloaded.

There is also an srcu_barrier() function for SRCU, and you of course
must match the flavor of rcu_barrier() with that of call_rcu(). If your
module uses multiple flavors of call_rcu(), then it must also use multiple
flavors of rcu_barrier() when unloading that module. For example, if
it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on
srcu_struct_2, then the following three lines of code will be required
when unloading::

        rcu_barrier();
        srcu_barrier(&srcu_struct_1);
        srcu_barrier(&srcu_struct_2);

An ancient version of the rcutorture module makes use of rcu_barrier()
in its exit function as follows::

   1  static void
   2  rcu_torture_cleanup(void)
   3  {
   4    int i;
   5
   6    fullstop = 1;
   7    if (shuffler_task != NULL) {
   8      VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
   9      kthread_stop(shuffler_task);
  10    }
  11    shuffler_task = NULL;
  12
  13    if (writer_task != NULL) {
  14      VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
  15      kthread_stop(writer_task);
  16    }
  17    writer_task = NULL;
  18
  19    if (reader_tasks != NULL) {
  20      for (i = 0; i < nrealreaders; i++) {
  21        if (reader_tasks[i] != NULL) {
  22          VERBOSE_PRINTK_STRING(
  23            "Stopping rcu_torture_reader task");
  24          kthread_stop(reader_tasks[i]);
  25        }
  26        reader_tasks[i] = NULL;
  27      }
  28      kfree(reader_tasks);
  29      reader_tasks = NULL;
  30    }
  31    rcu_torture_current = NULL;
  32
  33    if (fakewriter_tasks != NULL) {
  34      for (i = 0; i < nfakewriters; i++) {
  35        if (fakewriter_tasks[i] != NULL) {
  36          VERBOSE_PRINTK_STRING(
  37            "Stopping rcu_torture_fakewriter task");
  38          kthread_stop(fakewriter_tasks[i]);
  39        }
  40        fakewriter_tasks[i] = NULL;
  41      }
  42      kfree(fakewriter_tasks);
  43      fakewriter_tasks = NULL;
  44    }
  45
  46    if (stats_task != NULL) {
  47      VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
  48      kthread_stop(stats_task);
  49    }
  50    stats_task = NULL;
  51
  52    /* Wait for all RCU callbacks to fire. */
  53    rcu_barrier();
  54
  55    rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
  56
  57    if (cur_ops->cleanup != NULL)
  58      cur_ops->cleanup();
  59    if (atomic_read(&n_rcu_torture_error))
  60      rcu_torture_print_module_parms("End of test: FAILURE");
  61    else
  62      rcu_torture_print_module_parms("End of test: SUCCESS");
  63  }

Line 6 sets a global variable that prevents any RCU callbacks from
re-posting themselves. This will not be necessary in most cases, since
RCU callbacks rarely include calls to call_rcu(). However, the rcutorture
module is an exception to this rule, and therefore needs to set this
global variable.

Lines 7-50 stop all the kernel tasks associated with the rcutorture
module.
Therefore, once execution reaches line 53, no more rcutorture
RCU callbacks will be posted. The rcu_barrier() call on line 53 waits
for any pre-existing callbacks to complete.

Then lines 55-62 print status and do operation-specific cleanup, and
then return, permitting the module-unload operation to be completed.

.. _rcubarrier_quiz_1:

Quick Quiz #1:
        Is there any other situation where rcu_barrier() might
        be required?

:ref:`Answer to Quick Quiz #1 <answer_rcubarrier_quiz_1>`

Your module might have additional complications. For example, if your
module invokes call_rcu() from timers, you will need to first cancel all
the timers, and only then invoke rcu_barrier() to wait for any remaining
RCU callbacks to complete.

Of course, if your module uses call_rcu(), you will need to invoke
rcu_barrier() before unloading. Similarly, if your module uses
call_srcu(), you will need to invoke srcu_barrier() before unloading,
and on the same srcu_struct structure. If your module uses call_rcu()
**and** call_srcu(), then you will need to invoke rcu_barrier() **and**
srcu_barrier().


Implementing rcu_barrier()
--------------------------

Dipankar Sarma's implementation of rcu_barrier() makes use of the fact
that RCU callbacks are never reordered once queued on one of the per-CPU
queues. His implementation queues an RCU callback on each of the per-CPU
callback queues, and then waits until they have all started executing, at
which point, all earlier RCU callbacks are guaranteed to have completed.

The original code for rcu_barrier() was as follows::

   1  void rcu_barrier(void)
   2  {
   3    BUG_ON(in_interrupt());
   4    /* Take cpucontrol mutex to protect against CPU hotplug */
   5    mutex_lock(&rcu_barrier_mutex);
   6    init_completion(&rcu_barrier_completion);
   7    atomic_set(&rcu_barrier_cpu_count, 0);
   8    on_each_cpu(rcu_barrier_func, NULL, 0, 1);
   9    wait_for_completion(&rcu_barrier_completion);
  10    mutex_unlock(&rcu_barrier_mutex);
  11  }

Line 3 verifies that the caller is in process context, and lines 5 and 10
use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the
global completion and counters at a time, which are initialized on lines
6 and 7. Line 8 causes each CPU to invoke rcu_barrier_func(), which is
shown below. Note that the final "1" in on_each_cpu()'s argument list
ensures that all the calls to rcu_barrier_func() will have completed
before on_each_cpu() returns. Line 9 then waits for the completion.

This code was rewritten in 2008 and several times thereafter, but this
still gives the general idea.

The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
to post an RCU callback, as follows::

   1  static void rcu_barrier_func(void *notused)
   2  {
   3    int cpu = smp_processor_id();
   4    struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
   5    struct rcu_head *head;
   6
   7    head = &rdp->barrier;
   8    atomic_inc(&rcu_barrier_cpu_count);
   9    call_rcu(head, rcu_barrier_callback);
  10  }

Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure,
which contains the struct rcu_head that is needed for the later call to
call_rcu(). Line 7 picks up a pointer to this struct rcu_head, and line
8 increments a global counter. This counter will later be decremented
by the callback. Line 9 then registers the rcu_barrier_callback() on
the current CPU's queue.

The rcu_barrier_callback() function simply atomically decrements the
rcu_barrier_cpu_count variable and finalizes the completion when it
reaches zero, as follows::

        static void rcu_barrier_callback(struct rcu_head *notused)
        {
                if (atomic_dec_and_test(&rcu_barrier_cpu_count))
                        complete(&rcu_barrier_completion);
        }

.. _rcubarrier_quiz_2:

Quick Quiz #2:
        What happens if CPU 0's rcu_barrier_func() executes
        immediately (thus incrementing rcu_barrier_cpu_count to the
        value one), but the other CPUs' rcu_barrier_func() invocations
        are delayed for a full grace period? Couldn't this result in
        rcu_barrier() returning prematurely?

:ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>`

The current rcu_barrier() implementation is more complex, due to the need
to avoid disturbing idle CPUs (especially on battery-powered systems)
and the need to minimally disturb non-idle CPUs in real-time systems.
However, the code above illustrates the concepts.


rcu_barrier() Summary
---------------------

The rcu_barrier() primitive has seen relatively little use, since most
code using RCU is in the core kernel rather than in modules. However, if
you are using RCU from an unloadable module, you need to use rcu_barrier()
so that your module may be safely unloaded.


Answers to Quick Quizzes
------------------------

.. _answer_rcubarrier_quiz_1:

Quick Quiz #1:
        Is there any other situation where rcu_barrier() might
        be required?

Answer:
        Interestingly enough, rcu_barrier() was not originally
        implemented for module unloading. Nikita Danilov was using
        RCU in a filesystem, which resulted in a similar situation at
        filesystem-unmount time. Dipankar Sarma coded up rcu_barrier()
        in response, so that Nikita could invoke it during the
        filesystem-unmount process.

        Much later, yours truly hit the RCU module-unload problem when
        implementing rcutorture, and found that rcu_barrier() solves
        this problem as well.

:ref:`Back to Quick Quiz #1 <rcubarrier_quiz_1>`

.. _answer_rcubarrier_quiz_2:

Quick Quiz #2:
        What happens if CPU 0's rcu_barrier_func() executes
        immediately (thus incrementing rcu_barrier_cpu_count to the
        value one), but the other CPUs' rcu_barrier_func() invocations
        are delayed for a full grace period? Couldn't this result in
        rcu_barrier() returning prematurely?

Answer:
        This cannot happen. The reason is that on_each_cpu() has its last
        argument, the wait flag, set to "1". This flag is passed through
        to smp_call_function() and further to smp_call_function_on_cpu(),
        causing this latter to spin until the cross-CPU invocation of
        rcu_barrier_func() has completed.
        This by itself would prevent
        a grace period from completing on non-CONFIG_PREEMPTION kernels,
        since each CPU must undergo a context switch (or other quiescent
        state) before the grace period can complete. However, this is
        of no use in CONFIG_PREEMPTION kernels.

        Therefore, on_each_cpu() disables preemption across its call
        to smp_call_function() and also across the local call to
        rcu_barrier_func(). This prevents the local CPU from context
        switching, again preventing grace periods from completing. This
        means that all CPUs have executed rcu_barrier_func() before
        the first rcu_barrier_callback() can possibly execute, in turn
        preventing rcu_barrier_cpu_count from prematurely reaching zero.

        Currently, -rt implementations of RCU keep but a single global
        queue for RCU callbacks, and thus do not suffer from this
        problem. However, when the -rt RCU eventually does have per-CPU
        callback queues, things will have to change. One simple change
        is to add an rcu_read_lock() before line 8 of rcu_barrier()
        and an rcu_read_unlock() after line 8 of this same function. If
        you can think of a better change, please let me know!

:ref:`Back to Quick Quiz #2 <rcubarrier_quiz_2>`
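
The counting argument in the answer to Quick Quiz #2 can be checked with
a small user-space model. This is a sketch under stated assumptions, not
the kernel implementation: toy_barrier_func(), toy_barrier_callback(),
and toy_barrier_run() are hypothetical names, a plain int stands in for
the atomic rcu_barrier_cpu_count, and everything runs on one thread so
that the interleaving can be chosen explicitly.

```c
#include <assert.h>
#include <stdbool.h>

static int toy_cpu_count;	/* models rcu_barrier_cpu_count */
static bool toy_completed;	/* models rcu_barrier_completion */

/* Models rcu_barrier_func(): count one more outstanding barrier callback. */
static void toy_barrier_func(void)
{
	toy_cpu_count++;
}

/* Models rcu_barrier_callback(): the last callback signals completion. */
static void toy_barrier_callback(void)
{
	assert(toy_cpu_count > 0);
	if (--toy_cpu_count == 0)
		toy_completed = true;
}

/*
 * Run the scheme for ncpus CPUs and return how many CPUs had posted
 * their barrier callback at the moment completion was first signaled.
 *
 * interleaved == false: every CPU posts before any callback runs,
 * which is the ordering the answer above argues for.
 * interleaved == true: each CPU's callback runs before the next CPU
 * posts, which is the ordering the quiz worries about.
 */
static int toy_barrier_run(int ncpus, bool interleaved)
{
	int posted = 0;
	int posted_at_completion = -1;
	int cpu;

	toy_cpu_count = 0;
	toy_completed = false;

	if (interleaved) {
		for (cpu = 0; cpu < ncpus; cpu++) {
			toy_barrier_func();
			posted++;
			toy_barrier_callback();
			if (toy_completed && posted_at_completion < 0)
				posted_at_completion = posted;
		}
	} else {
		for (cpu = 0; cpu < ncpus; cpu++) {
			toy_barrier_func();
			posted++;
		}
		for (cpu = 0; cpu < ncpus; cpu++) {
			toy_barrier_callback();
			if (toy_completed && posted_at_completion < 0)
				posted_at_completion = posted;
		}
	}
	return posted_at_completion;
}
```

With four CPUs, toy_barrier_run(4, false) reports that all four had posted
before completion was signaled, whereas toy_barrier_run(4, true) reports
that only one had, which corresponds to the premature return that the quiz
asks about.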