.. _rcu_barrier:

RCU and Unloadable Modules
==========================

[Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/]

RCU updaters sometimes use call_rcu() to initiate an asynchronous wait
for a grace period to elapse. This primitive takes a pointer to an
rcu_head struct placed within the RCU-protected data structure and
another pointer to a function that may be invoked later to free that
structure. Code to delete an element p from the linked list from IRQ
context might then be as follows::

	list_del_rcu(p);
	call_rcu(&p->rcu, p_callback);

Since call_rcu() never blocks, this code can safely be used from within
IRQ context. The function p_callback() might be defined as follows::

	static void p_callback(struct rcu_head *rp)
	{
		struct pstruct *p = container_of(rp, struct pstruct, rcu);

		kfree(p);
	}


Unloading Modules That Use call_rcu()
-------------------------------------

But what if the p_callback() function is defined in an unloadable module?

If we unload the module while some RCU callbacks are pending,
the CPUs executing these callbacks are going to be severely
disappointed when they are later invoked, as fancifully depicted at
http://lwn.net/images/ns/kernel/rcu-drop.jpg.

We could try placing a synchronize_rcu() in the module-exit code path,
but this is not sufficient. Although synchronize_rcu() does wait for a
grace period to elapse, it does not wait for the callbacks to complete.

One might be tempted to try several back-to-back synchronize_rcu()
calls, but this is still not guaranteed to work. If there is a very
heavy RCU-callback load, then some of the callbacks might be deferred
in order to allow other processing to proceed. For but one example, such
deferral is required in realtime kernels in order to avoid excessive
scheduling latencies.


rcu_barrier()
-------------

This situation can be handled by the rcu_barrier() primitive. Rather
than waiting for a grace period to elapse, rcu_barrier() waits for all
outstanding RCU callbacks to complete. Please note that rcu_barrier()
does **not** imply synchronize_rcu(); in particular, if there are no
RCU callbacks queued anywhere, rcu_barrier() is within its rights to
return immediately, without waiting for anything, let alone a grace
period.

Pseudo-code using rcu_barrier() is as follows:

1. Prevent any new RCU callbacks from being posted.
2. Execute rcu_barrier().
3. Allow the module to be unloaded.

There is also an srcu_barrier() function for SRCU, and you of course
must match the flavor of srcu_barrier() with that of call_srcu().
If your module uses multiple srcu_struct structures, then it must also
use multiple invocations of srcu_barrier() when unloading that module.
For example, if it uses call_rcu(), call_srcu() on srcu_struct_1, and
call_srcu() on srcu_struct_2, then the following three lines of code
will be required when unloading::

	rcu_barrier();
	srcu_barrier(&srcu_struct_1);
	srcu_barrier(&srcu_struct_2);

If latency is of the essence, workqueues could be used to run these
three functions concurrently, as sketched below.
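For instance, the following sketch overlaps the two srcu_barrier()
calls with the rcu_barrier() call by pushing them onto the system
workqueue. The work items and the barrier_all_flavors() wrapper are
hypothetical names invented for this illustration, which assumes the
srcu_struct_1 and srcu_struct_2 structures named above::

	static void srcu_1_barrier_func(struct work_struct *unused)
	{
		srcu_barrier(&srcu_struct_1);
	}

	static void srcu_2_barrier_func(struct work_struct *unused)
	{
		srcu_barrier(&srcu_struct_2);
	}

	static DECLARE_WORK(srcu_1_work, srcu_1_barrier_func);
	static DECLARE_WORK(srcu_2_work, srcu_2_barrier_func);

	static void barrier_all_flavors(void)
	{
		/* Kick off both srcu_barrier() calls on the system workqueue. */
		schedule_work(&srcu_1_work);
		schedule_work(&srcu_2_work);

		/* Overlap them with the rcu_barrier() call... */
		rcu_barrier();

		/* ...then wait for both SRCU barriers to finish. */
		flush_work(&srcu_1_work);
		flush_work(&srcu_2_work);
	}

The module-unload path then waits only for the slowest of the three
barriers rather than for their sum.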
An ancient version of the rcutorture module makes use of rcu_barrier()
in its exit function as follows::

	 1 static void
	 2 rcu_torture_cleanup(void)
	 3 {
	 4   int i;
	 5
	 6   fullstop = 1;
	 7   if (shuffler_task != NULL) {
	 8     VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
	 9     kthread_stop(shuffler_task);
	10   }
	11   shuffler_task = NULL;
	12
	13   if (writer_task != NULL) {
	14     VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
	15     kthread_stop(writer_task);
	16   }
	17   writer_task = NULL;
	18
	19   if (reader_tasks != NULL) {
	20     for (i = 0; i < nrealreaders; i++) {
	21       if (reader_tasks[i] != NULL) {
	22         VERBOSE_PRINTK_STRING(
	23           "Stopping rcu_torture_reader task");
	24         kthread_stop(reader_tasks[i]);
	25       }
	26       reader_tasks[i] = NULL;
	27     }
	28     kfree(reader_tasks);
	29     reader_tasks = NULL;
	30   }
	31   rcu_torture_current = NULL;
	32
	33   if (fakewriter_tasks != NULL) {
	34     for (i = 0; i < nfakewriters; i++) {
	35       if (fakewriter_tasks[i] != NULL) {
	36         VERBOSE_PRINTK_STRING(
	37           "Stopping rcu_torture_fakewriter task");
	38         kthread_stop(fakewriter_tasks[i]);
	39       }
	40       fakewriter_tasks[i] = NULL;
	41     }
	42     kfree(fakewriter_tasks);
	43     fakewriter_tasks = NULL;
	44   }
	45
	46   if (stats_task != NULL) {
	47     VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
	48     kthread_stop(stats_task);
	49   }
	50   stats_task = NULL;
	51
	52   /* Wait for all RCU callbacks to fire. */
	53   rcu_barrier();
	54
	55   rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
	56
	57   if (cur_ops->cleanup != NULL)
	58     cur_ops->cleanup();
	59   if (atomic_read(&n_rcu_torture_error))
	60     rcu_torture_print_module_parms("End of test: FAILURE");
	61   else
	62     rcu_torture_print_module_parms("End of test: SUCCESS");
	63 }

Line 6 sets a global variable that prevents any RCU callbacks from
re-posting themselves. This will not be necessary in most cases, since
RCU callbacks rarely include calls to call_rcu(). However, the rcutorture
module is an exception to this rule, and therefore needs to set this
global variable.

Lines 7-50 stop all the kernel tasks associated with the rcutorture
module. Therefore, once execution reaches line 53, no more rcutorture
RCU callbacks will be posted. The rcu_barrier() call on line 53 waits
for any pre-existing callbacks to complete.

Then lines 55-62 print status and do operation-specific cleanup, and
then return, permitting the module-unload operation to be completed.

.. _rcubarrier_quiz_1:

Quick Quiz #1:
	Is there any other situation where rcu_barrier() might
	be required?

:ref:`Answer to Quick Quiz #1 <answer_rcubarrier_quiz_1>`

Your module might have additional complications. For example, if your
module invokes call_rcu() from timers, you will need to first refrain
from posting new timers, cancel (or wait for) all the already-posted
timers, and only then invoke rcu_barrier() to wait for any remaining
RCU callbacks to complete, as in the sketch below.
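The following sketch shows one way to do this for a hypothetical module
whose self-rearming timer posts RCU callbacks. The mod_exiting flag,
the mod_timer_struct timer, and post_callback_via_call_rcu() are all
invented names, timer setup is omitted, and recent kernels could instead
use timer_shutdown_sync(), which guarantees that the timer cannot
re-arm itself::

	static struct timer_list mod_timer_struct;
	static bool mod_exiting;

	static void mod_timer_func(struct timer_list *unused)
	{
		post_callback_via_call_rcu();	/* Hypothetical call_rcu() user. */
		if (!READ_ONCE(mod_exiting))	/* Refrain from re-arming at exit. */
			mod_timer(&mod_timer_struct, jiffies + HZ);
	}

	static void mod_cleanup(void)
	{
		WRITE_ONCE(mod_exiting, true);		/* No new timers... */
		del_timer_sync(&mod_timer_struct);	/* ...wait out the old ones... */
		rcu_barrier();				/* ...then wait for the callbacks. */
	}

Note that the order is essential: stopping the timer first guarantees
that no further call_rcu() invocations can race with the rcu_barrier().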
Of course, if your module uses call_rcu(), you will need to invoke
rcu_barrier() before unloading. Similarly, if your module uses
call_srcu(), you will need to invoke srcu_barrier() before unloading,
and on the same srcu_struct structure. If your module uses call_rcu()
**and** call_srcu(), then (as noted above) you will need to invoke
rcu_barrier() **and** srcu_barrier().


Implementing rcu_barrier()
--------------------------

Dipankar Sarma's implementation of rcu_barrier() makes use of the fact
that RCU callbacks are never reordered once queued on one of the per-CPU
callback queues. His implementation queues an RCU callback on each of
the per-CPU callback queues, and then waits until they have all started
executing, at which point, all earlier RCU callbacks are guaranteed to
have completed.

The original code for rcu_barrier() was roughly as follows::

	 1 void rcu_barrier(void)
	 2 {
	 3   BUG_ON(in_interrupt());
	 4   /* Take cpucontrol mutex to protect against CPU hotplug */
	 5   mutex_lock(&rcu_barrier_mutex);
	 6   init_completion(&rcu_barrier_completion);
	 7   atomic_set(&rcu_barrier_cpu_count, 1);
	 8   on_each_cpu(rcu_barrier_func, NULL, 0, 1);
	 9   if (atomic_dec_and_test(&rcu_barrier_cpu_count))
	10     complete(&rcu_barrier_completion);
	11   wait_for_completion(&rcu_barrier_completion);
	12   mutex_unlock(&rcu_barrier_mutex);
	13 }

Line 3 verifies that the caller is in process context, and lines 5 and 12
use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the
global completion and counters at a time, which are initialized on lines
6 and 7. Line 8 causes each CPU to invoke rcu_barrier_func(), which is
shown below. Note that the final "1" in on_each_cpu()'s argument list
ensures that all the calls to rcu_barrier_func() will have completed
before on_each_cpu() returns. Line 9 removes the initial count from
rcu_barrier_cpu_count, and if this count is now zero, line 10 finalizes
the completion, which prevents line 11 from blocking. Either way,
line 11 then waits (if needed) for the completion.

.. _rcubarrier_quiz_2:

Quick Quiz #2:
	Why doesn't line 7 initialize rcu_barrier_cpu_count to zero,
	thereby avoiding the need for lines 9 and 10?

:ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>`

This code was rewritten in 2008 and several times thereafter, but it
still gives the general idea.

The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
to post an RCU callback, as follows::

	 1 static void rcu_barrier_func(void *notused)
	 2 {
	 3   int cpu = smp_processor_id();
	 4   struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
	 5   struct rcu_head *head;
	 6
	 7   head = &rdp->barrier;
	 8   atomic_inc(&rcu_barrier_cpu_count);
	 9   call_rcu(head, rcu_barrier_callback);
	10 }

Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure, which
contains the struct rcu_head that is needed for the later call to
call_rcu(). Line 7 picks up a pointer to this struct rcu_head, and line
8 increments the global counter. This counter will later be decremented
by the callback. Line 9 then registers the rcu_barrier_callback() on
the current CPU's callback queue.

The rcu_barrier_callback() function simply atomically decrements the
rcu_barrier_cpu_count variable and finalizes the completion when it
reaches zero, as follows::

	1 static void rcu_barrier_callback(struct rcu_head *notused)
	2 {
	3   if (atomic_dec_and_test(&rcu_barrier_cpu_count))
	4     complete(&rcu_barrier_completion);
	5 }

.. _rcubarrier_quiz_3:

Quick Quiz #3:
	What happens if CPU 0's rcu_barrier_func() executes
	immediately (thus incrementing rcu_barrier_cpu_count to the
	value one), but the other CPUs' rcu_barrier_func() invocations
	are delayed for a full grace period? Couldn't this result in
	rcu_barrier() returning prematurely?

:ref:`Answer to Quick Quiz #3 <answer_rcubarrier_quiz_3>`

The current rcu_barrier() implementation is more complex, due to the
need to avoid disturbing idle CPUs (especially on battery-powered
systems) and the need to minimally disturb non-idle CPUs in real-time
systems. In addition, a great many optimizations have been applied.
However, the code above illustrates the concepts.


rcu_barrier() Summary
---------------------

The rcu_barrier() primitive is used relatively infrequently, since most
code using RCU is in the core kernel rather than in modules. However, if
you are using RCU from an unloadable module, you need to use rcu_barrier()
so that your module may be safely unloaded.
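To pull the pieces together, here is a minimal sketch of such a module.
Everything here (struct foo, foo_update(), and friends) is a name
invented for this illustration, and error handling and readers are
omitted for brevity::

	#include <linux/module.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct foo {
		struct rcu_head rh;
		int data;
	};

	static struct foo __rcu *gp;
	static DEFINE_SPINLOCK(gp_lock);

	static void foo_reclaim(struct rcu_head *rp)
	{
		kfree(container_of(rp, struct foo, rh));
	}

	/* Publish a new element, freeing the old one after a grace period. */
	static void foo_update(int data)
	{
		struct foo *newp = kzalloc(sizeof(*newp), GFP_KERNEL);
		struct foo *oldp;

		if (!newp)
			return;
		newp->data = data;
		spin_lock(&gp_lock);
		oldp = rcu_dereference_protected(gp, lockdep_is_held(&gp_lock));
		rcu_assign_pointer(gp, newp);
		spin_unlock(&gp_lock);
		if (oldp)
			call_rcu(&oldp->rh, foo_reclaim);
	}

	static void __exit foo_exit(void)
	{
		struct foo *oldp;

		/* 1. Post the final callback; no new ones can now be posted. */
		spin_lock(&gp_lock);
		oldp = rcu_dereference_protected(gp, lockdep_is_held(&gp_lock));
		rcu_assign_pointer(gp, NULL);
		spin_unlock(&gp_lock);
		if (oldp)
			call_rcu(&oldp->rh, foo_reclaim);

		/* 2. Wait for all of this module's callbacks to be invoked. */
		rcu_barrier();

		/* 3. Only now may foo_reclaim()'s text safely go away. */
	}
	module_exit(foo_exit);
	MODULE_LICENSE("GPL");

Because the module cannot be unloaded while it is still in use, no new
foo_update() calls can race with foo_exit(), so step 1 need only dispose
of the final element; the rcu_barrier() in step 2 then guarantees that
foo_reclaim() has finished executing before the module's text is freed.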

Answers to Quick Quizzes
------------------------

.. _answer_rcubarrier_quiz_1:

Quick Quiz #1:
	Is there any other situation where rcu_barrier() might
	be required?

Answer:
	Interestingly enough, rcu_barrier() was not originally
	implemented for module unloading. Nikita Danilov was using
	RCU in a filesystem, which resulted in a hang at
	filesystem-unmount time. Dipankar Sarma coded up rcu_barrier()
	in response, so that Nikita could invoke it during the
	filesystem-unmount process.

	Much later, yours truly hit the RCU module-unload problem when
	implementing rcutorture, and found that rcu_barrier() solves
	this problem as well.

:ref:`Back to Quick Quiz #1 <rcubarrier_quiz_1>`

.. _answer_rcubarrier_quiz_2:

Quick Quiz #2:
	Why doesn't line 7 initialize rcu_barrier_cpu_count to zero,
	thereby avoiding the need for lines 9 and 10?

Answer:
	Suppose that the on_each_cpu() function shown on line 8 was
	delayed, so that CPU 0's rcu_barrier_func() executed and
	the corresponding grace period elapsed, all before CPU 1's
	rcu_barrier_func() started executing. This would result in
	rcu_barrier_cpu_count first being incremented to one by CPU 0's
	rcu_barrier_func() and then decremented back to zero by CPU 0's
	rcu_barrier_callback(), so that line 11's wait_for_completion()
	would return immediately, failing to wait for CPU 1's callbacks
	to be invoked.

	Note that this was not a problem when the rcu_barrier() code
	was first added back in 2005. This is because on_each_cpu()
	disables preemption, which acted as an RCU read-side critical
	section, thus preventing CPU 0's grace period from completing
	until on_each_cpu() had dealt with all of the CPUs. However,
	with the advent of preemptible RCU, rcu_barrier() no longer
	waited on nonpreemptible regions of code in preemptible kernels,
	that being the job of the new rcu_barrier_sched() function.

	However, with the RCU flavor consolidation around v4.20, this
	possibility was once again ruled out, because the consolidated
	RCU once again waits on nonpreemptible regions of code.

	Nevertheless, that extra count might still be a good idea.
	Relying on such accidents of implementation can result in
	later surprise bugs when the implementation changes.

:ref:`Back to Quick Quiz #2 <rcubarrier_quiz_2>`

.. _answer_rcubarrier_quiz_3:

Quick Quiz #3:
	What happens if CPU 0's rcu_barrier_func() executes
	immediately (thus incrementing rcu_barrier_cpu_count to the
	value one), but the other CPUs' rcu_barrier_func() invocations
	are delayed for a full grace period? Couldn't this result in
	rcu_barrier() returning prematurely?

Answer:
	This cannot happen. The reason is that on_each_cpu() has its
	last argument, the wait flag, set to "1". This flag is passed
	through to smp_call_function() and further to
	smp_call_function_on_cpu(), causing this latter to spin until
	the cross-CPU invocation of rcu_barrier_func() has completed.
	This by itself would prevent a grace period from completing on
	non-CONFIG_PREEMPTION kernels, since each CPU must undergo a
	context switch (or other quiescent state) before the grace
	period can complete. However, this is of no use in
	CONFIG_PREEMPTION kernels.

	Therefore, on_each_cpu() disables preemption across its call
	to smp_call_function() and also across the local call to
	rcu_barrier_func(). Because recent RCU implementations treat
	preemption-disabled regions of code as RCU read-side critical
	sections, this prevents grace periods from completing. This
	means that all CPUs have executed rcu_barrier_func() before
	the first rcu_barrier_callback() can possibly execute, in turn
	preventing rcu_barrier_cpu_count from prematurely reaching zero.

	But if on_each_cpu() ever decides to forgo disabling preemption,
	as might well happen due to real-time latency considerations,
	initializing rcu_barrier_cpu_count to one will save the day.

:ref:`Back to Quick Quiz #3 <rcubarrier_quiz_3>`