.. _list_rcu_doc:

Using RCU to Protect Read-Mostly Linked Lists
=============================================

One of the most common uses of RCU is protecting read-mostly linked lists
(``struct list_head`` in list.h).  One big advantage of this approach is
that all of the required memory ordering is provided by the list macros.
This document describes several list-based RCU use cases.

When iterating a list while holding the rcu_read_lock(), writers may
modify the list.  The reader is guaranteed to see all of the elements
which were added to the list before it acquired the rcu_read_lock()
and are still on the list when it invokes rcu_read_unlock().
Elements which are added to, or removed from, the list may or may not
be seen.  If the writer calls list_replace_rcu(), the reader may see
either the old element or the new element; it will not see both,
and it will not see neither.
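
To make the list_replace_rcu() guarantee concrete, the following is a minimal
sketch, not taken from the kernel sources: ``struct foo``, the list head, and
the update-side lock are hypothetical, standing in for whatever the real data
structure would be::

        /* Hypothetical element type on an RCU-protected list. */
        struct foo {
                struct list_head list;
                int data;
                struct rcu_head rcu;
        };

        /* Reader: sees either the old or the new element, never both or neither. */
        int sum_list(struct list_head *head)
        {
                struct foo *p;
                int sum = 0;

                rcu_read_lock();
                list_for_each_entry_rcu(p, head, list)
                        sum += p->data;
                rcu_read_unlock();
                return sum;
        }

        /* Updater: publish new_p in place of old_p, then defer freeing old_p. */
        void replace_element(struct foo *old_p, struct foo *new_p, spinlock_t *lock)
        {
                spin_lock(lock);
                list_replace_rcu(&old_p->list, &new_p->list);
                spin_unlock(lock);
                kfree_rcu(old_p, rcu);
        }

A reader that runs concurrently with replace_element() computes its sum over
either the old or the new element, and the old element is freed only after all
such readers have finished.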


Example 1: Read-mostly list: Deferred Destruction
--------------------------------------------------

A widely used use case for RCU lists in the kernel is lockless iteration over
all processes in the system. ``task_struct::tasks`` represents the list node that
links all the processes. The list can be traversed in parallel to any list
additions or removals.

The traversal of the list is done using ``for_each_process()``, which is defined
by the following two macros::

        #define next_task(p) \
                list_entry_rcu((p)->tasks.next, struct task_struct, tasks)

        #define for_each_process(p) \
                for (p = &init_task ; (p = next_task(p)) != &init_task ; )

The code traversing the list of all processes typically looks like::

        rcu_read_lock();
        for_each_process(p) {
                /* Do something with p */
        }
        rcu_read_unlock();

The simplified and heavily inlined code for removing a process from a
task list is::

        void release_task(struct task_struct *p)
        {
                write_lock(&tasklist_lock);
                list_del_rcu(&p->tasks);
                write_unlock(&tasklist_lock);
                call_rcu(&p->rcu, delayed_put_task_struct);
        }

When a process exits, ``release_task()`` calls ``list_del_rcu(&p->tasks)``
via __exit_signal() and __unhash_process() under ``tasklist_lock``
writer lock protection.  The list_del_rcu() invocation removes
the task from the list of all tasks. The ``tasklist_lock``
prevents concurrent list additions/removals from corrupting the
list. Readers using ``for_each_process()`` are not protected by the
``tasklist_lock``. To prevent readers from noticing changes in the list
pointers, the ``task_struct`` object is freed only after one or more
grace periods elapse, with the help of call_rcu(), which is invoked via
put_task_struct_rcu_user(). This deferring of destruction ensures that
any readers traversing the list will see valid ``p->tasks.next`` pointers,
so deletion/freeing can happen in parallel with traversal of the list.
This pattern is also called an **existence lock**, since RCU refrains
from invoking the delayed_put_task_struct() callback until all existing
readers finish, which guarantees that the ``task_struct`` object in
question will remain in existence until after the completion of all RCU
readers that might possibly have a reference to that object.
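
For reference, a call_rcu() callback of this sort typically just maps the
embedded ``rcu_head`` back to its enclosing object and releases it.  The sketch
below is illustrative only (the real delayed_put_task_struct() drops a
reference count rather than freeing directly) and reuses the hypothetical
``struct foo`` from the earlier sketch::

        /* Illustrative shape of a call_rcu() callback. */
        static void foo_reclaim(struct rcu_head *rhp)
        {
                /* Recover the enclosing object from its rcu_head member. */
                struct foo *fp = container_of(rhp, struct foo, rcu);

                kfree(fp);
        }

        /* Updater: unlink now, reclaim only after pre-existing readers finish. */
        void delete_element(struct foo *fp, spinlock_t *lock)
        {
                spin_lock(lock);
                list_del_rcu(&fp->list);
                spin_unlock(lock);
                call_rcu(&fp->rcu, foo_reclaim);
        }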


Example 2: Read-Side Action Taken Outside of Lock: No In-Place Updates
----------------------------------------------------------------------

Some reader-writer locking use cases compute a value while holding
the read-side lock, but continue to use that value after that lock is
released.  These use cases are often good candidates for conversion
to RCU.  One prominent example involves network packet routing.
Because the packet-routing data tracks the state of equipment outside
of the computer, it will at times contain stale data.  Therefore, once
the route has been computed, there is no need to hold the routing table
static during transmission of the packet.  After all, you can hold the
routing table static all you want, but that won't keep the external
Internet from changing, and it is the state of the external Internet
that really matters.  In addition, routing entries are typically added
or deleted, rather than being modified in place.  This is a rare example
of the finite speed of light and the non-zero size of atoms actually
helping make synchronization be lighter weight.

A straightforward example of this type of RCU use case may be found in
the system-call auditing support.  For example, a reader-writer locked
implementation of ``audit_filter_task()`` might be as follows::

        static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
        {
                struct audit_entry *e;
                enum audit_state   state;

                read_lock(&auditsc_lock);
                /* Note: audit_filter_mutex held by caller. */
                list_for_each_entry(e, &audit_tsklist, list) {
                        if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
                                if (state == AUDIT_STATE_RECORD)
                                        *key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
                                read_unlock(&auditsc_lock);
                                return state;
                        }
                }
                read_unlock(&auditsc_lock);
                return AUDIT_BUILD_CONTEXT;
        }

Here the list is searched under the lock, but the lock is dropped before
the corresponding value is returned.  By the time that this value is acted
on, the list may well have been modified.  This makes sense, since if
you are turning auditing off, it is OK to audit a few extra system calls.

This means that RCU can be easily applied to the read side, as follows::

        static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
        {
                struct audit_entry *e;
                enum audit_state   state;

                rcu_read_lock();
                /* Note: audit_filter_mutex held by caller. */
                list_for_each_entry_rcu(e, &audit_tsklist, list) {
                        if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
                                if (state == AUDIT_STATE_RECORD)
                                        *key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
                                rcu_read_unlock();
                                return state;
                        }
                }
                rcu_read_unlock();
                return AUDIT_BUILD_CONTEXT;
        }

The read_lock() and read_unlock() calls have become rcu_read_lock()
and rcu_read_unlock(), respectively, and the list_for_each_entry()
has become list_for_each_entry_rcu().  The **_rcu()** list-traversal
primitives add READ_ONCE() and diagnostic checks for incorrect use
outside of an RCU read-side critical section.
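
These diagnostic checks matter on the update side as well.  If an updater
legitimately walks the list while holding the update-side lock rather than
rcu_read_lock(), the optional condition argument of list_for_each_entry_rcu()
documents why that is safe and keeps the checks quiet.  A minimal sketch,
using a hypothetical helper within this example's simplified audit framework::

        /* Sketch: update-side traversal that is legal without rcu_read_lock()
         * because audit_filter_mutex excludes all other updaters.  The extra
         * argument tells RCU lockdep why the traversal is safe. */
        static struct audit_entry *find_always_rule(struct list_head *list)
        {
                struct audit_entry *e;

                lockdep_assert_held(&audit_filter_mutex);
                list_for_each_entry_rcu(e, list, list,
                                        lockdep_is_held(&audit_filter_mutex))
                        if (e->rule.action == AUDIT_ALWAYS)
                                return e;
                return NULL;
        }

Without the condition argument, such a traversal would trigger an RCU-lockdep
splat even though holding ``audit_filter_mutex`` makes it perfectly safe.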

The changes to the update side are also straightforward.  A reader-writer lock
might be used as follows for deletion and insertion in these simplified
versions of audit_del_rule() and audit_add_rule()::

        static inline int audit_del_rule(struct audit_rule *rule,
                                         struct list_head *list)
        {
                struct audit_entry *e;

                write_lock(&auditsc_lock);
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                list_del(&e->list);
                                write_unlock(&auditsc_lock);
                                return 0;
                        }
                }
                write_unlock(&auditsc_lock);
                return -EFAULT;         /* No matching rule */
        }

        static inline int audit_add_rule(struct audit_entry *entry,
                                         struct list_head *list)
        {
                write_lock(&auditsc_lock);
                if (entry->rule.flags & AUDIT_PREPEND) {
                        entry->rule.flags &= ~AUDIT_PREPEND;
                        list_add(&entry->list, list);
                } else {
                        list_add_tail(&entry->list, list);
                }
                write_unlock(&auditsc_lock);
                return 0;
        }

Following are the RCU equivalents for these two functions::

        static inline int audit_del_rule(struct audit_rule *rule,
                                         struct list_head *list)
        {
                struct audit_entry *e;

                /* No need to use the _rcu iterator here, since this is the only
                 * deletion routine. */
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                list_del_rcu(&e->list);
                                call_rcu(&e->rcu, audit_free_rule);
                                return 0;
                        }
                }
                return -EFAULT;         /* No matching rule */
        }

        static inline int audit_add_rule(struct audit_entry *entry,
                                         struct list_head *list)
        {
                if (entry->rule.flags & AUDIT_PREPEND) {
                        entry->rule.flags &= ~AUDIT_PREPEND;
                        list_add_rcu(&entry->list, list);
                } else {
                        list_add_tail_rcu(&entry->list, list);
                }
                return 0;
        }

Normally, the write_lock() and write_unlock() would be replaced by a
spin_lock() and a spin_unlock().  But in this case, all callers hold
``audit_filter_mutex``, so no additional locking is required.  The
auditsc_lock can therefore be eliminated, since use of RCU eliminates the
need for writers to exclude readers.

The list_del(), list_add(), and list_add_tail() primitives have been
replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
The **_rcu()** list-manipulation primitives add memory barriers that are
needed on weakly ordered CPUs.  The list_del_rcu() primitive omits the
pointer poisoning debug-assist code that would otherwise cause concurrent
readers to fail spectacularly.

So, when readers can tolerate stale data and when entries are either added or
deleted, without in-place modification, it is very easy to use RCU!


Example 3: Handling In-Place Updates
------------------------------------

The system-call auditing code does not update auditing rules in place.  However,
if it did, the reader-writer-locked code to do so might look as follows
(assuming only ``field_count`` is updated, otherwise, the added fields would
need to be filled in)::

        static inline int audit_upd_rule(struct audit_rule *rule,
                                         struct list_head *list,
                                         __u32 newaction,
                                         __u32 newfield_count)
        {
                struct audit_entry *e;
                struct audit_entry *ne;

                write_lock(&auditsc_lock);
                /* Note: audit_filter_mutex held by caller. */
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                e->rule.action = newaction;
                                e->rule.field_count = newfield_count;
                                write_unlock(&auditsc_lock);
                                return 0;
                        }
                }
                write_unlock(&auditsc_lock);
                return -EFAULT;         /* No matching rule */
        }

The RCU version creates a copy, updates the copy, then replaces the old
entry with the newly updated entry.  This sequence of actions, allowing
concurrent reads while making a copy to perform the update, is what gives
RCU (*read-copy update*) its name.

The RCU version of audit_upd_rule() is as follows::

        static inline int audit_upd_rule(struct audit_rule *rule,
                                         struct list_head *list,
                                         __u32 newaction,
                                         __u32 newfield_count)
        {
                struct audit_entry *e;
                struct audit_entry *ne;

                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
                                if (ne == NULL)
                                        return -ENOMEM;
                                audit_copy_rule(&ne->rule, &e->rule);
                                ne->rule.action = newaction;
                                ne->rule.field_count = newfield_count;
                                list_replace_rcu(&e->list, &ne->list);
                                call_rcu(&e->rcu, audit_free_rule);
                                return 0;
                        }
                }
                return -EFAULT;         /* No matching rule */
        }

Again, this assumes that the caller holds ``audit_filter_mutex``.  Normally, the
writer lock would become a spinlock in this sort of code.

The update_lsm_rule() does something very similar, for those who would
prefer to look at real Linux-kernel code.

Another use of this pattern can be found in the openvswitch driver's *connection
tracking table* code in ``ct_limit_set()``.  The table holds connection tracking
entries and has a limit on the maximum entries.  There is one such table
per-zone and hence one *limit* per zone.  The zones are mapped to their limits
through a hashtable using an RCU-managed hlist for the hash chains.  When a new
limit is set, a new limit object is allocated and ``ct_limit_set()`` is called
to replace the old limit object with the new one using list_replace_rcu().
The old limit object is then freed after a grace period using kfree_rcu().
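
Stripped of the driver-specific details, that pattern reduces to
replace-then-defer-free on an RCU-managed hlist.  The following is a minimal
sketch with a hypothetical ``struct zone_limit`` standing in for the driver's
limit object (these are not the openvswitch driver's actual types or function
names)::

        /* Hypothetical per-zone limit object hanging off an RCU-managed hlist. */
        struct zone_limit {
                struct hlist_node node;
                u32 limit;
                struct rcu_head rcu;
        };

        /* Caller holds the update-side lock; readers use rcu_read_lock(). */
        static int set_zone_limit(struct zone_limit *old, u32 new_limit)
        {
                struct zone_limit *new;

                new = kmalloc(sizeof(*new), GFP_KERNEL);
                if (!new)
                        return -ENOMEM;
                new->limit = new_limit;
                /* Readers see either the old or the new object. */
                hlist_replace_rcu(&old->node, &new->node);
                /* Reclaim the old object only after a grace period. */
                kfree_rcu(old, rcu);
                return 0;
        }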


Example 4: Eliminating Stale Data
---------------------------------

The auditing example above tolerates stale data, as do most algorithms
that are tracking external state.  After all, there is always a delay
from the time the external state changes to the time Linux becomes aware
of the change, so as noted earlier, a small quantity of additional
RCU-induced staleness is generally not a problem.

However, there are many examples where stale data cannot be tolerated.
One example in the Linux kernel is the System V IPC code (see the shm_lock()
function in ipc/shm.c).  This code checks a *deleted* flag under a
per-entry spinlock, and, if the *deleted* flag is set, pretends that the
entry does not exist.  For this to be helpful, the search function must
return holding the per-entry spinlock, as shm_lock() does in fact do.

.. _quick_quiz:

Quick Quiz:
        For the deleted-flag technique to be helpful, why is it necessary
        to hold the per-entry lock while returning from the search function?

:ref:`Answer to Quick Quiz <quick_quiz_answer>`

If the system-call audit module were to ever need to reject stale data, one way
to accomplish this would be to add a ``deleted`` flag and a ``lock`` spinlock to the
``audit_entry`` structure, and modify audit_filter_task() as follows::

        static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
        {
                struct audit_entry *e;
                enum audit_state   state;

                rcu_read_lock();
                list_for_each_entry_rcu(e, &audit_tsklist, list) {
                        if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
                                spin_lock(&e->lock);
                                if (e->deleted) {
                                        spin_unlock(&e->lock);
                                        rcu_read_unlock();
                                        return AUDIT_BUILD_CONTEXT;
                                }
                                rcu_read_unlock();
                                if (state == AUDIT_STATE_RECORD)
                                        *key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
                                return state;
                        }
                }
                rcu_read_unlock();
                return AUDIT_BUILD_CONTEXT;
        }

The ``audit_del_rule()`` function would need to set the ``deleted`` flag under the
spinlock as follows::

        static inline int audit_del_rule(struct audit_rule *rule,
                                         struct list_head *list)
        {
                struct audit_entry *e;

                /* No need to use the _rcu iterator here, since this
                 * is the only deletion routine. */
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                spin_lock(&e->lock);
                                list_del_rcu(&e->list);
                                e->deleted = 1;
                                spin_unlock(&e->lock);
                                call_rcu(&e->rcu, audit_free_rule);
                                return 0;
                        }
                }
                return -EFAULT;         /* No matching rule */
        }

This too assumes that the caller holds ``audit_filter_mutex``.

Note that this example assumes that entries are only added and deleted.
Additional mechanism is required to deal correctly with the update-in-place
performed by audit_upd_rule().  For one thing, audit_upd_rule() would
need to hold the locks of both the old ``audit_entry`` and its replacement
while executing the list_replace_rcu().
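
One way that might look, continuing the illustrative audit example (this is a
sketch only, not kernel code, and real code would also need to settle lock
ordering and use the appropriate nested-locking annotations)::

        /* Sketch: in-place update combined with the deleted-flag scheme. */
        static inline int audit_upd_rule(struct audit_rule *rule,
                                         struct list_head *list,
                                         __u32 newaction,
                                         __u32 newfield_count)
        {
                struct audit_entry *e;
                struct audit_entry *ne;

                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
                                if (ne == NULL)
                                        return -ENOMEM;
                                audit_copy_rule(&ne->rule, &e->rule);
                                ne->rule.action = newaction;
                                ne->rule.field_count = newfield_count;
                                ne->deleted = 0;
                                spin_lock_init(&ne->lock);
                                /* Hold both per-entry locks across the
                                 * replacement so that any reader serialized
                                 * on either lock sees a consistent state. */
                                spin_lock(&e->lock);
                                spin_lock(&ne->lock);
                                list_replace_rcu(&e->list, &ne->list);
                                e->deleted = 1;
                                spin_unlock(&ne->lock);
                                spin_unlock(&e->lock);
                                call_rcu(&e->rcu, audit_free_rule);
                                return 0;
                        }
                }
                return -EFAULT;         /* No matching rule */
        }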


Example 5: Skipping Stale Objects
---------------------------------

For some use cases, reader performance can be improved by locklessly detecting
stale objects during read-side list traversal, where stale objects
are those that will be removed and destroyed after one or more grace
periods.  One such example can be found in the timerfd subsystem.  When a
``CLOCK_REALTIME`` clock is reprogrammed (for example due to setting
of the system time) then all programmed ``timerfds`` that depend on
this clock get triggered and processes waiting on them are awakened in
advance of their scheduled expiry.  To facilitate this, all such timers
are added to an RCU-managed ``cancel_list`` when they are set up in
``timerfd_setup_cancel()``::

        static void timerfd_setup_cancel(struct timerfd_ctx *ctx, int flags)
        {
                spin_lock(&ctx->cancel_lock);
                if ((ctx->clockid == CLOCK_REALTIME ||
                     ctx->clockid == CLOCK_REALTIME_ALARM) &&
                    (flags & TFD_TIMER_ABSTIME) && (flags & TFD_TIMER_CANCEL_ON_SET)) {
                        if (!ctx->might_cancel) {
                                ctx->might_cancel = true;
                                spin_lock(&cancel_lock);
                                list_add_rcu(&ctx->clist, &cancel_list);
                                spin_unlock(&cancel_lock);
                        }
                } else {
                        __timerfd_remove_cancel(ctx);
                }
                spin_unlock(&ctx->cancel_lock);
        }

When a timerfd is freed (fd is closed), then the ``might_cancel``
flag of the timerfd object is cleared, the object removed from the
``cancel_list`` and destroyed, as shown in this simplified and inlined
version of timerfd_release()::

        int timerfd_release(struct inode *inode, struct file *file)
        {
                struct timerfd_ctx *ctx = file->private_data;

                spin_lock(&ctx->cancel_lock);
                if (ctx->might_cancel) {
                        ctx->might_cancel = false;
                        spin_lock(&cancel_lock);
                        list_del_rcu(&ctx->clist);
                        spin_unlock(&cancel_lock);
                }
                spin_unlock(&ctx->cancel_lock);

                if (isalarm(ctx))
                        alarm_cancel(&ctx->t.alarm);
                else
                        hrtimer_cancel(&ctx->t.tmr);
                kfree_rcu(ctx, rcu);
                return 0;
        }

If the ``CLOCK_REALTIME`` clock is set, for example by a time server, the
hrtimer framework calls ``timerfd_clock_was_set()`` which walks the
``cancel_list`` and wakes up processes waiting on the timerfd.  While iterating
the ``cancel_list``, the ``might_cancel`` flag is consulted to skip stale
objects::

        void timerfd_clock_was_set(void)
        {
                ktime_t moffs = ktime_mono_to_real(0);
                struct timerfd_ctx *ctx;
                unsigned long flags;

                rcu_read_lock();
                list_for_each_entry_rcu(ctx, &cancel_list, clist) {
                        if (!ctx->might_cancel)
                                continue;
                        spin_lock_irqsave(&ctx->wqh.lock, flags);
                        if (ctx->moffs != moffs) {
                                ctx->moffs = KTIME_MAX;
                                ctx->ticks++;
                                wake_up_locked_poll(&ctx->wqh, EPOLLIN);
                        }
                        spin_unlock_irqrestore(&ctx->wqh.lock, flags);
                }
                rcu_read_unlock();
        }

The key point is that because RCU-protected traversal of the
``cancel_list`` happens concurrently with object addition and removal,
sometimes the traversal can access an object that has been removed from
the list.  In this example, a flag is used to skip such objects.


Summary
-------

Read-mostly list-based data structures that can tolerate stale data are
the most amenable to use of RCU.  The simplest case is where entries are
either added or deleted from the data structure (or atomically modified
in place), but non-atomic in-place modifications can be handled by making
a copy, updating the copy, then replacing the original with the copy.
If stale data cannot be tolerated, then a *deleted* flag may be used
in conjunction with a per-entry spinlock in order to allow the search
function to reject newly deleted data.

.. _quick_quiz_answer:

Answer to Quick Quiz:
        For the deleted-flag technique to be helpful, why is it necessary
        to hold the per-entry lock while returning from the search function?

        If the search function drops the per-entry lock before returning,
        then the caller will be processing stale data in any case.  If it
        is really OK to be processing stale data, then you don't need a
        *deleted* flag.  If processing stale data really is a problem,
        then you need to hold the per-entry lock across all of the code
        that uses the value that was returned.

:ref:`Back to Quick Quiz <quick_quiz>`
                                                      
