Linux/tools/memory-model/Documentation/recipes.txt

This document provides "recipes", that is, litmus tests for commonly
occurring situations, as well as a few that illustrate subtly broken but
attractive nuisances.  Many of these recipes include example code from
v5.7 of the Linux kernel.

The first section covers simple special cases, the second section
takes off the training wheels to cover more involved examples,
and the third section provides a few rules of thumb.


Simple special cases
====================

This section presents two simple special cases, the first being where
there is only one CPU or only one memory location is accessed, and the
second being use of that old concurrency workhorse, locking.


Single CPU or single memory location
------------------------------------

If there is only one CPU on the one hand or only one variable
on the other, the code will execute in order.  There are (as
usual) some things to be careful of:

1.      Some aspects of the C language are unordered.  For example,
        in the expression "f(x) + g(y)", the order in which f and g are
        called is not defined; the object code is allowed to use either
        order or even to interleave the computations.

2.      Compilers are permitted to use the "as-if" rule.  That is, a
        compiler can emit whatever code it likes for normal accesses,
        as long as the results of a single-threaded execution appear
        just as if the compiler had followed all the relevant rules.
        To see this, compile with a high level of optimization and run
        the debugger on the resulting binary.

3.      If there is only one variable but multiple CPUs, that variable
        must be properly aligned and all accesses to that variable must
        be full sized.  Variables that straddle cachelines or pages void
        your full-ordering warranty, as do undersized accesses that load
        from or store to only part of the variable.

4.      If there are multiple CPUs, accesses to shared variables should
        use READ_ONCE() and WRITE_ONCE() or stronger to prevent load/store
        tearing, load/store fusing, and invented loads and stores.
        There are exceptions to this rule, including:

        i.      When there is no possibility of a given shared variable
                being updated by some other CPU, for example, while
                holding the update-side lock, reads from that variable
                need not use READ_ONCE().

        ii.     When there is no possibility of a given shared variable
                being either read or updated by other CPUs, for example,
                when running during early boot, reads from that variable
                need not use READ_ONCE() and writes to that variable
                need not use WRITE_ONCE().
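
Rule 4 above can be tried outside the kernel with a minimal sketch.
The macros below are simplified userspace stand-ins, not the kernel's
own definitions: they assume GCC or Clang (for __typeof__) and a
naturally aligned scalar variable no larger than a machine word.
The names shared_flag, set_flag(), and get_flag() are hypothetical and
exist only for illustration.

```c
#include <assert.h>

/*
 * Simplified userspace stand-ins (assumption: scalar, naturally
 * aligned, at most machine-word sized).  The volatile-qualified
 * access asks the compiler for exactly one access per use, so it may
 * not fuse, repeat, or invent accesses to the variable.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;		/* hypothetical shared variable */

/* Writer side: one store per call. */
static void set_flag(int v)
{
	WRITE_ONCE(shared_flag, v);
}

/* Reader side: one load per call, never hoisted out of a caller's loop. */
static int get_flag(void)
{
	return READ_ONCE(shared_flag);
}
```

A caller that spins on get_flag() rereads shared_flag on every
iteration, whereas a plain load could legally be hoisted out of the
loop under the "as-if" rule.  These stand-ins provide no ordering
between different variables.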


Locking
-------

Locking is well-known and straightforward, at least if you don't think
about it too hard.  And the basic rule is indeed quite simple: Any CPU that
has acquired a given lock sees any changes previously seen or made by any
CPU before it released that same lock.  Note that this statement is a bit
stronger than "Any CPU holding a given lock sees all changes made by any
CPU during the time that CPU was holding this same lock".  For example,
consider the following pair of code fragments:

        /* See MP+polocks.litmus. */
        void CPU0(void)
        {
                WRITE_ONCE(x, 1);
                spin_lock(&mylock);
                WRITE_ONCE(y, 1);
                spin_unlock(&mylock);
        }

        void CPU1(void)
        {
                spin_lock(&mylock);
                r0 = READ_ONCE(y);
                spin_unlock(&mylock);
                r1 = READ_ONCE(x);
        }

The basic rule guarantees that if CPU0() acquires mylock before CPU1(),
then both r0 and r1 must be set to the value 1.  This also has the
consequence that if the final value of r0 is equal to 1, then the final
value of r1 must also be equal to 1.  In contrast, the weaker rule would
say nothing about the final value of r1.
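
The MP+polocks shape can be approximated in userspace as a rough
sketch.  The code below is not kernel code: it assumes that a POSIX
mutex stands in for spin_lock()/spin_unlock() and that relaxed C11
atomics stand in for READ_ONCE()/WRITE_ONCE().  The helper
count_forbidden() is a hypothetical name.  A test run of this kind can
only fail to observe the forbidden outcome; it proves nothing, which
is why the litmus tests are checked with a model instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int x, y;
static int r0, r1;	/* written by cpu1(), read after pthread_join() */

static void *cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	pthread_mutex_lock(&mylock);
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	pthread_mutex_unlock(&mylock);
	return NULL;
}

static void *cpu1(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&mylock);
	r0 = atomic_load_explicit(&y, memory_order_relaxed);
	pthread_mutex_unlock(&mylock);
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Run the two threads "trials" times and count how often the outcome
 * r0 == 1 && r1 == 0 shows up.  Returns -1 if a thread cannot be created.
 */
static int count_forbidden(int trials)
{
	int bad = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		r0 = r1 = 0;
		if (pthread_create(&t0, NULL, cpu0, NULL))
			return -1;
		if (pthread_create(&t1, NULL, cpu1, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r0 == 1 && r1 == 0)
			bad++;
	}
	return bad;
}
```

Calling count_forbidden(200) is expected to return 0: if cpu1() reads
y == 1, then cpu0()'s critical section came first, so the store to x
that preceded it must also be visible to cpu1().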

The converse to the basic rule also holds, as illustrated by the
following litmus test:

        /* See MP+porevlocks.litmus. */
        void CPU0(void)
        {
                r0 = READ_ONCE(y);
                spin_lock(&mylock);
                r1 = READ_ONCE(x);
                spin_unlock(&mylock);
        }

        void CPU1(void)
        {
                spin_lock(&mylock);
                WRITE_ONCE(x, 1);
                spin_unlock(&mylock);
                WRITE_ONCE(y, 1);
        }

This converse to the basic rule guarantees that if CPU0() acquires
mylock before CPU1(), then both r0 and r1 must be set to the value 0.
This also has the consequence that if the final value of r1 is equal
to 0, then the final value of r0 must also be equal to 0.  In contrast,
the weaker rule would say nothing about the final value of r0.

These examples show only a single pair of CPUs, but the effects of the
locking basic rule extend across multiple acquisitions of a given lock
across multiple CPUs.

However, it is not necessarily the case that accesses ordered by
locking will be seen as ordered by CPUs not holding that lock.
Consider this example:

        /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
        void CPU0(void)
        {
                spin_lock(&mylock);
                WRITE_ONCE(x, 1);
                WRITE_ONCE(y, 1);
                spin_unlock(&mylock);
        }

        void CPU1(void)
        {
                spin_lock(&mylock);
                r0 = READ_ONCE(y);
                WRITE_ONCE(z, 1);
                spin_unlock(&mylock);
        }

        void CPU2(void)
        {
                WRITE_ONCE(z, 2);
                smp_mb();
                r1 = READ_ONCE(x);
        }

Counter-intuitive though it might be, it is quite possible to have
the final value of r0 be 1, the final value of z be 2, and the final
value of r1 be 0.  The reason for this surprising outcome is that
CPU2() never acquired the lock, and thus did not benefit from the
lock's ordering properties.

Ordering can be extended to CPUs not holding the lock by careful use
of smp_mb__after_spinlock():

        /* See Z6.0+pooncelock+poonceLock+pombonce.litmus. */
        void CPU0(void)
        {
                spin_lock(&mylock);
                WRITE_ONCE(x, 1);
                WRITE_ONCE(y, 1);
                spin_unlock(&mylock);
        }

        void CPU1(void)
        {
                spin_lock(&mylock);
                smp_mb__after_spinlock();
                r0 = READ_ONCE(y);
                WRITE_ONCE(z, 1);
                spin_unlock(&mylock);
        }

        void CPU2(void)
        {
                WRITE_ONCE(z, 2);
                smp_mb();
                r1 = READ_ONCE(x);
        }

This addition of smp_mb__after_spinlock() strengthens the lock acquisition
sufficiently to rule out the counter-intuitive outcome.


Taking off the training wheels
==============================

This section looks at more complex examples, including message passing,
load buffering, release-acquire chains, and store buffering.
Many classes of litmus tests have abbreviated names, which may be found
here: https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf


Message passing (MP)
--------------------

The MP pattern has one CPU execute a pair of stores to a pair of variables
and another CPU execute a pair of loads from this same pair of variables,
but in the opposite order.  The goal is to avoid the counter-intuitive
outcome in which the first load sees the value written by the second store
but the second load does not see the value written by the first store.
In the absence of any ordering, this goal may not be met, as can be seen
in the MP+poonceonces.litmus litmus test.  This section therefore looks at
a number of ways of meeting this goal.


Release and acquire
~~~~~~~~~~~~~~~~~~~

Use of smp_store_release() and smp_load_acquire() is one way to force
the desired MP ordering.  The general approach is shown below:

        /* See MP+pooncerelease+poacquireonce.litmus. */
        void CPU0(void)
        {
                WRITE_ONCE(x, 1);
                smp_store_release(&y, 1);
        }

        void CPU1(void)
        {
                r0 = smp_load_acquire(&y);
                r1 = READ_ONCE(x);
        }

The smp_store_release() macro orders any prior accesses against the
store, while the smp_load_acquire() macro orders the load against any
subsequent accesses.  Therefore, if the final value of r0 is the value 1,
the final value of r1 must also be the value 1.

The init_stack_slab() function in lib/stackdepot.c uses release-acquire
in this way to safely initialize a slab of the stack.  Working out
the mutual-exclusion design is left as an exercise for the reader.
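
The release-acquire form of MP has a close userspace analog, sketched
below under stated assumptions: C11 memory_order_release and
memory_order_acquire stand in for smp_store_release() and
smp_load_acquire(), relaxed atomics stand in for WRITE_ONCE() and
READ_ONCE(), and count_forbidden() is a hypothetical test helper.
This is an illustration of the pattern, not the kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r0, r1;	/* written by cpu1(), read after pthread_join() */

static void *cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	/* Release: the prior store to x is ordered before this store. */
	atomic_store_explicit(&y, 1, memory_order_release);
	return NULL;
}

static void *cpu1(void *unused)
{
	(void)unused;
	/* Acquire: this load is ordered before the later load from x. */
	r0 = atomic_load_explicit(&y, memory_order_acquire);
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Count how often r0 == 1 && r1 == 0 is observed over "trials" runs.
 * Returns -1 if a thread cannot be created.
 */
static int count_forbidden(int trials)
{
	int bad = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		r0 = r1 = 0;
		if (pthread_create(&t0, NULL, cpu0, NULL))
			return -1;
		if (pthread_create(&t1, NULL, cpu1, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r0 == 1 && r1 == 0)
			bad++;
	}
	return bad;
}
```

With this pairing, count_forbidden() should always return 0.  Weakening
both accesses to y to memory_order_relaxed gives the unordered
MP+poonceonces shape, for which the outcome is permitted even if a
short test run never happens to observe it.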


Assign and dereference
~~~~~~~~~~~~~~~~~~~~~~

Use of rcu_assign_pointer() and rcu_dereference() is quite similar to the
use of smp_store_release() and smp_load_acquire(), except that both
rcu_assign_pointer() and rcu_dereference() operate on RCU-protected
pointers.  The general approach is shown below:

        /* See MP+onceassign+derefonce.litmus. */
        int z;
        int *y = &z;
        int x;

        void CPU0(void)
        {
                WRITE_ONCE(x, 1);
                rcu_assign_pointer(y, &x);
        }

        void CPU1(void)
        {
                rcu_read_lock();
                r0 = rcu_dereference(y);
                r1 = READ_ONCE(*r0);
                rcu_read_unlock();
        }

In this example, if the final value of r0 is &x then the final value of
r1 must be 1.

The rcu_assign_pointer() macro has the same ordering properties as does
smp_store_release(), but the rcu_dereference() macro orders the load only
against later accesses that depend on the value loaded.  A dependency
is present if the value loaded determines the address of a later access
(address dependency, as shown above), the value written by a later store
(data dependency), or whether or not a later store is executed in the
first place (control dependency).  Note that the term "data dependency"
is sometimes casually used to cover both address and data dependencies.

In lib/math/prime_numbers.c, the expand_to_next_prime() function invokes
rcu_assign_pointer(), and the next_prime_number() function invokes
rcu_dereference().  This combination mediates access to a bit vector
that is expanded as additional primes are needed.
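
The pointer-publication shape can also be sketched in userspace, with
an important caveat: the code below uses a release store and an
acquire load on the pointer, which is stronger than the
dependency-only ordering that rcu_dereference() provides, and it
omits rcu_read_lock()/rcu_read_unlock() and any grace-period handling
entirely.  It is an assumption-laden analog of the litmus test, not an
RCU implementation, and count_forbidden() is a hypothetical helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int z;
static int x;
static _Atomic(int *) y = &z;	/* published pointer, initially &z */

static int *r0;			/* written by cpu1(), read after join */
static int r1;

static void *cpu0(void *unused)
{
	(void)unused;
	x = 1;	/* initialize the data first ... */
	/* ... then publish the pointer (stand-in for rcu_assign_pointer()). */
	atomic_store_explicit(&y, &x, memory_order_release);
	return NULL;
}

static void *cpu1(void *unused)
{
	(void)unused;
	/* Stand-in for rcu_dereference(); acquire is stronger than needed. */
	r0 = atomic_load_explicit(&y, memory_order_acquire);
	r1 = *r0;	/* address depends on the value just loaded */
	return NULL;
}

/*
 * Count how often the reader sees the new pointer (&x) but stale data.
 * Returns -1 if a thread cannot be created.
 */
static int count_forbidden(int trials)
{
	int bad = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;

		x = 0;
		atomic_store(&y, &z);
		if (pthread_create(&t0, NULL, cpu0, NULL))
			return -1;
		if (pthread_create(&t1, NULL, cpu1, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r0 == &x && r1 != 1)
			bad++;
	}
	return bad;
}
```

A reader that obtains &x is guaranteed to see x == 1, so
count_forbidden() should return 0.  A reader that obtains &z simply
sees the old data, which is a legitimate outcome.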


Write and read memory barriers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is usually better to use smp_store_release() instead of smp_wmb()
and to use smp_load_acquire() instead of smp_rmb().  However, the older
smp_wmb() and smp_rmb() APIs are still heavily used, so it is important
to understand their use cases.  The general approach is shown below:

        /* See MP+fencewmbonceonce+fencermbonceonce.litmus. */
        void CPU0(void)
        {
                WRITE_ONCE(x, 1);
                smp_wmb();
                WRITE_ONCE(y, 1);
        }

        void CPU1(void)
        {
                r0 = READ_ONCE(y);
                smp_rmb();
                r1 = READ_ONCE(x);
        }

The smp_wmb() macro orders prior stores against later stores, and the
smp_rmb() macro orders prior loads against later loads.  Therefore, if
the final value of r0 is 1, the final value of r1 must also be 1.
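
A fence-based userspace analog of this pattern is sketched below.
It assumes that atomic_thread_fence(memory_order_release) and
atomic_thread_fence(memory_order_acquire) may stand in for smp_wmb()
and smp_rmb() respectively.  The C11 fences are somewhat stronger than
the kernel barriers (they also order some load/store combinations),
so this is an approximation for illustration only, and
count_forbidden() is a hypothetical helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r0, r1;	/* written by cpu1(), read after pthread_join() */

static void *cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	return NULL;
}

static void *cpu1(void *unused)
{
	(void)unused;
	r0 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Count how often r0 == 1 && r1 == 0 is observed over "trials" runs.
 * Returns -1 if a thread cannot be created.
 */
static int count_forbidden(int trials)
{
	int bad = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		r0 = r1 = 0;
		if (pthread_create(&t0, NULL, cpu0, NULL))
			return -1;
		if (pthread_create(&t1, NULL, cpu1, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r0 == 1 && r1 == 0)
			bad++;
	}
	return bad;
}
```

Both fences are needed: removing either one leaves one side of the
exchange unordered, and the outcome r0 == 1 with r1 == 0 is then no
longer ruled out.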

The xlog_state_switch_iclogs() function in fs/xfs/xfs_log.c contains
the following write-side code fragment:

        log->l_curr_block -= log->l_logBBsize;
        ASSERT(log->l_curr_block >= 0);
        smp_wmb();
        log->l_curr_cycle++;

And the xlog_valid_lsn() function in fs/xfs/xfs_log_priv.h contains
the corresponding read-side code fragment:

        cur_cycle = READ_ONCE(log->l_curr_cycle);
        smp_rmb();
        cur_block = READ_ONCE(log->l_curr_block);

Alternatively, consider the following comment in function
perf_output_put_handle() in kernel/events/ring_buffer.c:

         *   kernel                             user
         *
         *   if (LOAD ->data_tail) {            LOAD ->data_head
         *                      (A)             smp_rmb()       (C)
         *      STORE $data                     LOAD $data
         *      smp_wmb()       (B)             smp_mb()        (D)
         *      STORE ->data_head               STORE ->data_tail
         *   }

The B/C pairing is an example of the MP pattern using smp_wmb() on the
write side and smp_rmb() on the read side.

Of course, given that smp_mb() is strictly stronger than either smp_wmb()
or smp_rmb(), any code fragment that would work with smp_rmb() and
346 smp_wmb() would also work with smp_mb() replac    346 smp_wmb() would also work with smp_mb() replacing either or both of the
347 weaker barriers.                                  347 weaker barriers.
348                                                   348 
349                                                   349 
350 Load buffering (LB)                               350 Load buffering (LB)
351 -------------------                               351 -------------------
352                                                   352 
353 The LB pattern has one CPU load from one varia    353 The LB pattern has one CPU load from one variable and then store to a
354 second, while another CPU loads from the secon    354 second, while another CPU loads from the second variable and then stores
355 to the first.  The goal is to avoid the counte    355 to the first.  The goal is to avoid the counter-intuitive situation where
356 each load reads the value written by the other    356 each load reads the value written by the other CPU's store.  In the
357 absence of any ordering it is quite possible t    357 absence of any ordering it is quite possible that this may happen, as
358 can be seen in the LB+poonceonces.litmus litmu    358 can be seen in the LB+poonceonces.litmus litmus test.
359                                                   359 
360 One way of avoiding the counter-intuitive outc    360 One way of avoiding the counter-intuitive outcome is through the use of a
361 control dependency paired with a full memory b    361 control dependency paired with a full memory barrier:
362                                                   362 
363         /* See LB+fencembonceonce+ctrlonceonce    363         /* See LB+fencembonceonce+ctrlonceonce.litmus. */
364         void CPU0(void)                           364         void CPU0(void)
365         {                                         365         {
366                 r0 = READ_ONCE(x);                366                 r0 = READ_ONCE(x);
367                 if (r0)                           367                 if (r0)
368                         WRITE_ONCE(y, 1);         368                         WRITE_ONCE(y, 1);
369         }                                         369         }
370                                                   370 
371         void CPU1(void)                           371         void CPU1(void)
372         {                                         372         {
373                 r1 = READ_ONCE(y);                373                 r1 = READ_ONCE(y);
374                 smp_mb();                         374                 smp_mb();
375                 WRITE_ONCE(x, 1);                 375                 WRITE_ONCE(x, 1);
376         }                                         376         }
377                                                   377 
378 This pairing of a control dependency in CPU0()    378 This pairing of a control dependency in CPU0() with a full memory
379 barrier in CPU1() prevents r0 and r1 from both    379 barrier in CPU1() prevents r0 and r1 from both ending up equal to 1.
380                                                   380 
381 The A/D pairing from the ring-buffer use case     381 The A/D pairing from the ring-buffer use case shown earlier also
382 illustrates LB.  Here is a repeat of the comme    382 illustrates LB.  Here is a repeat of the comment in
383 perf_output_put_handle() in kernel/events/ring    383 perf_output_put_handle() in kernel/events/ring_buffer.c, showing a
384 control dependency on the kernel side and a fu    384 control dependency on the kernel side and a full memory barrier on
385 the user side:                                    385 the user side:
386                                                   386 
387          *   kernel                               387          *   kernel                             user
388          *                                        388          *
389          *   if (LOAD ->data_tail) {              389          *   if (LOAD ->data_tail) {            LOAD ->data_head
390          *                      (A)               390          *                      (A)             smp_rmb()       (C)
391          *      STORE $data                       391          *      STORE $data                     LOAD $data
392          *      smp_wmb()       (B)               392          *      smp_wmb()       (B)             smp_mb()        (D)
393          *      STORE ->data_head                 393          *      STORE ->data_head               STORE ->data_tail
394          *   }                                    394          *   }
395          *                                        395          *
396          * Where A pairs with D, and B pairs w    396          * Where A pairs with D, and B pairs with C.
397                                                   397 
398 The kernel's control dependency between the lo    398 The kernel's control dependency between the load from ->data_tail
399 and the store to data combined with the user's    399 and the store to data combined with the user's full memory barrier
400 between the load from data and the store to ->    400 between the load from data and the store to ->data_tail prevents
401 the counter-intuitive outcome where the kernel    401 the counter-intuitive outcome where the kernel overwrites the data
402 before the user gets done loading it.             402 before the user gets done loading it.
403                                                   403 
404                                                   404 
405 Release-acquire chains                            405 Release-acquire chains
406 ----------------------                            406 ----------------------
407                                                   407 
408 Release-acquire chains are a low-overhead, fle    408 Release-acquire chains are a low-overhead, flexible, and easy-to-use
409 method of maintaining order.  However, they do    409 method of maintaining order.  However, they do have some limitations that
410 need to be fully understood.  Here is an examp    410 need to be fully understood.  Here is an example that maintains order:
411                                                   411 
412         /* See ISA2+pooncerelease+poacquirerel    412         /* See ISA2+pooncerelease+poacquirerelease+poacquireonce.litmus. */
413         void CPU0(void)                           413         void CPU0(void)
414         {                                         414         {
415                 WRITE_ONCE(x, 1);                 415                 WRITE_ONCE(x, 1);
416                 smp_store_release(&y, 1);         416                 smp_store_release(&y, 1);
417         }                                         417         }
418                                                   418 
419         void CPU1(void)                           419         void CPU1(void)
420         {                                         420         {
421                 r0 = smp_load_acquire(&y);        421                 r0 = smp_load_acquire(&y);
422                 smp_store_release(&z, 1);         422                 smp_store_release(&z, 1);
423         }                                         423         }
424                                                   424 
425         void CPU2(void)                           425         void CPU2(void)
426         {                                         426         {
427                 r1 = smp_load_acquire(&z);        427                 r1 = smp_load_acquire(&z);
428                 r2 = READ_ONCE(x);                428                 r2 = READ_ONCE(x);
429         }                                         429         }
430                                                   430 
431 In this case, if r0 and r1 both have final val    431 In this case, if r0 and r1 both have final values of 1, then r2 must
432 also have a final value of 1.                     432 also have a final value of 1.
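
A user-space analogue of this three-CPU chain might look as follows.  This
is a sketch that assumes C11 release stores and acquire loads are acceptable
stand-ins for smp_store_release() and smp_load_acquire(), and that POSIX
threads stand in for CPUs.  The function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y, z;	/* shared variables, reset to zero before each run */
static int r0, r1, r2;		/* per-thread "registers" */

static void *chain_cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_store_explicit(&y, 1, memory_order_release);
	return NULL;
}

static void *chain_cpu1(void *unused)
{
	(void)unused;
	r0 = atomic_load_explicit(&y, memory_order_acquire);
	atomic_store_explicit(&z, 1, memory_order_release);
	return NULL;
}

static void *chain_cpu2(void *unused)
{
	(void)unused;
	r1 = atomic_load_explicit(&z, memory_order_acquire);
	r2 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Return how often r0 == 1 && r1 == 1 && r2 == 0 was seen in "runs" runs. */
int chain_count_forbidden(int runs)
{
	void *(*body[3])(void *) = { chain_cpu0, chain_cpu1, chain_cpu2 };
	int forbidden = 0;

	for (int i = 0; i < runs; i++) {
		pthread_t tid[3];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		atomic_store(&z, 0);
		for (int t = 0; t < 3; t++) {
			int rc = pthread_create(&tid[t], NULL, body[t], NULL);

			assert(rc == 0);
			(void)rc;
		}
		for (int t = 0; t < 3; t++)
			pthread_join(tid[t], NULL);
		forbidden += (r0 == 1 && r1 == 1 && r2 == 0);
	}
	return forbidden;
}
```

Each thread here is linked to the next by a load that reads the value the
previous thread stored, which is what makes the chain sufficient.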
433                                                   433 
434 The ordering in this example is stronger than     434 The ordering in this example is stronger than it needs to be.  For
435 example, ordering would still be preserved if     435 example, ordering would still be preserved if CPU1()'s smp_load_acquire()
436 invocation was replaced with READ_ONCE().         436 invocation was replaced with READ_ONCE().
437                                                   437 
438 It is tempting to assume that CPU0()'s store t    438 It is tempting to assume that CPU0()'s store to x is globally ordered
439 before CPU1()'s store to z, but this is not th    439 before CPU1()'s store to z, but this is not the case:
440                                                   440 
441         /* See Z6.0+pooncerelease+poacquirerel    441         /* See Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus. */
442         void CPU0(void)                           442         void CPU0(void)
443         {                                         443         {
444                 WRITE_ONCE(x, 1);                 444                 WRITE_ONCE(x, 1);
445                 smp_store_release(&y, 1);         445                 smp_store_release(&y, 1);
446         }                                         446         }
447                                                   447 
448         void CPU1(void)                           448         void CPU1(void)
449         {                                         449         {
450                 r0 = smp_load_acquire(&y);        450                 r0 = smp_load_acquire(&y);
451                 smp_store_release(&z, 1);         451                 smp_store_release(&z, 1);
452         }                                         452         }
453                                                   453 
454         void CPU2(void)                           454         void CPU2(void)
455         {                                         455         {
456                 WRITE_ONCE(z, 2);                 456                 WRITE_ONCE(z, 2);
457                 smp_mb();                         457                 smp_mb();
458                 r1 = READ_ONCE(x);                458                 r1 = READ_ONCE(x);
459         }                                         459         }
460                                                   460 
461 One might hope that if the final value of r0 i    461 One might hope that if the final value of r0 is 1 and the final value
462 of z is 2, then the final value of r1 must als    462 of z is 2, then the final value of r1 must also be 1, but it really is
463 possible for r1 to have the final value of 0.     463 possible for r1 to have the final value of 0.  The reason, of course,
464 is that in this version, CPU2() is not part of    464 is that in this version, CPU2() is not part of the release-acquire chain.
465 This situation is accounted for in the rules o    465 This situation is accounted for in the rules of thumb below.
466                                                   466 
467 Despite this limitation, release-acquire chain    467 Despite this limitation, release-acquire chains are low-overhead as
468 well as simple and powerful, at least as memor    468 well as simple and powerful, at least as memory-ordering mechanisms go.
469                                                   469 
470                                                   470 
471 Store buffering                                   471 Store buffering
472 ---------------                                   472 ---------------
473                                                   473 
474 Store buffering can be thought of as upside-do    474 Store buffering can be thought of as upside-down load buffering, so
475 that one CPU first stores to one variable and     475 that one CPU first stores to one variable and then loads from a second,
476 while another CPU stores to the second variabl    476 while another CPU stores to the second variable and then loads from the
477 first.  Preserving order requires nothing less    477 first.  Preserving order requires nothing less than full barriers:
478                                                   478 
479         /* See SB+fencembonceonces.litmus. */     479         /* See SB+fencembonceonces.litmus. */
480         void CPU0(void)                           480         void CPU0(void)
481         {                                         481         {
482                 WRITE_ONCE(x, 1);                 482                 WRITE_ONCE(x, 1);
483                 smp_mb();                         483                 smp_mb();
484                 r0 = READ_ONCE(y);                484                 r0 = READ_ONCE(y);
485         }                                         485         }
486                                                   486 
487         void CPU1(void)                           487         void CPU1(void)
488         {                                         488         {
489                 WRITE_ONCE(y, 1);                 489                 WRITE_ONCE(y, 1);
490                 smp_mb();                         490                 smp_mb();
491                 r1 = READ_ONCE(x);                491                 r1 = READ_ONCE(x);
492         }                                         492         }
493                                                   493 
494 Omitting either smp_mb() will allow both r0 an    494 Omitting either smp_mb() will allow both r0 and r1 to have final
495 values of 0, but providing both full barriers     495 values of 0, but providing both full barriers as shown above prevents
496 this counter-intuitive outcome.                   496 this counter-intuitive outcome.
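
For experimentation outside the kernel, the same shape can be written with
C11 atomics.  The sketch below assumes that a sequentially consistent fence
is a reasonable stand-in for smp_mb() and that POSIX threads stand in for
CPUs.  The function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;		/* shared variables, reset to zero before each run */
static int r0, r1;		/* per-thread "registers" */

static void *sb_cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* assumed stand-in for smp_mb() */
	r0 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

static void *sb_cpu1(void *unused)
{
	(void)unused;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* assumed stand-in for smp_mb() */
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Return how often r0 == 0 && r1 == 0 was seen in "runs" runs. */
int sb_count_forbidden(int runs)
{
	int forbidden = 0;

	for (int i = 0; i < runs; i++) {
		pthread_t t0, t1;
		int rc;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		rc = pthread_create(&t0, NULL, sb_cpu0, NULL);
		assert(rc == 0);
		rc = pthread_create(&t1, NULL, sb_cpu1, NULL);
		assert(rc == 0);
		(void)rc;
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		forbidden += (r0 == 0 && r1 == 0);
	}
	return forbidden;
}
```

Weakening either fence to memory_order_release or memory_order_acquire
removes the guarantee, mirroring the point above that nothing less than
full barriers will do.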
497                                                   497 
498 This pattern most famously appears as part of     498 This pattern most famously appears as part of Dekker's locking
499 algorithm, but within the Linux kernel it has     499 algorithm, but within the Linux kernel it has the much more practical
500 use of ordering wakeups.  The following commen    500 use of ordering wakeups.  The following comment taken from waitqueue_active()
501 in include/linux/wait.h shows the canonical pa    501 in include/linux/wait.h shows the canonical pattern:
502                                                   502 
503  *      CPU0 - waker                    CPU1 -    503  *      CPU0 - waker                    CPU1 - waiter
504  *                                                504  *
505  *                                      for (;    505  *                                      for (;;) {
506  *      @cond = true;                     prep    506  *      @cond = true;                     prepare_to_wait(&wq_head, &wait, state);
507  *      smp_mb();                         // s    507  *      smp_mb();                         // smp_mb() from set_current_state()
508  *      if (waitqueue_active(wq_head))            508  *      if (waitqueue_active(wq_head))         if (@cond)
509  *        wake_up(wq_head);                       509  *        wake_up(wq_head);                      break;
510  *                                        sche    510  *                                        schedule();
511  *                                      }         511  *                                      }
512  *                                      finish    512  *                                      finish_wait(&wq_head, &wait);
513                                                   513 
514 On CPU0, the store is to @cond and the load is    514 On CPU0, the store is to @cond and the load is in waitqueue_active().
515 On CPU1, prepare_to_wait() contains both a sto    515 On CPU1, prepare_to_wait() contains both a store to wq_head and a call
516 to set_current_state(), which contains an smp_    516 to set_current_state(), which contains an smp_mb() barrier; the load is
517 "if (@cond)".  The full barriers prevent the u    517 "if (@cond)".  The full barriers prevent the undesirable outcome where
518 CPU1 puts the waiting task to sleep and CPU0 f    518 CPU1 puts the waiting task to sleep and CPU0 fails to wake it up.
519                                                   519 
520 Note that use of locking can greatly simplify     520 Note that use of locking can greatly simplify this pattern.
521                                                   521 
522                                                   522 
523 Rules of thumb                                    523 Rules of thumb
524 ==============                                    524 ==============
525                                                   525 
526 There might seem to be no pattern governing wh    526 There might seem to be no pattern governing what ordering primitives are
527 needed in which situations, but this is not th    527 needed in which situations, but this is not the case.  There is a pattern
528 based on the relation between the accesses lin    528 based on the relation between the accesses linking successive CPUs in a
529 given litmus test.  There are three types of l    529 given litmus test.  There are three types of linkage:
530                                                   530 
531 1.      Write-to-read, where the next CPU read    531 1.      Write-to-read, where the next CPU reads the value that the
532         previous CPU wrote.  The LB litmus-tes    532         previous CPU wrote.  The LB litmus-test patterns contain only
533         this type of relation.  In formal memo    533         this type of relation.  In formal memory-modeling texts, this
534         relation is called "reads-from" and is    534         relation is called "reads-from" and is usually abbreviated "rf".
535                                                   535 
536 2.      Read-to-write, where the next CPU over    536 2.      Read-to-write, where the next CPU overwrites the value that the
537         previous CPU read.  The SB litmus test    537         previous CPU read.  The SB litmus test contains only this type
538         of relation.  In formal memory-modelin    538         of relation.  In formal memory-modeling texts, this relation is
539         often called "from-reads" and is somet    539         often called "from-reads" and is sometimes abbreviated "fr".
540                                                   540 
541 3.      Write-to-write, where the next CPU ove    541 3.      Write-to-write, where the next CPU overwrites the value written
542         by the previous CPU.  The Z6.0 litmus     542         by the previous CPU.  The Z6.0 litmus test pattern contains a
543         write-to-write relation between the la    543         write-to-write relation between the last access of CPU1() and
544         the first access of CPU2().  In formal    544         the first access of CPU2().  In formal memory-modeling texts,
545         this relation is often called "coheren    545         this relation is often called "coherence order" and is sometimes
546         abbreviated "co".  In the C++ standard    546         abbreviated "co".  In the C++ standard, it is instead called
547         "modification order" and often abbrevi    547         "modification order" and often abbreviated "mo".
548                                                   548 
549 The strength of memory ordering required for a    549 The strength of memory ordering required for a given litmus test to
550 avoid a counter-intuitive outcome depends on t    550 avoid a counter-intuitive outcome depends on the types of relations
551 linking the memory accesses for the outcome in    551 linking the memory accesses for the outcome in question:
552                                                   552 
553 o       If all links are write-to-read links,     553 o       If all links are write-to-read links, then the weakest
554         possible ordering within each CPU suff    554         possible ordering within each CPU suffices.  For example, in
555         the LB litmus test, a control dependen    555         the LB litmus test, a control dependency was enough to do the
556         job.                                      556         job.
557                                                   557 
558 o       If all but one of the links are write-    558 o       If all but one of the links are write-to-read links, then a
559         release-acquire chain suffices.  Both     559         release-acquire chain suffices.  Both the MP and the ISA2
560         litmus tests illustrate this case.        560         litmus tests illustrate this case.
561                                                   561 
562 o       If more than one of the links are some    562 o       If more than one of the links are something other than
563         write-to-read links, then a full memor    563         write-to-read links, then a full memory barrier is required
564         between each successive pair of non-wr    564         between each successive pair of non-write-to-read links.  This
565         case is illustrated by the Z6.0 litmus    565         case is illustrated by the Z6.0 litmus tests, both in the
566         locking and in the release-acquire sec    566         locking and in the release-acquire sections.
567                                                   567 
568 However, if you find yourself having to stretc    568 However, if you find yourself having to stretch these rules of thumb
569 to fit your situation, you should consider cre    569 to fit your situation, you should consider creating a litmus test and
570 running it on the model.                          570 running it on the model.
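
The rules of thumb above can be summarized as a small classifier over the
links of a litmus test.  This is only an illustrative sketch: the type and
function names are hypothetical, and the result is a coarse label that does
not say where in the cycle the barriers must go.

```c
#include <assert.h>
#include <stddef.h>

/* The three kinds of link between successive CPUs. */
enum link {
	LINK_RF,	/* write-to-read ("reads-from") */
	LINK_FR,	/* read-to-write ("from-reads") */
	LINK_CO,	/* write-to-write ("coherence order") */
};

/* Coarse label for the weakest ordering the rules of thumb call for. */
enum ordering {
	ORDER_DEPENDENCY,	/* weakest ordering, e.g. a control dependency */
	ORDER_RELEASE_ACQUIRE,	/* a release-acquire chain suffices */
	ORDER_FULL_BARRIER,	/* full barriers between non-rf links */
};

/* Classify a cycle of "n" links according to the rules of thumb. */
enum ordering required_ordering(const enum link *links, size_t n)
{
	size_t non_rf = 0;

	assert(links != NULL && n >= 2);
	for (size_t i = 0; i < n; i++)
		if (links[i] != LINK_RF)
			non_rf++;

	if (non_rf == 0)
		return ORDER_DEPENDENCY;
	if (non_rf == 1)
		return ORDER_RELEASE_ACQUIRE;
	return ORDER_FULL_BARRIER;
}
```

For example, the LB test has two write-to-read links, MP has one
write-to-read and one read-to-write link, SB has two read-to-write links,
and the Z6.0 test has one link of each kind.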
                                                      
