~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/Documentation/arch/arm/vlocks.rst

Version: ~ [ linux-6.12-rc7 ] ~ [ linux-6.11.7 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.60 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.116 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.171 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.229 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.285 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.323 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.12 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

  1 ======================================
  2 vlocks for Bare-Metal Mutual Exclusion
  3 ======================================
  4 
  5 Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
  6 mechanism, with reasonable but minimal requirements on the memory
  7 system.
  8 
  9 These are intended to be used to coordinate critical activity among CPUs
 10 which are otherwise non-coherent, in situations where the hardware
 11 provides no other mechanism to support this and ordinary spinlocks
 12 cannot be used.
 13 
 14 
 15 vlocks make use of the atomicity provided by the memory system for
 16 writes to a single memory location.  To arbitrate, every CPU "votes for
 17 itself", by storing a unique number to a common memory location.  The
 18 final value seen in that memory location when all the votes have been
 19 cast identifies the winner.
 20 
 21 In order to make sure that the election produces an unambiguous result
 22 in finite time, a CPU will only enter the election in the first place if
 23 no winner has been chosen and the election does not appear to have
 24 started yet.
 25 
 26 
 27 Algorithm
 28 ---------
 29 
 30 The easiest way to explain the vlocks algorithm is with some pseudo-code::
 31 
 32 
 33         int currently_voting[NR_CPUS] = { 0, };
 34         int last_vote = -1; /* no votes yet */
 35 
 36         bool vlock_trylock(int this_cpu)
 37         {
 38                 /* signal our desire to vote */
 39                 currently_voting[this_cpu] = 1;
 40                 if (last_vote != -1) {
 41                         /* someone already volunteered himself */
 42                         currently_voting[this_cpu] = 0;
 43                         return false; /* not ourself */
 44                 }
 45 
 46                 /* let's suggest ourself */
 47                 last_vote = this_cpu;
 48                 currently_voting[this_cpu] = 0;
 49 
 50                 /* then wait until everyone else is done voting */
 51                 for_each_cpu(i) {
 52                         while (currently_voting[i] != 0)
 53                                 /* wait */;
 54                 }
 55 
 56                 /* result */
 57                 if (last_vote == this_cpu)
 58                         return true; /* we won */
 59                 return false;
 60         }
 61 
 62         bool vlock_unlock(void)
 63         {
 64                 last_vote = -1;
 65         }
 66 
 67 
 68 The currently_voting[] array provides a way for the CPUs to determine
 69 whether an election is in progress, and plays a role analogous to the
 70 "entering" array in Lamport's bakery algorithm [1].
 71 
 72 However, once the election has started, the underlying memory system
 73 atomicity is used to pick the winner.  This avoids the need for a static
 74 priority rule to act as a tie-breaker, or any counters which could
 75 overflow.
 76 
 77 As long as the last_vote variable is globally visible to all CPUs, it
 78 will contain only one value that won't change once every CPU has cleared
 79 its currently_voting flag.
 80 
 81 
 82 Features and limitations
 83 ------------------------
 84 
 85  * vlocks are not intended to be fair.  In the contended case, it is the
 86    _last_ CPU which attempts to get the lock which will be most likely
 87    to win.
 88 
 89    vlocks are therefore best suited to situations where it is necessary
 90    to pick a unique winner, but it does not matter which CPU actually
 91    wins.
 92 
 93  * Like other similar mechanisms, vlocks will not scale well to a large
 94    number of CPUs.
 95 
 96    vlocks can be cascaded in a voting hierarchy to permit better scaling
 97    if necessary, as in the following hypothetical example for 4096 CPUs::
 98 
 99         /* first level: local election */
100         my_town = towns[(this_cpu >> 4) & 0xf];
101         I_won = vlock_trylock(my_town, this_cpu & 0xf);
102         if (I_won) {
103                 /* we won the town election, let's go for the state */
104                 my_state = states[(this_cpu >> 8) & 0xf];
105                 I_won = vlock_lock(my_state, this_cpu & 0xf));
106                 if (I_won) {
107                         /* and so on */
108                         I_won = vlock_lock(the_whole_country, this_cpu & 0xf];
109                         if (I_won) {
110                                 /* ... */
111                         }
112                         vlock_unlock(the_whole_country);
113                 }
114                 vlock_unlock(my_state);
115         }
116         vlock_unlock(my_town);
117 
118 
119 ARM implementation
120 ------------------
121 
122 The current ARM implementation [2] contains some optimisations beyond
123 the basic algorithm:
124 
125  * By packing the members of the currently_voting array close together,
126    we can read the whole array in one transaction (providing the number
127    of CPUs potentially contending the lock is small enough).  This
128    reduces the number of round-trips required to external memory.
129 
130    In the ARM implementation, this means that we can use a single load
131    and comparison::
132 
133         LDR     Rt, [Rn]
134         CMP     Rt, #0
135 
136    ...in place of code equivalent to::
137 
138         LDRB    Rt, [Rn]
139         CMP     Rt, #0
140         LDRBEQ  Rt, [Rn, #1]
141         CMPEQ   Rt, #0
142         LDRBEQ  Rt, [Rn, #2]
143         CMPEQ   Rt, #0
144         LDRBEQ  Rt, [Rn, #3]
145         CMPEQ   Rt, #0
146 
147    This cuts down on the fast-path latency, as well as potentially
148    reducing bus contention in contended cases.
149 
150    The optimisation relies on the fact that the ARM memory system
151    guarantees coherency between overlapping memory accesses of
152    different sizes, similarly to many other architectures.  Note that
153    we do not care which element of currently_voting appears in which
154    bits of Rt, so there is no need to worry about endianness in this
155    optimisation.
156 
157    If there are too many CPUs to read the currently_voting array in
158    one transaction then multiple transactions are still required.  The
159    implementation uses a simple loop of word-sized loads for this
160    case.  The number of transactions is still fewer than would be
161    required if bytes were loaded individually.
162 
163 
164    In principle, we could aggregate further by using LDRD or LDM, but
165    to keep the code simple this was not attempted in the initial
166    implementation.
167 
168 
169  * vlocks are currently only used to coordinate between CPUs which are
170    unable to enable their caches yet.  This means that the
171    implementation removes many of the barriers which would be required
172    when executing the algorithm in cached memory.
173 
174    packing of the currently_voting array does not work with cached
175    memory unless all CPUs contending the lock are cache-coherent, due
176    to cache writebacks from one CPU clobbering values written by other
177    CPUs.  (Though if all the CPUs are cache-coherent, you should be
178    probably be using proper spinlocks instead anyway).
179 
180 
181  * The "no votes yet" value used for the last_vote variable is 0 (not
182    -1 as in the pseudocode).  This allows statically-allocated vlocks
183    to be implicitly initialised to an unlocked state simply by putting
184    them in .bss.
185 
186    An offset is added to each CPU's ID for the purpose of setting this
187    variable, so that no CPU uses the value 0 for its ID.
188 
189 
190 Colophon
191 --------
192 
193 Originally created and documented by Dave Martin for Linaro Limited, for
194 use in ARM-based big.LITTLE platforms, with review and input gratefully
195 received from Nicolas Pitre and Achin Gupta.  Thanks to Nicolas for
196 grabbing most of this text out of the relevant mail thread and writing
197 up the pseudocode.
198 
199 Copyright (C) 2012-2013  Linaro Limited
200 Distributed under the terms of Version 2 of the GNU General Public
201 License, as defined in linux/COPYING.
202 
203 
204 References
205 ----------
206 
207 [1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
208     Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
209 
210     https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
211 
212 [2] linux/arch/arm/common/vlock.S, www.kernel.org.

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php