1 ====================================== 2 vlocks for Bare-Metal Mutual Exclusion 3 ====================================== 4 5 Voting Locks, or "vlocks" provide a simple low 6 mechanism, with reasonable but minimal require 7 system. 8 9 These are intended to be used to coordinate cr 10 which are otherwise non-coherent, in situation 11 provides no other mechanism to support this an 12 cannot be used. 13 14 15 vlocks make use of the atomicity provided by t 16 writes to a single memory location. To arbitr 17 itself", by storing a unique number to a commo 18 final value seen in that memory location when 19 cast identifies the winner. 20 21 In order to make sure that the election produc 22 in finite time, a CPU will only enter the elec 23 no winner has been chosen and the election doe 24 started yet. 25 26 27 Algorithm 28 --------- 29 30 The easiest way to explain the vlocks algorith 31 32 33 int currently_voting[NR_CPUS] = { 0, } 34 int last_vote = -1; /* no votes yet */ 35 36 bool vlock_trylock(int this_cpu) 37 { 38 /* signal our desire to vote * 39 currently_voting[this_cpu] = 1 40 if (last_vote != -1) { 41 /* someone already vol 42 currently_voting[this_ 43 return false; /* not o 44 } 45 46 /* let's suggest ourself */ 47 last_vote = this_cpu; 48 currently_voting[this_cpu] = 0 49 50 /* then wait until everyone el 51 for_each_cpu(i) { 52 while (currently_votin 53 /* wait */; 54 } 55 56 /* result */ 57 if (last_vote == this_cpu) 58 return true; /* we won 59 return false; 60 } 61 62 bool vlock_unlock(void) 63 { 64 last_vote = -1; 65 } 66 67 68 The currently_voting[] array provides a way fo 69 whether an election is in progress, and plays 70 "entering" array in Lamport's bakery algorithm 71 72 However, once the election has started, the un 73 atomicity is used to pick the winner. This av 74 priority rule to act as a tie-breaker, or any 75 overflow. 76 77 As long as the last_vote variable is globally 78 will contain only one value that won't change 79 its currently_voting flag. 80 81 82 Features and limitations 83 ------------------------ 84 85 * vlocks are not intended to be fair. In the 86 _last_ CPU which attempts to get the lock w 87 to win. 88 89 vlocks are therefore best suited to situati 90 to pick a unique winner, but it does not ma 91 wins. 92 93 * Like other similar mechanisms, vlocks will 94 number of CPUs. 95 96 vlocks can be cascaded in a voting hierarch 97 if necessary, as in the following hypotheti 98 99 /* first level: local election */ 100 my_town = towns[(this_cpu >> 4) & 0xf] 101 I_won = vlock_trylock(my_town, this_cp 102 if (I_won) { 103 /* we won the town election, l 104 my_state = states[(this_cpu >> 105 I_won = vlock_lock(my_state, t 106 if (I_won) { 107 /* and so on */ 108 I_won = vlock_lock(the 109 if (I_won) { 110 /* ... */ 111 } 112 vlock_unlock(the_whole 113 } 114 vlock_unlock(my_state); 115 } 116 vlock_unlock(my_town); 117 118 119 ARM implementation 120 ------------------ 121 122 The current ARM implementation [2] contains so 123 the basic algorithm: 124 125 * By packing the members of the currently_vot 126 we can read the whole array in one transact 127 of CPUs potentially contending the lock is 128 reduces the number of round-trips required 129 130 In the ARM implementation, this means that 131 and comparison:: 132 133 LDR Rt, [Rn] 134 CMP Rt, #0 135 136 ...in place of code equivalent to:: 137 138 LDRB Rt, [Rn] 139 CMP Rt, #0 140 LDRBEQ Rt, [Rn, #1] 141 CMPEQ Rt, #0 142 LDRBEQ Rt, [Rn, #2] 143 CMPEQ Rt, #0 144 LDRBEQ Rt, [Rn, #3] 145 CMPEQ Rt, #0 146 147 This cuts down on the fast-path latency, as 148 reducing bus contention in contended cases. 149 150 The optimisation relies on the fact that th 151 guarantees coherency between overlapping me 152 different sizes, similarly to many other ar 153 we do not care which element of currently_v 154 bits of Rt, so there is no need to worry ab 155 optimisation. 156 157 If there are too many CPUs to read the curr 158 one transaction then multiple transactions 159 implementation uses a simple loop of word-s 160 case. The number of transactions is still 161 required if bytes were loaded individually. 162 163 164 In principle, we could aggregate further by 165 to keep the code simple this was not attemp 166 implementation. 167 168 169 * vlocks are currently only used to coordinat 170 unable to enable their caches yet. This me 171 implementation removes many of the barriers 172 when executing the algorithm in cached memo 173 174 packing of the currently_voting array does 175 memory unless all CPUs contending the lock 176 to cache writebacks from one CPU clobbering 177 CPUs. (Though if all the CPUs are cache-co 178 probably be using proper spinlocks instead 179 180 181 * The "no votes yet" value used for the last_ 182 -1 as in the pseudocode). This allows stat 183 to be implicitly initialised to an unlocked 184 them in .bss. 185 186 An offset is added to each CPU's ID for the 187 variable, so that no CPU uses the value 0 f 188 189 190 Colophon 191 -------- 192 193 Originally created and documented by Dave Mart 194 use in ARM-based big.LITTLE platforms, with re 195 received from Nicolas Pitre and Achin Gupta. 196 grabbing most of this text out of the relevant 197 up the pseudocode. 198 199 Copyright (C) 2012-2013 Linaro Limited 200 Distributed under the terms of Version 2 of th 201 License, as defined in linux/COPYING. 202 203 204 References 205 ---------- 206 207 [1] Lamport, L. "A New Solution of Dijkstra's 208 Problem", Communications of the ACM 17, 8 209 210 https://en.wikipedia.org/wiki/Lamport%27s_ 211 212 [2] linux/arch/arm/common/vlock.S, www.kernel.
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.