Linux/Documentation/arch/arm/cluster-pm-race-avoidance.rst




  1 ==============================================    
  2 Cluster-wide Power-up/power-down race avoidanc    
  3 ==============================================    
  4                                                   
  5 This file documents the algorithm which is use    
  6 cluster setup and teardown operations and to m    
  7 controls safely.                                  
  8                                                   
  9 The section "Rationale" explains what the algo    
 10 needed.  "Basic model" explains general concep    
 11 of the system.  The other sections explain the    
 12 algorithm in use.                                 
 13                                                   
 14                                                   
 15 Rationale                                         
 16 ---------                                         
 17                                                   
 18 In a system containing multiple CPUs, it is de    
 19 ability to turn off individual CPUs when the s    
 20 power consumption and thermal dissipation.        
 21                                                   
 22 In a system containing multiple clusters of CP    
 23 to have the ability to turn off entire cluster    
 24                                                   
 25 Turning entire clusters off and on is a risky     
 26 involves performing potentially destructive op    
 27 of independently running CPUs, while the OS co    
 28 means that we need some coordination in order     
 29 cluster-level operations are only performed wh    
 30 so.                                               
 31                                                   
 32 Simple locking may not be sufficient to solve     
 33 mechanisms like Linux spinlocks may rely on co    
 34 are not immediately enabled when a cluster pow    
 35 disabling those mechanisms may itself be a non    
 36 writing some hardware registers and invalidati    
 37 methods of coordination are required in order     
 38 power-down and power-up at the cluster level.     
 39                                                   
 40 The mechanism presented in this document descr    
 41 based protocol for performing the needed coord    
 42 lightweight as possible, while providing the r    
 43                                                   
 44                                                   
 45 Basic model                                       
 46 -----------                                       
 47                                                   
 48 Each cluster and CPU is assigned a state, as f    
 49                                                   
 50         - DOWN                                    
 51         - COMING_UP                               
 52         - UP                                      
 53         - GOING_DOWN                              
 54                                                   
 55 ::                                                
 56                                                   
 57             +---------> UP ----------+            
 58             |                        v            
 59                                                   
 60         COMING_UP                GOING_DOWN       
 61                                                   
 62             ^                        |            
 63             +--------- DOWN <--------+            
 64                                                   
 65                                                   
 66 DOWN:                                             
 67         The CPU or cluster is not coherent, an    
 68         suspended, or is ready to be powered o    
 69                                                   
 70 COMING_UP:                                        
 71         The CPU or cluster has committed to mo    
 72         It may be part way through the process    
 73         enabling coherency.                       
 74                                                   
 75 UP:                                               
 76         The CPU or cluster is active and coher    
 77         level.  A CPU in this state is not nec    
 78         actively by the kernel.                   
 79                                                   
 80 GOING_DOWN:                                       
 81         The CPU or cluster has committed to mo    
 82         state.  It may be part way through the    
 83         coherency exit.                           
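
The four-state cycle above can be sketched in C as follows.  This is a
minimal illustration only: the `PM_` prefixed names and both helper
functions are hypothetical and do not appear in the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the four basic-model states. */
enum pm_state { PM_DOWN, PM_COMING_UP, PM_UP, PM_GOING_DOWN };

/* Return the only successor of a state in the basic-model cycle. */
static enum pm_state pm_next_state(enum pm_state s)
{
	switch (s) {
	case PM_DOWN:
		return PM_COMING_UP;
	case PM_COMING_UP:
		return PM_UP;
	case PM_UP:
		return PM_GOING_DOWN;
	case PM_GOING_DOWN:
		return PM_DOWN;
	}
	assert(!"invalid state");
	return PM_DOWN;
}

/* A transition is legal only if it follows the cycle by one step. */
static bool pm_transition_allowed(enum pm_state from, enum pm_state to)
{
	return pm_next_state(from) == to;
}
```

In this sketch a direct UP to DOWN move is rejected, because the model
requires passing through GOING_DOWN first.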
 84                                                   
 85                                                   
 86 Each CPU has one of these states assigned to i    
 87 The CPU states are described in the "CPU state    
 88                                                   
 89 Each cluster is also assigned a state, but it     
 90 state value into two parts (the "cluster" stat    
 91 to introduce additional states in order to avo    
 92 CPUs in the cluster simultaneously modifying t    
 93 level states are described in the "Cluster sta    
 94                                                   
 95 To help distinguish the CPU states from cluste    
 96 discussion, the state names are given a `CPU_`    
 97 and a `CLUSTER_` or `INBOUND_` prefix for the     
 98                                                   
 99                                                   
100 CPU state                                         
101 ---------                                         
102                                                   
103 In this algorithm, each individual core in a m    
104 referred to as a "CPU".  CPUs are assumed to b    
105 therefore, a CPU can only be doing one thing a    
106                                                   
107 This means that CPUs fit the basic model close    
108                                                   
109 The algorithm defines the following states for    
110                                                   
111         - CPU_DOWN                                
112         - CPU_COMING_UP                           
113         - CPU_UP                                  
114         - CPU_GOING_DOWN                          
115                                                   
116 ::                                                
117                                                   
118          cluster setup and                        
119         CPU setup complete          policy dec    
120               +-----------> CPU_UP -----------    
121               |                                   
122                                                   
123         CPU_COMING_UP                   CPU_GO    
124                                                   
125               ^                                   
126               +----------- CPU_DOWN <---------    
127          policy decision           CPU teardow    
128         or hardware event                         
129                                                   
130                                                   
131 The definitions of the four states correspond     
132 the basic model.                                  
133                                                   
134 Transitions between states occur as follows.      
135                                                   
136 A trigger event (spontaneous) means that the C    
137 next state as a result of making local progres    
138 requirement for any external event to happen.     
139                                                   
140                                                   
141 CPU_DOWN:                                         
142         A CPU reaches the CPU_DOWN state when     
143         power-down.  On reaching this state, t    
144         power itself down or suspend itself, v    
145         firmware call.                            
146                                                   
147         Next state:                               
148                 CPU_COMING_UP                     
149         Conditions:                               
150                 none                              
151                                                   
152         Trigger events:                           
153                 a) an explicit hardware power-    
154                    from a policy decision on a    
155                                                   
156                 b) a hardware event, such as a    
157                                                   
158                                                   
159 CPU_COMING_UP:                                    
160         A CPU cannot start participating in ha    
161         cluster is set up and coherent.  If th    
162         then the CPU will wait in the CPU_COMI    
163         cluster has been set up.                  
164                                                   
165         Next state:                               
166                 CPU_UP                            
167         Conditions:                               
168                 The CPU's parent cluster must     
169         Trigger events:                           
170                 Transition of the parent clust    
171                                                   
172         Refer to the "Cluster state" section f    
173         CLUSTER_UP state.                         
174                                                   
175                                                   
176 CPU_UP:                                           
177         When a CPU reaches the CPU_UP state, i    
178         start participating in local coherency    
179                                                   
180         This is done by jumping to the kernel'    
181                                                   
182         Note that the definition of this state    
183         from the basic model definition: CPU_U    
184         CPU is coherent yet, but it does mean     
185         the kernel.  The kernel handles the re    
186         procedure, so the remaining steps are     
187         race avoidance algorithm.                 
188                                                   
189         The CPU remains in this state until an    
190         is made to shut down or suspend the CP    
191                                                   
192         Next state:                               
193                 CPU_GOING_DOWN                    
194         Conditions:                               
195                 none                              
196         Trigger events:                           
197                 explicit policy decision          
198                                                   
199                                                   
200 CPU_GOING_DOWN:                                   
201         While in this state, the CPU exits coh    
202         operations required to achieve this (s    
203         caches).                                  
204                                                   
205         Next state:                               
206                 CPU_DOWN                          
207         Conditions:                               
208                 local CPU teardown complete       
209         Trigger events:                           
210                 (spontaneous)                     
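
The CPU-level transitions listed above might be modelled as a small step
function.  This is a sketch under stated assumptions, not the kernel's
code: `struct cpu_inputs`, its fields and `cpu_next_state()` are
hypothetical names that stand in for the conditions and trigger events
described in the text.

```c
#include <assert.h>
#include <stdbool.h>

enum cpu_state { CPU_DOWN, CPU_COMING_UP, CPU_UP, CPU_GOING_DOWN };

/* Hypothetical inputs standing in for conditions and trigger events. */
struct cpu_inputs {
	bool power_up_requested;   /* policy decision or hardware event */
	bool parent_cluster_up;    /* parent cluster is in CLUSTER_UP */
	bool power_down_requested; /* explicit policy decision */
	bool teardown_complete;    /* local CPU teardown has finished */
};

/* Advance one step, or stay put if the condition is not yet met. */
static enum cpu_state cpu_next_state(enum cpu_state s,
				     const struct cpu_inputs *in)
{
	switch (s) {
	case CPU_DOWN:
		return in->power_up_requested ? CPU_COMING_UP : CPU_DOWN;
	case CPU_COMING_UP:
		/* Wait here until the parent cluster has been set up. */
		return in->parent_cluster_up ? CPU_UP : CPU_COMING_UP;
	case CPU_UP:
		return in->power_down_requested ? CPU_GOING_DOWN : CPU_UP;
	case CPU_GOING_DOWN:
		return in->teardown_complete ? CPU_DOWN : CPU_GOING_DOWN;
	}
	assert(!"invalid CPU state");
	return s;
}
```

The point of the sketch is the CPU_COMING_UP case: it is the only CPU
transition that depends on state owned by something other than the CPU
itself.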
211                                                   
212                                                   
213 Cluster state                                     
214 -------------                                     
215                                                   
216 A cluster is a group of connected CPUs with so    
217 Because a cluster contains multiple CPUs, it c    
218 things at the same time.  This has some implic    
219 CPU can start up while another CPU is tearing     
220                                                   
221 In this discussion, the "outbound side" is the    
222 as seen by a CPU tearing the cluster down.  Th    
223 view of the cluster state as seen by a CPU set    
224                                                   
225 In order to enable safe coordination in such s    
226 that a CPU which is setting up the cluster can    
227 independently of the CPU which is tearing down    
228 reason, the cluster state is split into two pa    
229                                                   
230         "cluster" state: The global state of t    
231         on the outbound side:                     
232                                                   
233                 - CLUSTER_DOWN                    
234                 - CLUSTER_UP                      
235                 - CLUSTER_GOING_DOWN              
236                                                   
237         "inbound" state: The state of the clus    
238                                                   
239                 - INBOUND_NOT_COMING_UP           
240                 - INBOUND_COMING_UP               
241                                                   
242                                                   
243         The different pairings of these states    
244         states for the cluster as a whole::       
245                                                   
246                                     CLUSTER_UP    
247                   +==========> INBOUND_NOT_COM    
248                   #                               
249                                                   
250              CLUSTER_UP     <----+                
251           INBOUND_COMING_UP      |                
252                                                   
253                   ^             CLUSTER_GOING_    
254                   #              INBOUND_COMIN    
255                                                   
256             CLUSTER_DOWN         |                
257           INBOUND_COMING_UP <----+                
258                                                   
259                   ^                               
260                   +===========     CLUSTER_DOW    
261                                INBOUND_NOT_COM    
262                                                   
263         Transitions -----> can only be made by    
264         only involve changes to the "cluster"     
265                                                   
266         Transitions ===##> can only be made by    
267         involve changes to the "inbound" state    
268         further transition possible on the out    
269         outbound CPU has put the cluster into     
270                                                   
271         The race avoidance algorithm does not     
272         which exact CPUs within the cluster pl    
273         be decided in advance by some other me    
274         "Last man and first man selection" for    
275                                                   
276                                                   
277         CLUSTER_DOWN/INBOUND_NOT_COMING_UP is     
278         cluster can actually be powered down.     
279                                                   
280         The parallelism of the inbound and out    
281         the existence of two different paths f    
282         INBOUND_NOT_COMING_UP (corresponding t    
283         model) to CLUSTER_DOWN/INBOUND_COMING_    
284         COMING_UP in the basic model).  The se    
285         teardown completely.                      
286                                                   
287         CLUSTER_UP/INBOUND_COMING_UP is equiva    
288         model.  The final transition to CLUSTE    
289         is trivial and merely resets the state    
290         next cycle.                               
291                                                   
292         Details of the allowable transitions f    
293                                                   
294         The next state in each case is notated    
295                                                   
296                 <cluster state>/<inbound state    
297                                                   
298         where the <transitioner> is the side o    
299         can occur; either the inbound or the o    
300                                                   
301                                                   
302 CLUSTER_DOWN/INBOUND_NOT_COMING_UP:               
303         Next state:                               
304                 CLUSTER_DOWN/INBOUND_COMING_UP    
305         Conditions:                               
306                 none                              
307                                                   
308         Trigger events:                           
309                 a) an explicit hardware power-    
310                    from a policy decision on a    
311                                                   
312                 b) a hardware event, such as a    
313                                                   
314                                                   
315 CLUSTER_DOWN/INBOUND_COMING_UP:                   
316                                                   
317         In this state, an inbound CPU sets up     
318         enabling of hardware coherency at the     
319         other operations (such as cache invali    
320         in order to achieve this.                 
321                                                   
322         The purpose of this state is to do suf    
323         setup to enable other CPUs in the clus    
324         safely.                                   
325                                                   
326         Next state:                               
327                 CLUSTER_UP/INBOUND_COMING_UP (    
328         Conditions:                               
329                 cluster-level setup and hardwa    
330         Trigger events:                           
331                 (spontaneous)                     
332                                                   
333                                                   
334 CLUSTER_UP/INBOUND_COMING_UP:                     
335                                                   
336         Cluster-level setup is complete and ha    
337         enabled for the cluster.  Other CPUs i    
338         enter coherency.                          
339                                                   
340         This is a transient state, leading imm    
341         CLUSTER_UP/INBOUND_NOT_COMING_UP.  All    
 342         should treat these two states
343                                                   
344         Next state:                               
345                 CLUSTER_UP/INBOUND_NOT_COMING_    
346         Conditions:                               
347                 none                              
348         Trigger events:                           
349                 (spontaneous)                     
350                                                   
351                                                   
352 CLUSTER_UP/INBOUND_NOT_COMING_UP:                 
353                                                   
354         Cluster-level setup is complete and ha    
355         enabled for the cluster.  Other CPUs i    
356         enter coherency.                          
357                                                   
358         The cluster will remain in this state     
359         made to power the cluster down.           
360                                                   
361         Next state:                               
362                 CLUSTER_GOING_DOWN/INBOUND_NOT    
363         Conditions:                               
364                 none                              
365         Trigger events:                           
366                 policy decision to power down     
367                                                   
368                                                   
369 CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:         
370                                                   
371         An outbound CPU is tearing the cluster    
372         must wait in this state until all CPUs    
373         CPU_DOWN state.                           
374                                                   
375         When all CPUs are in the CPU_DOWN stat    
376         down, for example by cleaning data cac    
377         cluster-level coherency.                  
378                                                   
379         To avoid wasteful unnecessary teardown    
380         should check the inbound cluster state    
381         transitions to INBOUND_COMING_UP.  Alt    
382         CPUs can be checked for entry into CPU    
383                                                   
384                                                   
385         Next states:                              
386                                                   
387         CLUSTER_DOWN/INBOUND_NOT_COMING_UP (ou    
388                 Conditions:                       
389                         cluster torn down and     
390                 Trigger events:                   
391                         (spontaneous)             
392                                                   
393         CLUSTER_GOING_DOWN/INBOUND_COMING_UP (    
394                 Conditions:                       
395                         none                      
396                                                   
397                 Trigger events:                   
398                         a) an explicit hardwar    
399                            resulting from a po    
400                            CPU;                   
401                                                   
402                         b) a hardware event, s    
403                                                   
404                                                   
405 CLUSTER_GOING_DOWN/INBOUND_COMING_UP:             
406                                                   
407         The cluster is (or was) being torn dow    
408         come online in the meantime and is try    
409         again.                                    
410                                                   
411         If the outbound CPU observes this stat    
412                                                   
413                 a) back out of teardown, resto    
414                    CLUSTER_UP state;              
415                                                   
416                 b) finish tearing the cluster     
417                    in the CLUSTER_DOWN state;     
418                    set up the cluster again fr    
419                                                   
420         Choice (a) permits the removal of some    
421         unnecessary teardown and setup operati    
422         the cluster is not really going to be     
423                                                   
424                                                   
425         Next states:                              
426                                                   
427         CLUSTER_UP/INBOUND_COMING_UP (outbound    
428                 Conditions:                       
429                                 cluster-level     
430                                 coherency comp    
431                                                   
432                 Trigger events:                   
433                                 (spontaneous)     
434                                                   
435         CLUSTER_DOWN/INBOUND_COMING_UP (outbou    
436                 Conditions:                       
437                         cluster torn down and     
438                                                   
439                 Trigger events:                   
440                         (spontaneous)             
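
The cluster-level transitions described above can be collected into a
table that records which side is allowed to make each one.  This is an
illustrative sketch: `struct pair`, `enum side`, the `allowed[]` table
and `cluster_transition_allowed()` are hypothetical names, and the
inbound/outbound assignment follows the diagram legend (outbound changes
only the "cluster" part; inbound changes the "inbound" part, and also
the "cluster" part once the cluster has reached CLUSTER_DOWN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cluster_state { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };
enum side { SIDE_INBOUND, SIDE_OUTBOUND };

/* Hypothetical combined cluster-level state. */
struct pair {
	enum cluster_state cluster;
	enum inbound_state inbound;
};

struct transition {
	struct pair from, to;
	enum side who;
};

static const struct transition allowed[] = {
	{ { CLUSTER_DOWN, INBOUND_NOT_COMING_UP },
	  { CLUSTER_DOWN, INBOUND_COMING_UP }, SIDE_INBOUND },
	{ { CLUSTER_DOWN, INBOUND_COMING_UP },
	  { CLUSTER_UP, INBOUND_COMING_UP }, SIDE_INBOUND },
	{ { CLUSTER_UP, INBOUND_COMING_UP },
	  { CLUSTER_UP, INBOUND_NOT_COMING_UP }, SIDE_INBOUND },
	{ { CLUSTER_UP, INBOUND_NOT_COMING_UP },
	  { CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP }, SIDE_OUTBOUND },
	{ { CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP },
	  { CLUSTER_DOWN, INBOUND_NOT_COMING_UP }, SIDE_OUTBOUND },
	{ { CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP },
	  { CLUSTER_GOING_DOWN, INBOUND_COMING_UP }, SIDE_INBOUND },
	{ { CLUSTER_GOING_DOWN, INBOUND_COMING_UP },
	  { CLUSTER_UP, INBOUND_COMING_UP }, SIDE_OUTBOUND },
	{ { CLUSTER_GOING_DOWN, INBOUND_COMING_UP },
	  { CLUSTER_DOWN, INBOUND_COMING_UP }, SIDE_OUTBOUND },
};

static bool pair_eq(struct pair a, struct pair b)
{
	return a.cluster == b.cluster && a.inbound == b.inbound;
}

/* True if 'who' may move the cluster from 'from' to 'to'. */
static bool cluster_transition_allowed(struct pair from, struct pair to,
				       enum side who)
{
	size_t i;

	for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
		if (pair_eq(allowed[i].from, from) &&
		    pair_eq(allowed[i].to, to) && allowed[i].who == who)
			return true;
	}
	return false;
}
```

Keeping the two halves of the state separate is what lets each side
write only its own half in most transitions, which is the property the
split into "cluster" and "inbound" state is meant to provide.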
441                                                   
442                                                   
 443 Last man and first man selection
444 --------------------------------                  
445                                                   
446 The CPU which performs cluster tear-down opera    
447 is commonly referred to as the "last man".        
448                                                   
449 The CPU which performs cluster setup on the in    
450 referred to as the "first man".                   
451                                                   
452 The race avoidance algorithm documented above     
453 mechanism to choose which CPUs should play the    
454                                                   
455                                                   
456 Last man:                                         
457                                                   
458 When shutting down the cluster, all the CPUs i    
459 executing Linux and hence coherent.  Therefore    
460 be used to select a last man safely, before th    
461 non-coherent.                                     
462                                                   
463                                                   
464 First man:                                        
465                                                   
466 Because CPUs may power up asynchronously in re    
467 events, a dynamic mechanism is needed to make     
468 attempts to play the first man role and do the    
469 initialisation: any other CPUs must wait for t    
470 proceeding.                                       
471                                                   
472 Cluster-level initialisation may involve actio    
473 coherency controls in the bus fabric.             
474                                                   
475 The current implementation in mcpm_head.S uses    
476 mechanism to do this arbitration.  This mechan    
477 detail in vlocks.txt.                             
478                                                   
479                                                   
480 Features and Limitations                          
481 ------------------------                          
482                                                   
483 Implementation:                                   
484                                                   
485         The current ARM-based implementation i    
486         arch/arm/common/mcpm_head.S (low-level    
487         arch/arm/common/mcpm_entry.c (everythi    
488                                                   
489         __mcpm_cpu_going_down() signals the tr    
490         CPU_GOING_DOWN state.                     
491                                                   
492         __mcpm_cpu_down() signals the transiti    
493         state.                                    
494                                                   
495         A CPU transitions to CPU_COMING_UP and    
496         low-level power-up code in mcpm_head.S    
497         involve CPU-specific setup code, but i    
498         implementation it does not.               
499                                                   
500         __mcpm_outbound_enter_critical() and _    
501         handle transitions from CLUSTER_UP to     
502         and from there to CLUSTER_DOWN or back    
503         the case of an aborted cluster power-d    
504                                                   
505         These functions are more complex than     
506         functions due to the extra inter-CPU c    
507         is needed for safe transitions at the     
508                                                   
509         A cluster transitions from CLUSTER_DOW    
510         the low-level power-up code in mcpm_he    
511         typically involves platform-specific s    
512         provided by the platform-specific powe    
513         function registered via mcpm_sync_init    
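
The extra inter-CPU coordination on the outbound side, as described for
the CLUSTER_GOING_DOWN states earlier in this document, might be
sketched as a polling decision.  This is not the code in mcpm_entry.c:
`outbound_poll()` and `enum outbound_action` are hypothetical, and the
sketch assumes the outbound CPU takes choice (a), backing out of
teardown, whenever it sees an inbound CPU.

```c
#include <assert.h>
#include <stddef.h>

enum cpu_state { CPU_DOWN, CPU_COMING_UP, CPU_UP, CPU_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };

/* Hypothetical result of one look at the shared state. */
enum outbound_action { OUTBOUND_WAIT, OUTBOUND_ABORT, OUTBOUND_TEAR_DOWN };

/*
 * Decide what the outbound CPU (index 'self') should do next, given a
 * snapshot of every CPU's state and the cluster's inbound state.
 */
static enum outbound_action outbound_poll(const enum cpu_state *cpus,
					  size_t ncpus, size_t self,
					  enum inbound_state inbound)
{
	enum outbound_action action = OUTBOUND_TEAR_DOWN;
	size_t i;

	assert(self < ncpus);

	/* An inbound CPU wants the cluster: back out of teardown. */
	if (inbound == INBOUND_COMING_UP)
		return OUTBOUND_ABORT;

	for (i = 0; i < ncpus; i++) {
		if (i == self)
			continue;
		switch (cpus[i]) {
		case CPU_DOWN:
			break;
		case CPU_GOING_DOWN:
			/* Still tearing down: check again later. */
			action = OUTBOUND_WAIT;
			break;
		default:
			/* CPU_COMING_UP or CPU_UP: cluster is still needed. */
			return OUTBOUND_ABORT;
		}
	}
	return action;
}
```

A caller would repeat the poll while it returns OUTBOUND_WAIT, and only
start cluster-level teardown once it returns OUTBOUND_TEAR_DOWN.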
514                                                   
515 Deep topologies:                                  
516                                                   
517         As currently described and implemented    
518         support CPU topologies involving more     
519         clusters of clusters are not supported    
520         extended by replicating the cluster-le    
521         additional topological levels, and mod    
522         rules for the intermediate (non-outerm    
523                                                   
524                                                   
525 Colophon                                          
526 --------                                          
527                                                   
528 Originally created and documented by Dave Mart    
529 collaboration with Nicolas Pitre and Achin Gup    
530                                                   
531 Copyright (C) 2012-2013  Linaro Limited           
532 Distributed under the terms of Version 2 of th    
533 License, as defined in linux/COPYING.             
                                                      
