Linux/Documentation/arch/powerpc/eeh-pci-error-recovery.rst


Differences between /Documentation/arch/powerpc/eeh-pci-error-recovery.rst (Version linux-6.12-rc7) and /Documentation/arch/i386/eeh-pci-error-recovery.rst (Version linux-3.10.108)


==========================
PCI Bus EEH Error Recovery
==========================

Linas Vepstas <linas@austin.ibm.com>

12 January 2005


Overview:
---------
The IBM POWER-based pSeries and iSeries comput
controller chips that have extended capabiliti
reporting a large variety of PCI bus error con
go under the name of "EEH", for "Enhanced Erro
hardware features allow PCI bus errors to be c
card to be "rebooted", without also having to
system.

This is in contrast to traditional PCI error h
PCI chip is wired directly to the CPU, and an
a CPU machine-check/check-stop condition, halt
Another "traditional" technique is to ignore s
can lead to data corruption, both of user data
hung/unresponsive adapters, or system crashes/
the idea behind EEH is that the operating syst
reliable and robust by protecting it from PCI
the OS the ability to "reboot"/recover individ

Future systems from other vendors, based on th
may contain similar features.


Causes of EEH Errors
--------------------
EEH was originally designed to guard against h
as PCI cards dying from heat, humidity, dust,
electrical connections. The vast majority of E
"real life" are due to either poorly seated PC
unfortunately quite commonly, due to device dr
bugs, and sometimes PCI card hardware bugs.

The most common software bug is one that caus
attempt to DMA to a location in system memory
reserved for DMA access for that card.  This i
as it prevents what would otherwise have bee
corruption caused by the bad DMA.  A number of
bugs have been found and fixed in this way ove
years.  Other possible causes of EEH errors in
address line parity errors (for example, due t
connectivity due to a poorly seated card), and
errors (due to software, device firmware, or d
The vast majority of "true hardware failures"
physically removing and re-seating the PCI car


Detection and Recovery
----------------------
In the following discussion, a generic overvie
and recover from EEH errors will be presented.
by an overview of how the current implementati
kernel does it.  The actual implementation is
and some of the finer points are still being d
may in turn be swayed if or when other archite
similar functionality.

When a PCI Host Bridge (PHB, the bus controlle
PCI bus to the system CPU electronics complex)
condition, it will "isolate" the affected PCI
will block all writes (either to the card from
from the card to the system), and it will caus
return all-ff's (0xff, 0xffff, 0xffffffff for
This value was chosen because it is the same v
get if the device was physically unplugged fro
This includes access to PCI memory, I/O space,
space.  Interrupts, however, will continue to

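As a rough illustration of the all-ff's convention described above, the
following user-space C sketch tests whether a value read with a given access
width has every bit set. The function name and the width parameter are
assumptions made for this example; they are not kernel or firmware
interfaces. An all-ones value is only a hint that the slot may be isolated,
since a healthy device can legitimately return it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when `value`, read with an access
 * width of 1, 2 or 4 bytes, is all ones (0xff, 0xffff or 0xffffffff).
 * Any other width is treated as "no match".
 */
bool looks_isolated(uint32_t value, unsigned width_bytes)
{
    switch (width_bytes) {
    case 1:
        return (value & 0xffu) == 0xffu;
    case 2:
        return (value & 0xffffu) == 0xffffu;
    case 4:
        return value == 0xffffffffu;
    default:
        return false;
    }
}
```
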
Detection and recovery are performed with the
firmware.  The programming interfaces in the L
into the firmware are referred to as RTAS (Run
Services).  The Linux kernel does not (should
the EEH function in the PCI chipsets directly,
there are a number of different chipsets out t
different interfaces and quirks. The firmware
uniform abstraction layer that will work with
and iSeries hardware (and be forwards-compatib

If the OS or device driver suspects that a PCI
EEH-isolated, there is a firmware call it can
this is the case. If so, then the device drive
into a consistent state (given that it won't b
pending work) and start recovery of the card.
would consist of resetting the PCI device (hol
line high for two seconds), followed by settin
config space (the base address registers (BAR'
cache line size, interrupt line, and so on).
reinitialization of the device driver.  In a w
the power to the card can be toggled, at least
slots.  In principle, layers far above the dev
do not need to know that the PCI card has been
way; ideally, there should be at most a pause
I/O while the card is being reset.

If the card cannot be recovered after three or
kernel/device driver should assume the worst-c
card has died completely, and report this erro
In addition, error messages are reported throu
syslogd (/var/log/messages) to alert the sysad
The correct way to deal with failed adapters i
PCI hotplug tools to remove and replace the de


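The "give up after a bounded number of resets" policy described above can be
sketched in plain user-space C. Everything here is hypothetical: the callback
type, the result enum and the function names are invented for illustration,
and the attempt limit is left as a parameter rather than a fixed count.

```c
#include <stdbool.h>

/* Hypothetical callback: runs one reset-and-reinitialize cycle and
 * reports whether the card came back. */
typedef bool (*recover_fn)(void *ctx);

enum recovery_result { CARD_RECOVERED, CARD_DEAD };

/* Try recovery up to max_attempts times, then declare the card dead. */
enum recovery_result recover_with_limit(recover_fn try_once, void *ctx,
                                        int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (try_once(ctx))
            return CARD_RECOVERED;
    }
    return CARD_DEAD;
}

/* Example callback: fails while the counter behind ctx is positive,
 * decrementing it on each failure, and succeeds once it reaches zero. */
bool succeed_after_n_failures(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

A caller that receives CARD_DEAD would stop retrying and report the failure,
leaving replacement of the adapter to the hotplug tools.
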
Current PPC64 Linux EEH Implementation
--------------------------------------
At this time, a generic EEH recovery mechanism
so that individual device drivers do not need
EEH recovery.  This generic mechanism piggy-ba
infrastructure, and percolates events up thro
infrastructure.  Following is a detailed descr
accomplished.

EEH must be enabled in the PHBs very early du
and if a PCI slot is hot-plugged. The former i
eeh_init() in arch/powerpc/platforms/pseries/e
drivers/pci/hotplug/pSeries_pci.c calling in t
EEH must be enabled before a PCI scan of the d
Current Power5 hardware will not work unless E
although older Power4 can run with it disabled
EEH can no longer be turned off.  PCI devices
registered with the EEH code; the EEH code nee
the I/O address ranges of the PCI device in or
error.  Given an arbitrary address, the routin
pci_get_device_by_addr() will find the PCI dev
with that address (if any).

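To make the address-to-device lookup concrete, here is a minimal user-space
sketch of mapping an arbitrary address back to the device that owns it. The
structure, the function name and the linear scan are assumptions chosen for
brevity; this does not reproduce how pci_get_device_by_addr() is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one I/O address range owned by a device. */
struct io_range {
    uint64_t start;      /* first address in the range */
    uint64_t end;        /* last address in the range (inclusive) */
    const char *device;  /* illustrative device label */
};

/* Return the label of the device whose range contains addr, or NULL
 * when no registered range matches. */
const char *device_for_addr(const struct io_range *ranges, size_t count,
                            uint64_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= ranges[i].start && addr <= ranges[i].end)
            return ranges[i].device;
    }
    return NULL;
}
```
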
The default arch/powerpc/include/asm/io.h macr
etc. include a check to see if the I/O read re
If so, these make a call to eeh_dn_check_failu
asks the firmware if the all-ff's value is the
error.  If it is not, processing continues as
total number of these false alarms or "false p
seen in /proc/ppc64/eeh (subject to change).
all of these occur during boot, when the PCI b
a large number of 0xff reads are part of the b

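The check-then-confirm flow described above can be sketched as a small
user-space simulation. The slot structure, the firmware stand-in and the
function names below are all hypothetical; only the shape of the logic (fast
path for ordinary values, firmware query on all-ones, counting of false
positives) follows the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated slot. */
struct sim_slot {
    uint32_t reg;              /* value the device register holds */
    bool frozen;               /* what the simulated firmware reports */
    unsigned false_positives;  /* all-ones reads that were not errors */
};

/* Stand-in for the firmware query; a real system would ask firmware. */
static bool sim_firmware_says_frozen(const struct sim_slot *slot)
{
    return slot->frozen;
}

/* Read a 32-bit value. Only an all-ones result triggers the slower
 * firmware query; *error is set when the slot is confirmed frozen. */
uint32_t checked_read32(struct sim_slot *slot, bool *error)
{
    /* A frozen slot returns all ones regardless of register contents. */
    uint32_t val = slot->frozen ? 0xffffffffu : slot->reg;

    *error = false;
    if (val != 0xffffffffu)
        return val;               /* common case: no extra work */

    if (sim_firmware_says_frozen(slot))
        *error = true;            /* genuine isolation event */
    else
        slot->false_positives++;  /* legitimate all-ones data */
    return val;
}
```
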
If a frozen slot is detected, code in
arch/powerpc/platforms/pseries/eeh.c will prin
syslog (/var/log/messages).  This stack trace
useful to device-driver authors for finding ou
error was detected, as the error itself usuall
beforehand.

Next, it uses the Linux kernel notifier chain/
allow any interested parties to find out about
drivers, or other parts of the kernel, can use
`eeh_register_notifier(struct notifier_block *
events.  The event will include a pointer to t
device node and some state info.  Receivers of
they wish"; the default handler will be descri
section.

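For readers unfamiliar with the notifier pattern mentioned above, the
following is a minimal user-space sketch of a callback chain. It is not the
kernel notifier API: the types and functions (sim_event, sim_notifier,
sim_register, sim_notify) are invented for this example, and the event
carries only an illustrative slot label and state value.

```c
#include <stddef.h>

/* Hypothetical event payload. */
struct sim_event {
    const char *slot_name;  /* illustrative label for the affected slot */
    int state;              /* illustrative state value */
};

/* One registered listener; entries form a singly linked list. */
struct sim_notifier {
    void (*call)(struct sim_notifier *self, const struct sim_event *ev);
    struct sim_notifier *next;
    int seen;               /* events delivered to this listener */
};

struct sim_chain {
    struct sim_notifier *head;
};

/* Add a listener at the front of the chain. */
void sim_register(struct sim_chain *chain, struct sim_notifier *nb)
{
    nb->next = chain->head;
    chain->head = nb;
}

/* Deliver ev to every listener; returns how many were called. */
int sim_notify(struct sim_chain *chain, const struct sim_event *ev)
{
    int delivered = 0;

    for (struct sim_notifier *nb = chain->head; nb != NULL; nb = nb->next) {
        nb->call(nb, ev);
        delivered++;
    }
    return delivered;
}

/* Example listener: just counts the events it receives. */
void count_event(struct sim_notifier *self, const struct sim_event *ev)
{
    (void)ev;
    self->seen++;
}
```
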
To assist in the recovery of the device, eeh.c
following functions:

rtas_set_slot_reset()
   assert the PCI #RST line for 1/8th of a se
rtas_configure_bridge()
   ask firmware to configure any PCI bridges
   located topologically under the PCI slot.
eeh_save_bars() and eeh_restore_bars():
   save and restore the PCI
   config-space info for a device and any devi


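The save-and-restore idea behind the last pair of functions can be shown with
a small user-space simulation. This is a sketch under stated assumptions: the
structure and the sim_* functions are hypothetical, only the six BAR slots of
a standard (type 0) PCI header are modeled, and a reset is simulated by
zeroing them.

```c
#include <stdint.h>
#include <string.h>

#define SIM_NUM_BARS 6  /* a type 0 PCI header has six BAR slots */

/* Hypothetical device model: live BARs plus a saved copy. */
struct sim_pci_dev {
    uint32_t bar[SIM_NUM_BARS];
    uint32_t saved_bar[SIM_NUM_BARS];
};

/* Snapshot the live BAR values before the device is reset. */
void sim_save_bars(struct sim_pci_dev *dev)
{
    memcpy(dev->saved_bar, dev->bar, sizeof dev->bar);
}

/* Simulated reset: the device forgets its BAR programming. */
void sim_reset(struct sim_pci_dev *dev)
{
    memset(dev->bar, 0, sizeof dev->bar);
}

/* Write the snapshot back after the reset. */
void sim_restore_bars(struct sim_pci_dev *dev)
{
    memcpy(dev->bar, dev->saved_bar, sizeof dev->bar);
}
```
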
A handler for the EEH notifier_block events is
drivers/pci/hotplug/pSeries_pci.c, called hand
It saves the device BARs and then calls rpaph
This last call causes the device driver for th
which causes uevents to go out to user space.
user-space scripts that might issue commands s
for ethernet cards, and so on.  This handler t
hoping to give the user-space scripts enough t
It then resets the PCI card, reconfigures the
any bridges underneath. It then calls rpaphp_e
which restarts the device driver and triggers
events (for example, calling "ifup eth0" for e


Device Shutdown and User-Space Events
-------------------------------------
This section documents what happens when a PCI
focusing on how the device driver gets shut do
events get delivered to user-space scripts.

Following is an example sequence of events tha
close function to be called during the first p
The following sequence is an example of the pc

    rpa_php_unconfig_pci_adapter (struct slot
    {
      calls
      pci_remove_bus_device (struct pci_dev *)
      {
        calls
        pci_destroy_dev (struct pci_dev *)
        {
          calls
          device_unregister (&dev->dev) // in
          {
            calls
            device_del (struct device *)
            {
              calls
              bus_remove_device() // in /drive
              {
                calls
                device_release_driver()
                {
                  calls
                  struct device_driver->remove
                  pci_device_remove()  // in /
                  {
                    calls
                    struct pci_driver->remove(
                    pcnet32_remove_one() // in
                    {
                      calls
                      unregister_netdev() // i
                      {
                        calls
                        dev_close()  // in /ne
                        {
                           calls dev->stop();
                           which is just pcnet
                           {
                             which does what y
                             to stop the devic
                           }
                        }
                     }
                   which
                   frees pcnet32 device driver
                }
     }}}}}}


in drivers/pci/pci_driver.c,
struct device_driver->remove() is just pci_dev
which calls struct pci_driver->remove() which
which calls unregister_netdev()  (in net/core/
which calls dev_close()  (in net/core/dev.c)
which calls dev->stop() which is pcnet32_close
which then does the appropriate shutdown.

---

Following is the analogous stack trace for eve
when the PCI device is unconfigured::

  rpa_php_unconfig_pci_adapter() {
    calls
    pci_remove_bus_device (struct pci_dev *) {
      calls
      pci_destroy_dev (struct pci_dev *) {
        calls
        device_unregister (&dev->dev) {
          calls
          device_del(struct device * dev) {
            calls
            kobject_del() {
              calls
              kobject_uevent() {
                calls
                kset_uevent() {
                  calls
                  kset->uevent_ops->uevent()
                  a call to
                  dev_uevent() {
                    calls
                    dev->bus->uevent() which i
                    pci_uevent () {
                      which prints device name
                   }
                 }
                 then kobject_uevent() sends a
                 --> userspace uevent
                 (during early boot, nobody li
                 kobject_uevent() executes uev
                 event process /sbin/hotplug)
             }
           }
           kobject_del() then calls sysfs_remo
           trigger any user-space daemon that
           and notice the delete event.


Pros and Cons of the Current Design
-----------------------------------
There are several issues with the current EEH
which may be addressed in future revisions.  B
big plus of the current design is that no chan
individual device drivers, so that the current
The biggest negative of the design is that it
network daemons and file systems that didn't n

-  A minor complaint is that resetting the net
   user-space back-to-back ifdown/ifup burps t
   network daemons, that didn't need to even k
   card was being rebooted.

-  A more serious concern is that the same res
   causes havoc to mounted file systems.  Scri
   unmount a file system without flushing pend
   is impossible, because I/O has already been
   ideally, the reset should happen at or belo
   so that the file systems are not disturbed.

   Reiserfs does not tolerate errors returned
   Ext3fs seems to be tolerant, retrying reads
   succeed. Both have been only lightly tested

   The SCSI-generic subsystem already has buil
   SCSI device resets, SCSI bus resets, and SC
   (HBA) resets.  These are cascaded into a ch
   resets if a SCSI command fails. These are c
   from the block layer.  It would be very nat
   reset into this chain of events.

-  If a SCSI error occurs for the root device,
   the sysadmin had the foresight to run /bin,
   and so on, out of ramdisk/tmpfs.


Conclusions
-----------
There's forward progress ...
