TOMOYO Linux Cross Reference
Linux/Documentation/PCI/pci-error-recovery.rst

Differences between /Documentation/PCI/pci-error-recovery.rst (Version linux-6.11.5) and /Documentation/PCI/pci-error-recovery.rst (Version linux-4.11.12)


  1 .. SPDX-License-Identifier: GPL-2.0               
  2                                                   
  3 ==================                                
  4 PCI Error Recovery                                
  5 ==================                                
  6                                                   
  7                                                   
  8 :Authors: - Linas Vepstas <linasvepstas@gmail.c    
  9           - Richard Lary <rlary@us.ibm.com>        
 10           - Mike Mason <mmlnx@us.ibm.com>          
 11                                                   
 12                                                   
 13 Many PCI bus controllers are able to detect a     
 14 PCI errors on the bus, such as parity errors o    
 15 buses, as well as SERR and PERR errors.  Some     
 16 chipsets are able to deal with these errors; t    
 17 and the PCI-host bridges found on IBM Power4,     
 18 pSeries boxes. A typical action taken is to di    
 19 halting all I/O to it.  The goal of a disconne    
 20 corruption; for example, to halt system memory    
 21 to "wild" addresses. Typically, a reconnection    
 22 offered, so that the affected PCI device(s) ar    
 23 into working condition. The reset phase requir    
 24 between the affected device drivers and the PC    
 25 This document describes a generic API for noti    
 26 of a bus disconnection, and then performing er    
 27 This API is currently implemented in the 2.6.1    
 28                                                   
 29 Reporting and recovery is performed in several    
 30 a PCI hardware error has resulted in a bus dis    
 31 is reported as soon as possible to all affecte    
 32 including multiple instances of a device drive    
 33 cards. This allows device drivers to avoid dea    
 34 waiting for some i/o-space register to change,    
 35 It also gives the drivers a chance to defer in    
 36 needed.                                           
 37                                                   
 38 Next, recovery is performed in several stages.    
 39 is forced by the need to handle multi-function    
 40 devices that have multiple device drivers asso    
 41 In the first stage, each driver is allowed to     
 42 of reset it desires, the choices being a simpl    
 43 or requesting a slot reset.                       
 44                                                   
 45 If any driver requests a slot reset, that is w    
 46                                                   
 47 After a reset and/or a re-enabling of I/O, all    
 48 again notified, so that they may then perform     
 49 that may be required.  After these have all co    
 50 "resume normal operations" event is sent out.     
 51                                                   
 52 The biggest reason for choosing a kernel-based    
 53 than a user-space implementation was the need     
 54 disconnects of PCI devices attached to storage    
 55 disconnects from devices holding the root file    
 56 file system is disconnected, a user-space mech    
 57 through a large number of contortions to compl    
 58 of the current Linux file systems are not tole    
 59 from/reconnection to their underlying block de    
 60 bus errors are easy to manage in the device dr    
 61 device drivers already handle very similar rec    
 62 for example, the SCSI-generic layer already pr    
 63 mechanisms for dealing with SCSI bus errors an    
 64                                                   
 65                                                   
 66 Detailed Design                                   
 67 ===============                                   
 68                                                   
 69 Design and implementation details below, based    
 70 public email discussions with Ben Herrenschmid    
 71                                                   
 72 The error recovery API support is exposed to t    
 73 a structure of function pointers pointed to by    
 74 pci_driver. A driver that fails to provide the    
 75 and the actual recovery steps taken are platfo    
 76 arch/powerpc implementation will simulate a PC    
 77                                                   
 78 This structure has the form::                     
 79                                                   
 80         struct pci_error_handlers                 
 81         {                                         
 82                 int (*error_detected)(struct p    
 83                 int (*mmio_enabled)(struct pci    
 84                 int (*slot_reset)(struct pci_d    
 85                 void (*resume)(struct pci_dev     
 86                 void (*cor_error_detected)(str    
 87         };                                        
 88                                                   
 89 The possible channel states are::                 
 90                                                   
 91         typedef enum {                            
 92                 pci_channel_io_normal,  /* I/O    
 93                 pci_channel_io_frozen,  /* I/O    
 94                 pci_channel_io_perm_failure, /    
 95         } pci_channel_state_t;                    
 96                                                   
 97 Possible return values are::                      
 98                                                   
 99         enum pci_ers_result {                     
100                 PCI_ERS_RESULT_NONE,        /*    
101                 PCI_ERS_RESULT_CAN_RECOVER, /*    
102                 PCI_ERS_RESULT_NEED_RESET,  /*    
103                 PCI_ERS_RESULT_DISCONNECT,  /*    
104                 PCI_ERS_RESULT_RECOVERED,   /*    
105         };                                        
106                                                   
107 A driver does not have to implement all of the    
108 if it implements any, it must implement error_    
109 is not implemented, the corresponding feature     
110 For example, if mmio_enabled() and resume() ar    
111 is assumed that the driver is not doing any di    
112 a slot reset.  Typically a driver will want to    
113 a slot_reset().                                   
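
A minimal userspace sketch of the pattern described above, in which a driver provides only error_detected() and slot_reset(). The ``mock_`` types, the ``demo_`` functions, the helper ``implements_mmio_enabled()`` and the exact callback signatures are assumptions made for illustration; only the enumerator names are taken from this document, and real drivers use the kernel's own definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel type. */
struct mock_pci_dev {
	int new_io_blocked;	/* driver has stopped issuing new I/O */
	int reinit_done;	/* driver re-initialised the device   */
};

/* Enumerator names mirror the listings above; values are illustrative. */
typedef enum {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
} mock_channel_state_t;

enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Assumed shape of the callback table (signatures are a guess). */
struct mock_error_handlers {
	enum pci_ers_result (*error_detected)(struct mock_pci_dev *dev,
					      mock_channel_state_t state);
	enum pci_ers_result (*mmio_enabled)(struct mock_pci_dev *dev);
	enum pci_ers_result (*slot_reset)(struct mock_pci_dev *dev);
	void (*resume)(struct mock_pci_dev *dev);
};

/* Mandatory callback: quiesce, then say which recovery is wanted. */
static enum pci_ers_result demo_error_detected(struct mock_pci_dev *dev,
					       mock_channel_state_t state)
{
	assert(dev != NULL);
	dev->new_io_blocked = 1;
	if (state == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;
	return PCI_ERS_RESULT_NEED_RESET;
}

/* Re-initialise after the reset, but do not restart normal I/O here. */
static enum pci_ers_result demo_slot_reset(struct mock_pci_dev *dev)
{
	assert(dev != NULL);
	dev->reinit_done = 1;
	return PCI_ERS_RESULT_RECOVERED;
}

/* mmio_enabled and resume are deliberately left NULL. */
static const struct mock_error_handlers demo_handlers = {
	.error_detected	= demo_error_detected,
	.slot_reset	= demo_slot_reset,
};

/* Hypothetical platform-side check for an optional callback. */
static int implements_mmio_enabled(const struct mock_error_handlers *h)
{
	return h != NULL && h->mmio_enabled != NULL;
}
```

A platform-side caller would test each optional pointer before invoking it, which is why leaving a callback unset is enough to opt out of that stage.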
114                                                   
115 The actual steps taken by a platform to recove    
116 event will be platform-dependent, but will fol    
117 sequence described below.                         
118                                                   
119 STEP 0: Error Event                               
120 -------------------                               
121 A PCI bus error is detected by the PCI hardwar    
122 is isolated, in that all I/O is blocked: all r    
123 all writes are ignored.                           
124                                                   
125                                                   
126 STEP 1: Notification                              
127 --------------------                              
128 Platform calls the error_detected() callback o    
129 every driver affected by the error.               
130                                                   
131 At this point, the device might not be accessi    
132 the platform (the slot will be isolated on pow    
133 already have "noticed" the error because of a     
134 is the proper "synchronization point", that is    
 135 a chance to clean up, waiting for pending stuff
136 to complete; it can take semaphores, schedule,    
137 touch the device. Within this function and aft    
138 shouldn't do any new IOs. Called in task conte    
139 "quiesce" point. See note about interrupts at     
140                                                   
141 All drivers participating in this system must     
142 The driver must return one of the following re    
143                                                   
144   - PCI_ERS_RESULT_CAN_RECOVER                    
145       Driver returns this if it thinks it migh    
146       the HW by just banging IOs or if it want    
147       a chance to extract some diagnostic info    
 148       mmio_enabled, below).
149   - PCI_ERS_RESULT_NEED_RESET                     
150       Driver returns this if it can't recover     
151       slot reset.                                 
152   - PCI_ERS_RESULT_DISCONNECT                     
153       Driver returns this if it doesn't want t    
154                                                   
155 The next step taken will depend on the result     
156 drivers.                                          
157                                                   
158 If all drivers on the segment/slot return PCI_    
159 then the platform should re-enable IOs on the     
160 particular, if the platform doesn't isolate sl    
161 proceeds to STEP 2 (MMIO Enable).                 
162                                                   
163 If any driver requested a slot reset (by retur    
164 then recovery proceeds to STEP 4 (Slot Reset).    
165                                                   
166 If the platform is unable to recover the slot,    
167 is STEP 6 (Permanent Failure).                    
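
The decision rules above can be sketched as a small userspace function that combines the results returned by the drivers on a segment. The function name, the ``demo_next_step`` enum and the ``platform_can_recover`` flag are hypothetical. Treating any remaining combination (for example a PCI_ERS_RESULT_DISCONNECT with no reset request) as a permanent failure is an assumption, not something this section states:

```c
#include <assert.h>
#include <stddef.h>

/* Enumerator names as listed earlier; values are illustrative. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical labels for the steps named in this document. */
enum demo_next_step {
	DEMO_STEP_MMIO_ENABLED		= 2,
	DEMO_STEP_SLOT_RESET		= 4,
	DEMO_STEP_PERMANENT_FAILURE	= 6,
};

static enum demo_next_step
demo_after_notification(const enum pci_ers_result *votes, size_t n,
			int platform_can_recover)
{
	int any_reset = 0;
	int all_can_recover = 1;
	size_t i;

	assert(votes != NULL && n > 0);

	for (i = 0; i < n; i++) {
		if (votes[i] == PCI_ERS_RESULT_NEED_RESET)
			any_reset = 1;
		if (votes[i] != PCI_ERS_RESULT_CAN_RECOVER)
			all_can_recover = 0;
	}

	/* Platform cannot recover the slot: STEP 6. */
	if (!platform_can_recover)
		return DEMO_STEP_PERMANENT_FAILURE;
	/* Any driver asked for a slot reset: STEP 4. */
	if (any_reset)
		return DEMO_STEP_SLOT_RESET;
	/* Every driver said it can recover: STEP 2. */
	if (all_can_recover)
		return DEMO_STEP_MMIO_ENABLED;
	/* Assumption: anything else is treated as unrecoverable. */
	return DEMO_STEP_PERMANENT_FAILURE;
}
```

The checks are ordered so that a single reset request outweighs any number of CAN_RECOVER results, matching the "if any driver requested" wording.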
168                                                   
169 .. note::                                         
170                                                   
171    The current powerpc implementation assumes     
172    *not* schedule or semaphore in this routine    
173    implementation uses one kernel thread to no    
174    thus, if one device sleeps/schedules, all d    
175    Doing better requires complex multi-threade    
176    recovery implementation (e.g. waiting for a    
177    to "join" before proceeding with recovery.)    
178    complex and not worth implementing.            
179                                                   
180    The current powerpc implementation doesn't     
181    attempts I/O at this point, or not.  I/Os w    
182    a value of 0xff on read, and writes will be    
183    EEH_MAX_FAILS I/Os are attempted to a froze    
184    assumes that the device driver has gone int    
185    and prints an error to syslog.  A reboot is    
186    get the device working again.                  
187                                                   
188 STEP 2: MMIO Enabled                              
189 --------------------                              
190 The platform re-enables MMIO to the device (bu    
191 DMA), and then calls the mmio_enabled() callba    
192 device drivers.                                   
193                                                   
194 This is the "early recovery" call. IOs are all    
195 not, with some restrictions. This is NOT a cal    
196 start operations again, only to peek/poke at t    
197 information, if any, and eventually do things     
198 reset or some such, but not restart operations    
199 all drivers on a segment agree that they can t    
200 link reset was performed by the HW. If the pla    
201 without a slot reset or a link reset, it will     
202 instead will have gone directly to STEP 3 (Lin    
203                                                   
204 .. note::                                         
205                                                   
206    The following is proposed; no platform impl    
207    Proposal: All I/Os should be done _synchron    
208    this callback, errors triggered by them wil    
209    the normal pci_check_whatever() API, no new    
210    callback will be issued due to an error hap    
211    such an error might cause IOs to be re-bloc    
212    segment, and thus invalidate the recovery t    
213    on the same segment might have done, forcin    
214    into one of the next states, that is, link     
215                                                   
216 The driver should return one of the following     
217   - PCI_ERS_RESULT_RECOVERED                      
218       Driver returns this if it thinks the dev    
219       functional and thinks it is ready to sta    
220       normal driver operations again. There is    
221       guarantee that the driver will actually     
222       allowed to proceed, as another driver on    
223       same segment might have failed and thus     
224       slot reset on platforms that support it.    
225                                                   
226   - PCI_ERS_RESULT_NEED_RESET                     
227       Driver returns this if it thinks the dev    
228       recoverable in its current state and it     
229       reset to proceed.                           
230                                                   
231   - PCI_ERS_RESULT_DISCONNECT                     
232       Same as above. Total failure, no recover    
233       reset driver dead. (To be defined more p    
234                                                   
235 The next step taken depends on the results ret    
236 If all drivers returned PCI_ERS_RESULT_RECOVER    
 237 proceeds to either STEP 3 (Link Reset) or to ST
238                                                   
239 If any driver returned PCI_ERS_RESULT_NEED_RES    
 240 proceeds to STEP 4 (Slot Reset).
241                                                   
242 STEP 3: Link Reset                                
243 ------------------                                
244 The platform resets the link.  This is a PCI-E    
245 and is done whenever a fatal error has been de    
246 "solved" by resetting the link.                   
247                                                   
248 STEP 4: Slot Reset                                
249 ------------------                                
250                                                   
251 In response to a return value of PCI_ERS_RESUL    
252 platform will perform a slot reset on the requ    
253 The actual steps taken by a platform to perfor    
254 will be platform-dependent. Upon completion of    
255 platform will call the device slot_reset() cal    
256                                                   
257 Powerpc platforms implement two levels of slot    
 258 soft reset (default) and fundamental (optional)
259                                                   
260 Powerpc soft reset consists of asserting the a    
261 restoring the PCI BARs and PCI configuration h    
262 that is equivalent to what it would be after a    
263 power-on followed by power-on BIOS/system firm    
264 Soft reset is also known as hot-reset.            
265                                                   
266 Powerpc fundamental reset is supported by PCI     
267 and results in device's state machines, hardwa    
268 configuration registers to initialize to their    
269                                                   
270 For most PCI devices, a soft reset will be suf    
271 Optional fundamental reset is provided to supp    
272 of PCI Express devices for which a soft reset     
273 for recovery.                                     
274                                                   
275 If the platform supports PCI hotplug, then the    
276 performed by toggling the slot electrical powe    
277                                                   
278 It is important for the platform to restore th    
279 to the "fresh poweron" state, rather than the     
280 a slot reset, the device driver will almost al    
281 device initialization routines, and an unusual    
282 may result in hung devices, kernel panics, or     
283                                                   
284 This call gives drivers the chance to re-initi    
285 (re-download firmware, etc.).  At this point,     
286 that the card is in a fresh state and is fully    
287 is unfrozen and the driver has full access to     
288 memory mapped I/O space and DMA. Interrupts (L    
289 will also be available.                           
290                                                   
291 Drivers should not restart normal I/O processi    
292 at this point.  If all device drivers report s    
293 callback, the platform will call resume() to c    
294 and let the driver restart normal I/O processi    
295                                                   
296 A driver can still return a critical failure f    
297 it can't get the device operational after rese    
298 previously tried a soft reset, it might now tr    
299 cycle) and then call slot_reset() again.  If t    
300 be recovered, there is nothing more that can b    
301 will typically report a "permanent failure" in    
302 device will be considered "dead" in this case.    
303                                                   
304 Drivers for multi-function cards will need to     
305 themselves as to which driver instance will pe    
306 or global device initialization. For example,     
307 driver performs device init only from PCI func    
308                                                   
309         +       if (PCI_FUNC(pdev->devfn) == 0    
310         +               sym_reset_scsi_bus(np,    
311                                                   
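The function-number test used in the fragment above can be illustrated in userspace. The ``DEMO_`` macros below follow the usual devfn encoding (slot in the upper five bits, function in the lower three), which is what the kernel's PCI_FUNC() extracts. ``struct mock_card`` and ``demo_slot_reset()`` are hypothetical and stand in for a driver's per-card state and its slot_reset() callback:

```c
#include <assert.h>
#include <stddef.h>

/* devfn packs a 5-bit slot number and a 3-bit function number. */
#define DEMO_PCI_DEVFN(slot, func) \
	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEMO_PCI_FUNC(devfn)	((devfn) & 0x07)

/* Hypothetical state shared by all functions of one card. */
struct mock_card {
	int global_resets;	/* times the card-wide init was run */
	int function_inits;	/* times per-function init was run  */
};

/*
 * Called once per function after the slot reset.  Only function 0
 * performs the card-wide step; every function does its own setup.
 */
static void demo_slot_reset(struct mock_card *card, unsigned int devfn)
{
	assert(card != NULL);

	if (DEMO_PCI_FUNC(devfn) == 0)
		card->global_resets++;

	card->function_inits++;
}
```

Electing function 0 is only one possible convention; what matters is that all driver instances on the card agree on which of them does the global work.
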
312 Result codes:                                     
313         - PCI_ERS_RESULT_DISCONNECT               
314           Same as above.                          
315                                                   
316 Drivers for PCI Express cards that require a f    
317 set the needs_freset bit in the pci_dev struct    
318 For example, the QLogic qla2xxx driver sets th    
319 PCI card types::                                  
320                                                   
321         +       /* Set EEH reset type to funda    
322         +       if (IS_QLA24XX(ha) || IS_QLA25    
323         +               pdev->needs_freset = 1    
324         +                                         
325                                                   
326 Platform proceeds either to STEP 5 (Resume Ope    
327 Failure).                                         
328                                                   
329 .. note::                                         
330                                                   
331    The current powerpc implementation does not    
332    reset if the driver returned PCI_ERS_RESULT    
333    However, it probably should.                   
334                                                   
335                                                   
336 STEP 5: Resume Operations                         
337 -------------------------                         
338 The platform will call the resume() callback o    
339 drivers if all drivers on the segment have ret    
340 PCI_ERS_RESULT_RECOVERED from one of the 3 pre    
341 The goal of this callback is to tell the drive    
342 that everything is back and running. This call    
343 a result code.                                    
344                                                   
345 At this point, if a new error happens, the pla    
346 a new error recovery sequence.                    
347                                                   
348 STEP 6: Permanent Failure                         
349 -------------------------                         
350 A "permanent failure" has occurred, and the pl    
351 the device.  The platform will call error_dete    
352 pci_channel_state_t value of pci_channel_io_pe    
353                                                   
354 The device driver should, at this point, assum    
355 cancel all pending I/O, refuse all new I/O, re    
356 higher layers. The device driver should then c    
357 memory and remove itself from kernel operation    
358 during system shutdown.                           
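
A minimal userspace sketch of the driver-side behaviour described above: once the permanent failure is reported, pending I/O is cancelled and new I/O is refused. ``struct mock_driver`` and the ``demo_`` functions are hypothetical, and the choice of ``-EIO`` as the error handed to higher layers is an assumption made for this example:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver state; not a kernel structure. */
struct mock_driver {
	int dead;	/* set once the device is permanently failed */
	int pending;	/* requests submitted but not yet completed  */
	int cancelled;	/* requests cancelled because of the failure */
};

/* Submit path: refuse all new I/O once the device is dead. */
static int demo_submit_io(struct mock_driver *drv)
{
	assert(drv != NULL);

	if (drv->dead)
		return -EIO;	/* assumed error code */

	drv->pending++;
	return 0;
}

/* What the driver does when told the failure is permanent. */
static void demo_permanent_failure(struct mock_driver *drv)
{
	assert(drv != NULL);

	drv->dead = 1;
	drv->cancelled += drv->pending;	/* cancel everything in flight */
	drv->pending = 0;
}
```

Setting the flag before draining the queue means a submission racing with the failure is rejected rather than queued; a real driver would also need locking, which is omitted here.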
359                                                   
360 The platform will typically notify the system     
361 permanent failure in some way.  If the device     
362 the operator will probably want to remove and     
363 Note, however, not all failures are truly "per    
364 caused by over-heating, some by a poorly seate    
365 PCI error events are caused by software bugs,     
366 wild addresses or bogus split transactions due    
367 errors. See the discussion in Documentation/ar    
368 for additional detail on real-life experience     
369 software errors.                                  
370                                                   
371                                                   
372 Conclusion; General Remarks                       
373 ---------------------------                       
374 The way the callbacks are called is platform p    
375 no slot reset capability may want to just "ign    
376 recover (disconnect them) and try to let other    
377 recover. Keep in mind that in most real life c    
378 be only one driver per segment.                   
379                                                   
380 Now, a note about interrupts. If you get an in    
381 device is dead or has been isolated, there is     
382 The current policy is to turn this into a plat    
383 That is, the recovery API only requires that:     
384                                                   
385  - There is no guarantee that interrupt delive    
386    device on the segment starting from the err    
387    slot_reset callback is called, at which poi    
388    to be fully operational.                       
389                                                   
390  - There is no guarantee that interrupt delive    
391    a driver that gets an interrupt after detec    
392    an error within the interrupt handler such     
393    ack'ing of the interrupt (and thus removal     
 394    return IRQ_NONE. It's up to the platf
395    condition, typically by masking the IRQ sou    
396    the error handling. It is expected that the    
397    interrupts are routed to error-management c    
398    with temporarily disabling that IRQ number     
399    isn't terribly complex). That means some IR    
400    sharing the interrupt, but there is simply     
401    platforms aren't supposed to share interrup    
402    anyway :)                                      
403                                                   
404 .. note::                                         
405                                                   
406    Implementation details for the powerpc plat    
407    the file Documentation/arch/powerpc/eeh-pci    
408                                                   
409    As of this writing, there is a growing list    
410    patches implementing error recovery. Not al    
411    mainline yet. These may be used as "example    
412                                                   
413    - drivers/scsi/ipr                             
414    - drivers/scsi/sym53c8xx_2                     
415    - drivers/scsi/qla2xxx                         
416    - drivers/scsi/lpfc                            
 417    - drivers/net/bnx2.c
 418    - drivers/net/e100.c
419    - drivers/net/e1000                            
420    - drivers/net/e1000e                           
421    - drivers/net/ixgbe                            
422    - drivers/net/cxgb3                            
423    - drivers/net/s2io.c                           
424                                                   
425    The cor_error_detected() callback is invoke    
426    the error severity is "correctable". The ca    
427    additional logging to be done if desired. S    
428                                                   
429    - drivers/cxl/pci.c                            
430                                                   
431 The End                                           
432 -------                                           
                                                      
