Linux/Documentation/PCI/pci-error-recovery.rst

.. SPDX-License-Identifier: GPL-2.0

==================
PCI Error Recovery
==================


:Authors: - Linas Vepstas <linasvepstas@gmail.com>
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>


Many PCI bus controllers are able to detect a variety of hardware
PCI errors on the bus, such as parity errors on the data and address
buses, as well as SERR and PERR errors. Some of the more advanced
chipsets are able to deal with these errors; these include PCI-E chipsets,
and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
pSeries boxes. A typical action taken is to disconnect the affected device,
halting all I/O to it. The goal of a disconnection is to avoid system
corruption; for example, to halt system memory corruption due to DMAs
to "wild" addresses. Typically, a reconnection mechanism is also
offered, so that the affected PCI device(s) are reset and put back
into working condition. The reset phase requires coordination
between the affected device drivers and the PCI controller chip.
This document describes a generic API for notifying device drivers
of a bus disconnection, and then performing error recovery.
This API is currently implemented in the 2.6.16 and later kernels.

Reporting and recovery is performed in several steps. First, when
a PCI hardware error has resulted in a bus disconnect, that event
is reported as soon as possible to all affected device drivers,
including multiple instances of a device driver on multi-function
cards. This allows device drivers to avoid deadlocking in spinloops,
waiting for some i/o-space register to change, when it never will.
It also gives the drivers a chance to defer incoming I/O as
needed.

Next, recovery is performed in several stages. Most of the complexity
is forced by the need to handle multi-function devices, that is,
devices that have multiple device drivers associated with them.
In the first stage, each driver is allowed to indicate what type
of reset it desires, the choices being a simple re-enabling of I/O
or requesting a slot reset.

If any driver requests a slot reset, that is what will be done.

After a reset and/or a re-enabling of I/O, all drivers are
again notified, so that they may then perform any device setup/config
that may be required. After these have all completed, a final
"resume normal operations" event is sent out.

The biggest reason for choosing a kernel-based implementation rather
than a user-space implementation was the need to deal with bus
disconnects of PCI devices attached to storage media, and, in particular,
disconnects from devices holding the root file system. If the root
file system is disconnected, a user-space mechanism would have to go
through a large number of contortions to complete recovery. Almost all
of the current Linux file systems are not tolerant of disconnection
from/reconnection to their underlying block device. By contrast,
bus errors are easy to manage in the device driver. Indeed, most
device drivers already handle very similar recovery procedures;
for example, the SCSI-generic layer already provides significant
mechanisms for dealing with SCSI bus errors and SCSI bus resets.


Detailed Design
===============

Design and implementation details below, based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by a new field in struct
pci_driver. A driver that fails to provide the structure is "non-aware",
and the actual recovery steps taken are platform dependent. The
arch/powerpc implementation will simulate a PCI hotplug remove/add.

This structure has the form::

    struct pci_error_handlers
    {
        int (*error_detected)(struct pci_dev *dev, pci_channel_state_t);
        int (*mmio_enabled)(struct pci_dev *dev);
        int (*slot_reset)(struct pci_dev *dev);
        void (*resume)(struct pci_dev *dev);
        void (*cor_error_detected)(struct pci_dev *dev);
    };

The possible channel states are::

    typedef enum {
        pci_channel_io_normal,       /* I/O channel is in normal state */
        pci_channel_io_frozen,       /* I/O to channel is blocked */
        pci_channel_io_perm_failure, /* PCI card is dead */
    } pci_channel_state_t;

Possible return values are::

    enum pci_ers_result {
        PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
        PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
        PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
        PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
        PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
    };

A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
is not implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then it
is assumed that the driver is not doing any direct recovery and requires
a slot reset. Typically a driver will want to know about
a slot_reset().

The actual steps taken by a platform to recover from a PCI error
event will be platform-dependent, but will follow the general
sequence described below.

STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardware. On powerpc, the slot
is isolated, in that all I/O is blocked: all reads return 0xffffffff,
all writes are ignored.


STEP 1: Notification
--------------------
Platform calls the error_detected() callback on every instance of
every driver affected by the error.

At this point, the device might not be accessible anymore, depending on
the platform (the slot will be isolated on powerpc). The driver may
already have "noticed" the error because of a failing I/O, but this
is the proper "synchronization point", that is, it gives the driver
a chance to clean up, waiting for pending stuff (timers, whatever, etc...)
to complete; it can take semaphores, schedule, etc... everything but
touch the device. Within this function and after it returns, the driver
shouldn't do any new IOs. Called in task context. This is sort of a
"quiesce" point. See note about interrupts at the end of this doc.

All drivers participating in this system must implement this call.
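
As a rough illustration of the callback described above, here is a minimal
userspace sketch of what an error_detected() implementation might look like.
The two enums are copied from this document so that the sketch compiles
outside the kernel. ``struct demo_priv``, ``demo_error_detected()`` and the
choice of result code for each channel state are assumptions made for the
example, not part of the API; a real callback receives a struct pci_dev
pointer rather than the private state used here.

```c
#include <stdbool.h>

/* Copied from this document so the sketch builds in userspace. */
typedef enum {
    pci_channel_io_normal,
    pci_channel_io_frozen,
    pci_channel_io_perm_failure,
} pci_channel_state_t;

enum pci_ers_result {
    PCI_ERS_RESULT_NONE,
    PCI_ERS_RESULT_CAN_RECOVER,
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
    PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical per-device driver state; not a kernel structure. */
struct demo_priv {
    bool io_stopped;  /* driver has quiesced and issues no new I/O */
    bool dead;        /* device has been given up for good */
};

/* Hypothetical error_detected() body for an imaginary "demo" driver. */
int demo_error_detected(struct demo_priv *priv, pci_channel_state_t state)
{
    /* Quiesce first: from here on the driver must not start new I/O. */
    priv->io_stopped = true;

    switch (state) {
    case pci_channel_io_perm_failure:
        priv->dead = true;
        return PCI_ERS_RESULT_DISCONNECT;
    case pci_channel_io_frozen:
        /* Assumed policy: a blocked channel is only cleared by a reset. */
        return PCI_ERS_RESULT_NEED_RESET;
    default:
        /* Channel still usable: ask for the MMIO-enabled stage. */
        return PCI_ERS_RESULT_CAN_RECOVER;
    }
}
```

Note that the sketch never touches the device; it only updates driver state
and reports which kind of recovery it would like.
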
The driver must return one of the following result codes:

  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it might be able to recover
      the HW by just banging IOs or if it wants to be given
      a chance to extract some diagnostic information (see
      mmio_enabled(), below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover without a
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want to recover at all.

The next step taken will depend on the result codes returned by the
drivers.

If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
then the platform should re-enable IOs on the slot (or do nothing in
particular, if the platform doesn't isolate slots), and recovery
proceeds to STEP 2 (MMIO Enabled).

If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
then recovery proceeds to STEP 4 (Slot Reset).

If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).

.. note::

   The current powerpc implementation assumes that a device driver will
   *not* schedule or semaphore in this routine; the current powerpc
   implementation uses one kernel thread to notify all devices;
   thus, if one device sleeps/schedules, all devices are affected.
   Doing better requires complex multi-threaded logic in the error
   recovery implementation (e.g. waiting for all notification threads
   to "join" before proceeding with recovery.) This seems excessively
   complex and not worth implementing.

   The current powerpc implementation doesn't much care if the device
   attempts I/O at this point, or not. I/Os will fail, returning
   a value of 0xff on read, and writes will be dropped. If more than
   EEH_MAX_FAILS I/Os are attempted to a frozen adapter, EEH
   assumes that the device driver has gone into an infinite loop
   and prints an error to syslog. A reboot is then required to
   get the device working again.
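
The rules for choosing the step that follows STEP 1 can be sketched from the
platform's side as below. This is only an illustration: ``enum demo_next_step``
and ``demo_after_error_detected()`` are hypothetical names, and treating
PCI_ERS_RESULT_DISCONNECT (or any unexpected code) from a single driver as
fatal for the whole slot is an assumption of the sketch, since the actual
policy is left to the platform.

```c
#include <stddef.h>

/* Copied from this document so the sketch builds in userspace. */
enum pci_ers_result {
    PCI_ERS_RESULT_NONE,
    PCI_ERS_RESULT_CAN_RECOVER,
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
    PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical labels; the values match the step numbers used here. */
enum demo_next_step {
    DEMO_STEP_MMIO_ENABLED = 2,
    DEMO_STEP_SLOT_RESET   = 4,
    DEMO_STEP_PERM_FAILURE = 6,
};

/*
 * Combine the error_detected() results of all drivers on one slot.
 * The caller is assumed to pass at least one result.
 */
enum demo_next_step demo_after_error_detected(const enum pci_ers_result *results,
                                              size_t n)
{
    int need_reset = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (results[i]) {
        case PCI_ERS_RESULT_CAN_RECOVER:
            break;
        case PCI_ERS_RESULT_NEED_RESET:
            need_reset = 1;  /* one request is enough to force a reset */
            break;
        default:
            /* Assumption: anything else ends recovery for the slot. */
            return DEMO_STEP_PERM_FAILURE;
        }
    }
    return need_reset ? DEMO_STEP_SLOT_RESET : DEMO_STEP_MMIO_ENABLED;
}
```

With two drivers returning CAN_RECOVER and NEED_RESET respectively, the
function selects the slot reset, matching the "if any driver requested a slot
reset" rule above.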

STEP 2: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.

This is the "early recovery" call. IOs are allowed again, but DMA is
not, with some restrictions. This is NOT a callback for the driver to
start operations again, only to peek/poke at the device, extract diagnostic
information, if any, and eventually do things like trigger a device local
reset or some such, but not restart operations. This callback is made if
all drivers on a segment agree that they can try to recover and if no automatic
link reset was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).

.. note::

   The following is proposed; no platform implements this yet:
   Proposal: All I/Os should be done _synchronously_ from within
   this callback, errors triggered by them will be returned via
   the normal pci_check_whatever() API, no new error_detected()
   callback will be issued due to an error happening here. However,
   such an error might cause IOs to be re-blocked for the whole
   segment, and thus invalidate the recovery that other devices
   on the same segment might have done, forcing the whole segment
   into one of the next states, that is, link reset or slot reset.

The driver should return one of the following result codes:

  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the device is fully
      functional and thinks it is ready to start
      normal driver operations again. There is no
      guarantee that the driver will actually be
      allowed to proceed, as another driver on the
      same segment might have failed and thus triggered a
      slot reset on platforms that support it.

  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the device is not
      recoverable in its current state and it needs a slot
      reset to proceed.

  - PCI_ERS_RESULT_DISCONNECT
      Same as above. Total failure, no recovery even after
      reset; the driver is dead. (To be defined more precisely)

The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).

If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset).

STEP 3: Link Reset
------------------
The platform resets the link. This is a PCI-Express specific step
and is done whenever a fatal error has been detected that can be
"solved" by resetting the link.

STEP 4: Slot Reset
------------------

In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.

Powerpc platforms implement two levels of slot reset:
soft reset (default) and fundamental (optional) reset.

Powerpc soft reset consists of asserting the adapter #RST line and then
restoring the PCI BARs and PCI configuration header to a state
that is equivalent to what it would be after a fresh system
power-on followed by power-on BIOS/system firmware initialization.
Soft reset is also known as hot-reset.

Powerpc fundamental reset is supported by PCI Express cards only
and results in the device's state machines, hardware logic, port states and
configuration registers being initialized to their default conditions.

For most PCI devices, a soft reset will be sufficient for recovery.
Optional fundamental reset is provided to support a limited number
of PCI Express devices for which a soft reset is not sufficient
for recovery.
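
The choice between the two powerpc reset levels can be pictured with the
small sketch below. It is a simplified model, not the EEH code:
``struct demo_dev``, ``enum demo_reset_type`` and ``demo_pick_reset()`` are
hypothetical, and the ``needs_freset`` field merely mirrors the needs_freset
bit of struct pci_dev that this document describes.

```c
#include <stdbool.h>

/* Hypothetical labels for the two reset levels described above. */
enum demo_reset_type {
    DEMO_RESET_SOFT,         /* default, also known as hot-reset */
    DEMO_RESET_FUNDAMENTAL,  /* optional, PCI Express only */
};

/* Hypothetical stand-in for the few struct pci_dev facts needed here. */
struct demo_dev {
    bool is_pcie;       /* device is a PCI Express card */
    bool needs_freset;  /* driver asked for a fundamental reset */
};

/* Pick the reset level: fundamental only for PCIe cards that asked for it. */
enum demo_reset_type demo_pick_reset(const struct demo_dev *dev)
{
    if (dev->is_pcie && dev->needs_freset)
        return DEMO_RESET_FUNDAMENTAL;
    return DEMO_RESET_SOFT;
}
```

A conventional PCI card therefore always gets the soft reset in this model,
even if its driver were to set the flag by mistake.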

If the platform supports PCI hotplug, then the reset might be
performed by toggling the slot electrical power off/on.

It is important for the platform to restore the PCI config space
to the "fresh poweron" state, rather than the "last state". After
a slot reset, the device driver will almost always use its standard
device initialization routines, and an unusual config space setup
may result in hung devices, kernel panics, or silent data corruption.

This call gives drivers the chance to re-initialize the hardware
(re-download firmware, etc.). At this point, the driver may assume
that the card is in a fresh state and is fully functional. The slot
is unfrozen and the driver has full access to PCI config space,
memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
will also be available.

Drivers should not restart normal I/O processing operations
at this point. If all device drivers report success on this
callback, the platform will call resume() to complete the sequence,
and let the driver restart normal I/O processing.
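
A slot_reset() callback along these lines might be structured as in the
following userspace sketch. Everything named ``demo_*`` is hypothetical,
including the ``fail_init`` knob that simulates a device which cannot be
brought back; a real callback takes a struct pci_dev pointer and runs the
driver's own initialization routines.

```c
#include <stdbool.h>

/* Copied from this document so the sketch builds in userspace. */
enum pci_ers_result {
    PCI_ERS_RESULT_NONE,
    PCI_ERS_RESULT_CAN_RECOVER,
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
    PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical per-device driver state; not a kernel structure. */
struct demo_priv {
    bool hw_ready;    /* hardware has been re-initialized */
    bool io_running;  /* normal I/O processing is active */
    bool fail_init;   /* simulation knob: device stays broken */
};

/* Hypothetical hardware bring-up; returns 0 on success. */
int demo_hw_init(struct demo_priv *priv)
{
    if (priv->fail_init)
        return -1;
    priv->hw_ready = true;
    return 0;
}

/* Hypothetical slot_reset() body: re-initialize, but do not restart I/O. */
int demo_slot_reset(struct demo_priv *priv)
{
    priv->hw_ready = false;  /* the reset wiped the previous device state */

    if (demo_hw_init(priv) != 0)
        return PCI_ERS_RESULT_DISCONNECT;

    /* io_running is deliberately left alone; resume() restarts I/O. */
    return PCI_ERS_RESULT_RECOVERED;
}
```

The sketch keeps re-initialization and the restart of I/O separate, so that
the platform can still veto the restart if another driver on the slot fails.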

A driver can still return a critical failure for this function if
it can't get the device operational after reset. If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again. If the device still can't
be recovered, there is nothing more that can be done; the platform
will typically report a "permanent failure" in such a case. The
device will be considered "dead" in this case.

Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization. For example, the Symbios sym53c8xx_2
driver performs device init only from PCI function 0::

    if (PCI_FUNC(pdev->devfn) == 0)
        sym_reset_scsi_bus(np, 0);

Result codes:

  - PCI_ERS_RESULT_DISCONNECT
      Same as above.

Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types::

    /* Set EEH reset type to fundamental if required by hba */
    if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
        pdev->needs_freset = 1;

Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
Failure).

.. note::

   The current powerpc implementation does not try a power-cycle
   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
   However, it probably should.


STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks.
The goal of this callback is to tell the driver to restart activity,
that everything is back and running. This callback does not return
a result code.

At this point, if a new error happens, the platform will restart
a new error recovery sequence.
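
To tie the steps together, the sketch below walks a single driver through the
sequence from the platform's point of view. It is a deliberately reduced
model: ``struct demo_handlers`` imitates struct pci_error_handlers but passes
an opaque context instead of a struct pci_dev, the link reset of STEP 3 and
the platform-specific slot reset are left out, and all ``demo_*`` names are
hypothetical. The small example driver only records which callbacks ran.

```c
#include <stddef.h>

typedef enum {
    pci_channel_io_normal,
    pci_channel_io_frozen,
    pci_channel_io_perm_failure,
} pci_channel_state_t;

enum pci_ers_result {
    PCI_ERS_RESULT_NONE,
    PCI_ERS_RESULT_CAN_RECOVER,
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
    PCI_ERS_RESULT_RECOVERED,
};

/* Simplified stand-in for struct pci_error_handlers (assumption). */
struct demo_handlers {
    int (*error_detected)(void *ctx, pci_channel_state_t state);
    int (*mmio_enabled)(void *ctx);
    int (*slot_reset)(void *ctx);
    void (*resume)(void *ctx);
};

/*
 * Walk one driver through recovery. The caller passes the initial channel
 * state (normal or frozen). Returns 1 if the driver resumed, 0 otherwise.
 */
int demo_recover_one(const struct demo_handlers *h, void *ctx,
                     pci_channel_state_t state)
{
    int res;

    if (!h || !h->error_detected)
        return 0;                                 /* non-aware driver */

    res = h->error_detected(ctx, state);          /* STEP 1 */
    if (res == PCI_ERS_RESULT_CAN_RECOVER)        /* STEP 2 */
        res = h->mmio_enabled ? h->mmio_enabled(ctx)
                              : PCI_ERS_RESULT_NEED_RESET;
    if (res == PCI_ERS_RESULT_NEED_RESET)         /* STEP 4 */
        res = h->slot_reset ? h->slot_reset(ctx)
                            : PCI_ERS_RESULT_DISCONNECT;
    if (res != PCI_ERS_RESULT_RECOVERED) {        /* STEP 6 */
        h->error_detected(ctx, pci_channel_io_perm_failure);
        return 0;
    }
    if (h->resume)                                /* STEP 5 */
        h->resume(ctx);
    return 1;
}

/* Example driver: appends one letter per callback to a short log. */
struct demo_trace {
    char log[8];
    size_t len;
};

void demo_note(void *ctx, char c)
{
    struct demo_trace *t = ctx;

    if (t->len + 1 < sizeof(t->log)) {
        t->log[t->len++] = c;
        t->log[t->len] = '\0';
    }
}

int demo_detected(void *ctx, pci_channel_state_t state)
{
    demo_note(ctx, state == pci_channel_io_perm_failure ? 'P' : 'D');
    return PCI_ERS_RESULT_NEED_RESET;
}

int demo_reset(void *ctx)
{
    demo_note(ctx, 'S');
    return PCI_ERS_RESULT_RECOVERED;
}

void demo_resume(void *ctx)
{
    demo_note(ctx, 'R');
}

const struct demo_handlers demo_driver = {
    .error_detected = demo_detected,
    .slot_reset     = demo_reset,
    .resume         = demo_resume,
};
```

Calling ``demo_recover_one(&demo_driver, &trace, pci_channel_io_frozen)`` on
a zeroed ``struct demo_trace`` runs error_detected(), slot_reset() and
resume() in that order. A handler table without slot_reset() ends in the
permanent-failure notification instead.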

STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device. The platform will call error_detected() with a
pci_channel_state_t value of pci_channel_io_perm_failure.

The device driver should, at this point, assume the worst. It should
cancel all pending I/O, refuse all new I/O, returning -EIO to
higher layers. The device driver should then clean up all of its
memory and remove itself from kernel operations, much as it would
during system shutdown.

The platform will typically notify the system operator of the
permanent failure in some way. If the device is hotplug-capable,
the operator will probably want to remove and replace the device.
Note, however, not all failures are truly "permanent". Some are
caused by over-heating, some by a poorly seated card. Many
PCI error events are caused by software bugs, e.g. DMAs to
wild addresses or bogus split transactions due to programming
errors. See the discussion in Documentation/powerpc/eeh-pci-error-recovery.rst
for additional detail on real-life experience of the causes of
software errors.


Conclusion; General Remarks
---------------------------
The way the callbacks are called is platform policy. A platform with
no slot reset capability may want to just "ignore" drivers that can't
recover (disconnect them) and try to let other cards on the same segment
recover. Keep in mind that in most real life cases, though, there will
be only one driver per segment.

Now, a note about interrupts. If you get an interrupt and your
device is dead or has been isolated, there is a problem :)
The current policy is to turn this into a platform policy.
That is, the recovery API only requires that:

- There is no guarantee that interrupt delivery can proceed from any
  device on the segment starting from the error detection and until the
  slot_reset callback is called, at which point interrupts are expected
  to be fully operational.
389 389 390 - There is no guarantee that interrupt delive 390 - There is no guarantee that interrupt delivery is stopped, that is, 391 a driver that gets an interrupt after detec 391 a driver that gets an interrupt after detecting an error, or that detects 392 an error within the interrupt handler such 392 an error within the interrupt handler such that it prevents proper 393 ack'ing of the interrupt (and thus removal 393 ack'ing of the interrupt (and thus removal of the source) should just 394 return IRQ_NOTHANDLED. It's up to the platf 394 return IRQ_NOTHANDLED. It's up to the platform to deal with that 395 condition, typically by masking the IRQ sou 395 condition, typically by masking the IRQ source during the duration of 396 the error handling. It is expected that the 396 the error handling. It is expected that the platform "knows" which 397 interrupts are routed to error-management c 397 interrupts are routed to error-management capable slots and can deal 398 with temporarily disabling that IRQ number 398 with temporarily disabling that IRQ number during error processing (this 399 isn't terribly complex). That means some IR 399 isn't terribly complex). That means some IRQ latency for other devices 400 sharing the interrupt, but there is simply 400 sharing the interrupt, but there is simply no other way. High end 401 platforms aren't supposed to share interrup 401 platforms aren't supposed to share interrupts between many devices 402 anyway :) 402 anyway :) 403 403 404 .. note:: 404 .. note:: 405 405 406 Implementation details for the powerpc plat 406 Implementation details for the powerpc platform are discussed in 407 the file Documentation/arch/powerpc/eeh-pci !! 407 the file Documentation/powerpc/eeh-pci-error-recovery.rst 408 408 409 As of this writing, there is a growing list 409 As of this writing, there is a growing list of device drivers with 410 patches implementing error recovery. Not al 410 patches implementing error recovery. 
Not all of these patches are in 411 mainline yet. These may be used as "example 411 mainline yet. These may be used as "examples": 412 412 413 - drivers/scsi/ipr 413 - drivers/scsi/ipr 414 - drivers/scsi/sym53c8xx_2 414 - drivers/scsi/sym53c8xx_2 415 - drivers/scsi/qla2xxx 415 - drivers/scsi/qla2xxx 416 - drivers/scsi/lpfc 416 - drivers/scsi/lpfc 417 - drivers/next/bnx2.c 417 - drivers/next/bnx2.c 418 - drivers/next/e100.c 418 - drivers/next/e100.c 419 - drivers/net/e1000 419 - drivers/net/e1000 420 - drivers/net/e1000e 420 - drivers/net/e1000e 421 - drivers/net/ixgbe 421 - drivers/net/ixgbe 422 - drivers/net/cxgb3 422 - drivers/net/cxgb3 423 - drivers/net/s2io.c 423 - drivers/net/s2io.c 424 424 425 The cor_error_detected() callback is invoke 425 The cor_error_detected() callback is invoked in handle_error_source() when 426 the error severity is "correctable". The ca 426 the error severity is "correctable". The callback is optional and allows 427 additional logging to be done if desired. S 427 additional logging to be done if desired. See example: 428 428 429 - drivers/cxl/pci.c 429 - drivers/cxl/pci.c 430 430 431 The End 431 The End 432 ------- 432 -------

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

Differences between /Documentation/PCI/pci-error-recovery.rst (Version linux-6.11.5) and /Documentation/PCI/pci-error-recovery.rst (Version linux-6.6.58)
