.. SPDX-License-Identifier: GPL-2.0

==================
PCI Error Recovery
==================


:Authors: - Linas Vepstas <linasvepstas@gmail.com>
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>


Many PCI bus controllers are able to detect a variety of hardware
PCI errors on the bus, such as parity errors on the data and address
buses, as well as SERR and PERR errors.  Some of the more advanced
chipsets are able to deal with these errors; these include PCI-E chipsets,
and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
pSeries boxes. A typical action taken is to disconnect the affected device,
halting all I/O to it. The goal of a disconnection is to avoid system
corruption; for example, to halt system memory corruption due to DMAs
to "wild" addresses. Typically, a reconnection mechanism is also
offered, so that the affected PCI device(s) are reset and put back
into working condition. The reset phase requires coordination
between the affected device drivers and the PCI controller chip.
This document describes a generic API for notifying device drivers
of a bus disconnection, and then performing error recovery.
This API is currently implemented in the 2.6.16 and later kernels.

Reporting and recovery is performed in several steps. First, when
a PCI hardware error has resulted in a bus disconnect, that event
is reported as soon as possible to all affected device drivers,
including multiple instances of a device driver on multi-function
cards. This allows device drivers to avoid deadlocking in spinloops,
waiting for some i/o-space register to change, when it never will.
It also gives the drivers a chance to defer incoming I/O as
needed.

Next, recovery is performed in several stages. Most of the complexity
is forced by the need to handle multi-function devices, that is,
devices that have multiple device drivers associated with them.
In the first stage, each driver is allowed to indicate what type
of reset it desires, the choices being a simple re-enabling of I/O
or requesting a slot reset.

If any driver requests a slot reset, that is what will be done.

After a reset and/or a re-enabling of I/O, all drivers are
again notified, so that they may then perform any device setup/config
that may be required.  After these have all completed, a final
"resume normal operations" event is sent out.

The biggest reason for choosing a kernel-based implementation rather
than a user-space implementation was the need to deal with bus
disconnects of PCI devices attached to storage media, and, in particular,
disconnects from devices holding the root file system.  If the root
file system is disconnected, a user-space mechanism would have to go
through a large number of contortions to complete recovery. Almost all
of the current Linux file systems are not tolerant of disconnection
from/reconnection to their underlying block device. By contrast,
bus errors are easy to manage in the device driver. Indeed, most
device drivers already handle very similar recovery procedures;
for example, the SCSI-generic layer already provides significant
mechanisms for dealing with SCSI bus errors and SCSI bus resets.


Detailed Design
===============

Design and implementation details below, based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by a new field in struct
pci_driver. A driver that fails to provide the structure is "non-aware",
and the actual recovery steps taken are platform dependent.  The
arch/powerpc implementation will simulate a PCI hotplug remove/add.

This structure has the form::

	struct pci_error_handlers
	{
		int (*error_detected)(struct pci_dev *dev, pci_channel_state_t state);
		int (*mmio_enabled)(struct pci_dev *dev);
		int (*slot_reset)(struct pci_dev *dev);
		void (*resume)(struct pci_dev *dev);
		void (*cor_error_detected)(struct pci_dev *dev);
	};

The possible channel states are::

	typedef enum {
		pci_channel_io_normal,		/* I/O channel is in normal state */
		pci_channel_io_frozen,		/* I/O to channel is blocked */
		pci_channel_io_perm_failure,	/* PCI card is dead */
	} pci_channel_state_t;

Possible return values are::

	enum pci_ers_result {
		PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
		PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
	};

A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
is not implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then it
is assumed that the driver is not doing any direct recovery and requires
a slot reset.  Typically a driver will want to know about
a slot_reset().

The actual steps taken by a platform to recover from a PCI error
event will be platform-dependent, but will follow the general
sequence described below.

STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardware. On powerpc, the slot
is isolated, in that all I/O is blocked: all reads return 0xffffffff,
all writes are ignored.


STEP 1: Notification
--------------------
Platform calls the error_detected() callback on every instance of
every driver affected by the error.

At this point, the device might not be accessible anymore, depending on
the platform (the slot will be isolated on powerpc). The driver may
already have "noticed" the error because of a failing I/O, but this
is the proper "synchronization point", that is, it gives the driver
a chance to clean up, waiting for pending stuff (timers, whatever, etc.)
to complete; it can take semaphores, schedule, etc., everything but
touch the device. Within this function and after it returns, the driver
shouldn't do any new I/O. Called in task context. This is sort of a
"quiesce" point. See the note about interrupts at the end of this doc.

All drivers participating in this system must implement this call.
The driver must return one of the following result codes:

  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it might be able to recover
      the HW by just banging IOs or if it wants to be given
      a chance to extract some diagnostic information (see
      mmio_enabled(), below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover without a
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want to recover at all.

The next step taken will depend on the result codes returned by the
drivers.

If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
then the platform should re-enable IOs on the slot (or do nothing in
particular, if the platform doesn't isolate slots), and recovery
proceeds to STEP 2 (MMIO Enable).

If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
then recovery proceeds to STEP 4 (Slot Reset).

If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).

.. note::

   The current powerpc implementation assumes that a device driver will
   *not* schedule or semaphore in this routine; the current powerpc
   implementation uses one kernel thread to notify all devices;
   thus, if one device sleeps/schedules, all devices are affected.
   Doing better requires complex multi-threaded logic in the error
   recovery implementation (e.g. waiting for all notification threads
   to "join" before proceeding with recovery.)  This seems excessively
   complex and not worth implementing.

   The current powerpc implementation doesn't much care if the device
   attempts I/O at this point, or not.  I/Os will fail, returning
   a value of 0xff on read, and writes will be dropped. If more than
   EEH_MAX_FAILS I/Os are attempted to a frozen adapter, EEH
   assumes that the device driver has gone into an infinite loop
   and prints an error to syslog.  A reboot is then required to
   get the device working again.

STEP 2: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.

This is the "early recovery" call. IOs are allowed again, but DMA is
not, with some restrictions. This is NOT a callback for the driver to
start operations again, only to peek/poke at the device, extract diagnostic
information, if any, and eventually do things like trigger a device local
reset or some such, but not restart operations. This callback is made if
all drivers on a segment agree that they can try to recover and if no automatic
link reset was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).

.. note::

   The following is proposed; no platform implements this yet:
   Proposal: All I/Os should be done _synchronously_ from within
   this callback, errors triggered by them will be returned via
   the normal pci_check_whatever() API, no new error_detected()
   callback will be issued due to an error happening here. However,
   such an error might cause IOs to be re-blocked for the whole
   segment, and thus invalidate the recovery that other devices
   on the same segment might have done, forcing the whole segment
   into one of the next states, that is, link reset or slot reset.

The driver should return one of the following result codes:

  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the device is fully
      functional and thinks it is ready to start
      normal driver operations again. There is no
      guarantee that the driver will actually be
      allowed to proceed, as another driver on the
      same segment might have failed and thus triggered a
      slot reset on platforms that support it.

  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the device is not
      recoverable in its current state and it needs a slot
      reset to proceed.

  - PCI_ERS_RESULT_DISCONNECT
      Same as above. Total failure, no recovery even after
      reset; driver dead. (To be defined more precisely.)

The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).

If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset).

STEP 3: Link Reset
------------------
The platform resets the link.  This is a PCI-Express specific step
and is done whenever a fatal error has been detected that can be
"solved" by resetting the link.

STEP 4: Slot Reset
------------------

In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.

Powerpc platforms implement two levels of slot reset:
soft reset (default) and fundamental (optional) reset.

Powerpc soft reset consists of asserting the adapter #RST line and then
restoring the PCI BARs and PCI configuration header to a state
that is equivalent to what it would be after a fresh system
power-on followed by power-on BIOS/system firmware initialization.
Soft reset is also known as hot-reset.

Powerpc fundamental reset is supported by PCI Express cards only
and results in device's state machines, hardware logic, port states and
configuration registers to initialize to their default conditions.

For most PCI devices, a soft reset will be sufficient for recovery.
Optional fundamental reset is provided to support a limited number
of PCI Express devices for which a soft reset is not sufficient
for recovery.

If the platform supports PCI hotplug, then the reset might be
performed by toggling the slot electrical power off/on.

It is important for the platform to restore the PCI config space
to the "fresh poweron" state, rather than the "last state". After
a slot reset, the device driver will almost always use its standard
device initialization routines, and an unusual config space setup
may result in hung devices, kernel panics, or silent data corruption.

This call gives drivers the chance to re-initialize the hardware
(re-download firmware, etc.).  At this point, the driver may assume
that the card is in a fresh state and is fully functional. The slot
is unfrozen and the driver has full access to PCI config space,
memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
will also be available.

Drivers should not restart normal I/O processing operations
at this point.  If all device drivers report success on this
callback, the platform will call resume() to complete the sequence,
and let the driver restart normal I/O processing.

A driver can still return a critical failure for this function if
it can't get the device operational after reset.  If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again.  If the device still can't
be recovered, there is nothing more that can be done; the platform
will typically report a "permanent failure" in such a case.  The
device will be considered "dead" in this case.

Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization. For example, the Symbios sym53cxx2
driver performs device init only from PCI function 0::

	+       if (PCI_FUNC(pdev->devfn) == 0)
	+               sym_reset_scsi_bus(np, 0);

Result codes:
  - PCI_ERS_RESULT_DISCONNECT
    Same as above.

Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types::

	+       /* Set EEH reset type to fundamental if required by hba */
	+       if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
	+               pdev->needs_freset = 1;
	+

Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
Failure).

.. note::

   The current powerpc implementation does not try a power-cycle
   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
   However, it probably should.


STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks.
The goal of this callback is to tell the driver to restart activity,
that everything is back and running. This callback does not return
a result code.

At this point, if a new error happens, the platform will restart
a new error recovery sequence.

STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device.  The platform will call error_detected() with a
pci_channel_state_t value of pci_channel_io_perm_failure.

The device driver should, at this point, assume the worst. It should
cancel all pending I/O, refuse all new I/O, returning -EIO to
higher layers. The device driver should then clean up all of its
memory and remove itself from kernel operations, much as it would
during system shutdown.

The platform will typically notify the system operator of the
permanent failure in some way.  If the device is hotplug-capable,
the operator will probably want to remove and replace the device.
Note, however, not all failures are truly "permanent". Some are
caused by over-heating, some by a poorly seated card.  Many
PCI error events are caused by software bugs, e.g. DMAs to
wild addresses or bogus split transactions due to programming
errors. See the discussion in Documentation/arch/powerpc/eeh-pci-error-recovery.rst
for additional detail on real-life experience of the causes of
software errors.


Conclusion; General Remarks
---------------------------
The way the callbacks are called is platform policy. A platform with
no slot reset capability may want to just "ignore" drivers that can't
recover (disconnect them) and try to let other cards on the same segment
recover. Keep in mind that in most real life cases, though, there will
be only one driver per segment.

Now, a note about interrupts. If you get an interrupt and your
device is dead or has been isolated, there is a problem :)
The current policy is to turn this into a platform policy.
That is, the recovery API only requires that:

 - There is no guarantee that interrupt delivery can proceed from any
   device on the segment starting from the error detection and until the
   slot_reset callback is called, at which point interrupts are expected
   to be fully operational.

 - There is no guarantee that interrupt delivery is stopped, that is,
   a driver that gets an interrupt after detecting an error, or that detects
   an error within the interrupt handler such that it prevents proper
   ack'ing of the interrupt (and thus removal of the source) should just
   return IRQ_NONE. It's up to the platform to deal with that
   condition, typically by masking the IRQ source during the duration of
   the error handling. It is expected that the platform "knows" which
   interrupts are routed to error-management capable slots and can deal
   with temporarily disabling that IRQ number during error processing (this
   isn't terribly complex). That means some IRQ latency for other devices
   sharing the interrupt, but there is simply no other way. High end
   platforms aren't supposed to share interrupts between many devices
   anyway :)

.. note::

   Implementation details for the powerpc platform are discussed in
   the file Documentation/arch/powerpc/eeh-pci-error-recovery.rst

   As of this writing, there is a growing list of device drivers with
   patches implementing error recovery. Not all of these patches are in
   mainline yet. These may be used as "examples":

   - drivers/scsi/ipr
   - drivers/scsi/sym53c8xx_2
   - drivers/scsi/qla2xxx
   - drivers/scsi/lpfc
   - drivers/net/bnx2.c
   - drivers/net/e100.c
   - drivers/net/e1000
   - drivers/net/e1000e
   - drivers/net/ixgbe
   - drivers/net/cxgb3
   - drivers/net/s2io.c

   The cor_error_detected() callback is invoked in handle_error_source() when
   the error severity is "correctable". The callback is optional and allows
   additional logging to be done if desired. See example:

   - drivers/cxl/pci.c

The End
-------