.. SPDX-License-Identifier: GPL-2.0

==================
PCI Error Recovery
==================


:Authors: - Linas Vepstas <linasvepstas@gmail.c
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>


Many PCI bus controllers are able to detect a
PCI errors on the bus, such as parity errors o
buses, as well as SERR and PERR errors. Some
chipsets are able to deal with these errors; t
and the PCI-host bridges found on IBM Power4,
pSeries boxes. A typical action taken is to di
halting all I/O to it. The goal of a disconne
corruption; for example, to halt system memory
to "wild" addresses. Typically, a reconnection
offered, so that the affected PCI device(s) ar
into working condition. The reset phase requir
between the affected device drivers and the PC
This document describes a generic API for noti
of a bus disconnection, and then performing er
This API is currently implemented in the 2.6.1

Reporting and recovery is performed in several
a PCI hardware error has resulted in a bus dis
is reported as soon as possible to all affecte
including multiple instances of a device drive
cards. This allows device drivers to avoid dea
waiting for some i/o-space register to change,
It also gives the drivers a chance to defer in
needed.

Next, recovery is performed in several stages.
is forced by the need to handle multi-function
devices that have multiple device drivers asso
In the first stage, each driver is allowed to
of reset it desires, the choices being a simpl
or requesting a slot reset.

If any driver requests a slot reset, that is w

After a reset and/or a re-enabling of I/O, all
again notified, so that they may then perform
that may be required. After these have all co
"resume normal operations" event is sent out.
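
The first-stage decision described above (every affected driver states what it needs, and a single request for a slot reset wins) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel's implementation: the names ``ers_vote``, ``next_action`` and ``choose_action()`` are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two first-stage answers a driver can give. */
enum ers_vote {
    VOTE_CAN_RECOVER,   /* re-enabling I/O is enough for this driver */
    VOTE_NEED_RESET,    /* this driver wants a slot reset */
};

/* Hypothetical description of what the platform does next. */
enum next_action {
    ACT_ENABLE_IO,
    ACT_SLOT_RESET,
};

/*
 * Combine the votes of all drivers on the affected slot/segment.
 * A single request for a slot reset overrides every other vote.
 */
static enum next_action choose_action(const enum ers_vote *votes, size_t n)
{
    assert(votes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (votes[i] == VOTE_NEED_RESET)
            return ACT_SLOT_RESET;
    }
    return ACT_ENABLE_IO;
}
```

In this model, ``choose_action()`` on a list where every driver answers ``VOTE_CAN_RECOVER`` yields ``ACT_ENABLE_IO``; as soon as one entry is ``VOTE_NEED_RESET`` the result is ``ACT_SLOT_RESET``, whatever the other drivers answered.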

The biggest reason for choosing a kernel-based
than a user-space implementation was the need
disconnects of PCI devices attached to storage
disconnects from devices holding the root file
file system is disconnected, a user-space mech
through a large number of contortions to compl
of the current Linux file systems are not tole
from/reconnection to their underlying block de
bus errors are easy to manage in the device dr
device drivers already handle very similar rec
for example, the SCSI-generic layer already pr
mechanisms for dealing with SCSI bus errors an


Detailed Design
===============

Design and implementation details below, based
public email discussions with Ben Herrenschmid

The error recovery API support is exposed to t
a structure of function pointers pointed to by
pci_driver. A driver that fails to provide the
and the actual recovery steps taken are platfo
arch/powerpc implementation will simulate a PC

This structure has the form::

        struct pci_error_handlers
        {
                int (*error_detected)(struct p
                int (*mmio_enabled)(struct pci
                int (*slot_reset)(struct pci_d
                void (*resume)(struct pci_dev
                void (*cor_error_detected)(str
        };

The possible channel states are::

        typedef enum {
                pci_channel_io_normal,  /* I/O
                pci_channel_io_frozen,  /* I/O
                pci_channel_io_perm_failure, /
        } pci_channel_state_t;

Possible return values are::

        enum pci_ers_result {
                PCI_ERS_RESULT_NONE,        /*
                PCI_ERS_RESULT_CAN_RECOVER, /*
                PCI_ERS_RESULT_NEED_RESET,  /*
                PCI_ERS_RESULT_DISCONNECT,  /*
                PCI_ERS_RESULT_RECOVERED,   /*
        };

A driver does not have to implement all of the
if it implements any, it must implement error_
is not implemented, the corresponding feature
For example, if mmio_enabled() and resume() ar
is assumed that the driver is not doing any di
a slot reset. Typically a driver will want to
a slot_reset().
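
As a rough illustration of the rule that a driver may leave callbacks out but cannot omit error_detected() once it provides any of them, here is a self-contained userspace mock. Everything prefixed ``mock_`` or ``demo_`` and the helper ``handlers_valid()`` are hypothetical names invented for this sketch; the real kernel types live in ``include/linux/pci.h``, and real drivers do not perform this check themselves. Returning a disconnect result for a permanently failed channel is an assumption about typical driver behaviour, not something this document mandates.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types: simplified stand-ins, NOT the kernel definitions. */
struct mock_pci_dev { int devfn; };

enum mock_channel_state {
    MOCK_IO_NORMAL,
    MOCK_IO_FROZEN,
    MOCK_IO_PERM_FAILURE,
};

enum mock_ers_result {
    MOCK_NONE,
    MOCK_CAN_RECOVER,
    MOCK_NEED_RESET,
    MOCK_DISCONNECT,
    MOCK_RECOVERED,
};

struct mock_error_handlers {
    enum mock_ers_result (*error_detected)(struct mock_pci_dev *dev,
                                           enum mock_channel_state state);
    enum mock_ers_result (*mmio_enabled)(struct mock_pci_dev *dev);
    enum mock_ers_result (*slot_reset)(struct mock_pci_dev *dev);
    void (*resume)(struct mock_pci_dev *dev);
};

/* Demo driver: asks for a slot reset unless the channel is gone for good. */
static enum mock_ers_result demo_error_detected(struct mock_pci_dev *dev,
                                                enum mock_channel_state state)
{
    (void)dev;
    if (state == MOCK_IO_PERM_FAILURE)
        return MOCK_DISCONNECT;     /* assumption: give up on permanent failure */
    return MOCK_NEED_RESET;
}

/* Demo driver: pretend re-initialisation after the reset always works. */
static enum mock_ers_result demo_slot_reset(struct mock_pci_dev *dev)
{
    (void)dev;
    return MOCK_RECOVERED;
}

/* Only the callbacks this driver supports are filled in; the rest stay NULL. */
static const struct mock_error_handlers demo_handlers = {
    .error_detected = demo_error_detected,
    .slot_reset     = demo_slot_reset,
};

/* Returns 1 if the table respects "any callback implies error_detected()". */
static int handlers_valid(const struct mock_error_handlers *h)
{
    assert(h != NULL);
    if (h->mmio_enabled || h->slot_reset || h->resume)
        return h->error_detected != NULL;
    return 1;
}
```

Leaving ``mmio_enabled`` and ``resume`` unset mirrors the case described above: such a driver is treated as one that goes straight to a slot reset and has nothing special to do when operations resume.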

The actual steps taken by a platform to recove
event will be platform-dependent, but will fol
sequence described below.

STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardwar
is isolated, in that all I/O is blocked: all r
all writes are ignored.


STEP 1: Notification
--------------------
Platform calls the error_detected() callback o
every driver affected by the error.

At this point, the device might not be accessi
the platform (the slot will be isolated on pow
already have "noticed" the error because of a
is the proper "synchronization point", that is
a chance to clean up, waiting for pending stuff
to complete; it can take semaphores, schedule,
touch the device. Within this function and aft
shouldn't do any new IOs. Called in task conte
"quiesce" point. See note about interrupts at

All drivers participating in this system must
The driver must return one of the following re

  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it migh
      the HW by just banging IOs or if it want
      a chance to extract some diagnostic info
      mmio_enabled, below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want t

The next step taken will depend on the result
drivers.

If all drivers on the segment/slot return PCI_
then the platform should re-enable IOs on the
particular, if the platform doesn't isolate sl
proceeds to STEP 2 (MMIO Enabled).

If any driver requested a slot reset (by retur
then recovery proceeds to STEP 4 (Slot Reset).

If the platform is unable to recover the slot,
is STEP 6 (Permanent Failure).

.. note::

   The current powerpc implementation assumes
   *not* schedule or semaphore in this routine
   implementation uses one kernel thread to no
   thus, if one device sleeps/schedules, all d
   Doing better requires complex multi-threade
   recovery implementation (e.g. waiting for a
   to "join" before proceeding with recovery.)
   complex and not worth implementing.

   The current powerpc implementation doesn't
   attempts I/O at this point, or not. I/Os w
   a value of 0xff on read, and writes will be
   EEH_MAX_FAILS I/Os are attempted to a froze
   assumes that the device driver has gone int
   and prints an error to syslog. A reboot is
   get the device working again.

STEP 2: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (bu
DMA), and then calls the mmio_enabled() callba
device drivers.

This is the "early recovery" call. IOs are all
not, with some restrictions. This is NOT a cal
start operations again, only to peek/poke at t
information, if any, and eventually do things
reset or some such, but not restart operations
all drivers on a segment agree that they can t
link reset was performed by the HW. If the pla
without a slot reset or a link reset, it will
instead will have gone directly to STEP 3 (Lin

.. note::

   The following is proposed; no platform impl
   Proposal: All I/Os should be done _synchron
   this callback, errors triggered by them wil
   the normal pci_check_whatever() API, no new
   callback will be issued due to an error hap
   such an error might cause IOs to be re-bloc
   segment, and thus invalidate the recovery t
   on the same segment might have done, forcin
   into one of the next states, that is, link

The driver should return one of the following

  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the dev
      functional and thinks it is ready to sta
      normal driver operations again. There is
      guarantee that the driver will actually
      allowed to proceed, as another driver on
      same segment might have failed and thus
      slot reset on platforms that support it.

  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the dev
      recoverable in its current state and it
      reset to proceed.

  - PCI_ERS_RESULT_DISCONNECT
      Same as above. Total failure, no recover
      reset driver dead. (To be defined more p

The next step taken depends on the results ret
If all drivers returned PCI_ERS_RESULT_RECOVER
proceeds to either STEP 3 (Link Reset) or to ST

If any driver returned PCI_ERS_RESULT_NEED_RES
proceeds to STEP 4 (Slot Reset).

STEP 3: Link Reset
------------------
The platform resets the link. This is a PCI-E
and is done whenever a fatal error has been de
"solved" by resetting the link.

STEP 4: Slot Reset
------------------

In response to a return value of PCI_ERS_RESUL
platform will perform a slot reset on the requ
The actual steps taken by a platform to perfor
will be platform-dependent.
Upon completion of
platform will call the device slot_reset() cal

Powerpc platforms implement two levels of slot
soft reset (default) and fundamental (optional)

Powerpc soft reset consists of asserting the a
restoring the PCI BARs and PCI configuration h
that is equivalent to what it would be after a
power-on followed by power-on BIOS/system firm
Soft reset is also known as hot-reset.

Powerpc fundamental reset is supported by PCI
and results in device's state machines, hardwa
configuration registers to initialize to their

For most PCI devices, a soft reset will be suf
Optional fundamental reset is provided to supp
of PCI Express devices for which a soft reset
for recovery.

If the platform supports PCI hotplug, then the
performed by toggling the slot electrical powe

It is important for the platform to restore th
to the "fresh poweron" state, rather than the
a slot reset, the device driver will almost al
device initialization routines, and an unusual
may result in hung devices, kernel panics, or

This call gives drivers the chance to re-initi
(re-download firmware, etc.). At this point,
that the card is in a fresh state and is fully
is unfrozen and the driver has full access to
memory mapped I/O space and DMA. Interrupts (L
will also be available.

Drivers should not restart normal I/O processi
at this point. If all device drivers report s
callback, the platform will call resume() to c
and let the driver restart normal I/O processi

A driver can still return a critical failure f
it can't get the device operational after rese
previously tried a soft reset, it might now tr
cycle) and then call slot_reset() again. If t
be recovered, there is nothing more that can b
will typically report a "permanent failure" in
device will be considered "dead" in this case.
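
The escalation described in the last paragraph (reset, call slot_reset(), try a stronger reset if the driver still reports failure, and finally declare a permanent failure) can be modelled in userspace as follows. This is a hedged sketch, not platform code: all ``mock_`` names and ``recover_slot()`` are hypothetical, and the reset kind is passed to the mock callback purely so the simulation can react to it; the real slot_reset() callback is not described here as receiving such an argument.

```c
#include <assert.h>
#include <stddef.h>

/* Mock result codes and reset kinds, invented for this sketch. */
enum mock_ers_result { MOCK_DISCONNECT, MOCK_RECOVERED };
enum mock_reset_kind { MOCK_RESET_SOFT, MOCK_RESET_POWER_CYCLE };
enum mock_outcome    { MOCK_RESUME, MOCK_PERMANENT_FAILURE };

/* Simulated driver callback, invoked after each reset attempt. */
typedef enum mock_ers_result (*mock_slot_reset_fn)(enum mock_reset_kind kind);

/*
 * Try the mildest reset first and escalate while the driver keeps
 * reporting failure. When nothing stronger is left, the device is
 * treated as permanently failed.
 */
static enum mock_outcome recover_slot(mock_slot_reset_fn slot_reset)
{
    static const enum mock_reset_kind kinds[] = {
        MOCK_RESET_SOFT,
        MOCK_RESET_POWER_CYCLE,
    };

    assert(slot_reset != NULL);

    for (size_t i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
        /* A real platform would perform the reset here. */
        if (slot_reset(kinds[i]) == MOCK_RECOVERED)
            return MOCK_RESUME;
    }
    return MOCK_PERMANENT_FAILURE;
}

/* Simulated device that only comes back after a power cycle. */
static enum mock_ers_result needs_power_cycle(enum mock_reset_kind kind)
{
    return kind == MOCK_RESET_POWER_CYCLE ? MOCK_RECOVERED : MOCK_DISCONNECT;
}

/* Simulated device that never recovers. */
static enum mock_ers_result always_dead(enum mock_reset_kind kind)
{
    (void)kind;
    return MOCK_DISCONNECT;
}
```

With these two simulated devices, ``recover_slot(needs_power_cycle)`` ends in ``MOCK_RESUME`` after the second attempt, while ``recover_slot(always_dead)`` ends in ``MOCK_PERMANENT_FAILURE``.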

Drivers for multi-function cards will need to
themselves as to which driver instance will pe
or global device initialization. For example,
driver performs device init only from PCI func

        + if (PCI_FUNC(pdev->devfn) == 0
        +         sym_reset_scsi_bus(np,

        Result codes:
                - PCI_ERS_RESULT_DISCONNECT
                  Same as above.

Drivers for PCI Express cards that require a f
set the needs_freset bit in the pci_dev struct
For example, the QLogic qla2xxx driver sets th
PCI card types::

        + /* Set EEH reset type to funda
        + if (IS_QLA24XX(ha) || IS_QLA25
        +         pdev->needs_freset = 1
        +

Platform proceeds either to STEP 5 (Resume Ope
Failure).

.. note::

   The current powerpc implementation does not
   reset if the driver returned PCI_ERS_RESULT
   However, it probably should.


STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback o
drivers if all drivers on the segment have ret
PCI_ERS_RESULT_RECOVERED from one of the 3 pre
The goal of this callback is to tell the drive
that everything is back and running. This call
a result code.

At this point, if a new error happens, the pla
a new error recovery sequence.

STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the pl
the device. The platform will call error_dete
pci_channel_state_t value of pci_channel_io_pe

The device driver should, at this point, assum
cancel all pending I/O, refuse all new I/O, re
higher layers. The device driver should then c
memory and remove itself from kernel operation
during system shutdown.

The platform will typically notify the system
permanent failure in some way.
If the device
the operator will probably want to remove and
Note, however, not all failures are truly "per
caused by over-heating, some by a poorly seate
PCI error events are caused by software bugs,
wild addresses or bogus split transactions due
errors. See the discussion in Documentation/ar
for additional detail on real-life experience
software errors.


Conclusion; General Remarks
---------------------------
The way the callbacks are called is platform p
no slot reset capability may want to just "ign
recover (disconnect them) and try to let other
recover. Keep in mind that in most real life c
be only one driver per segment.

Now, a note about interrupts. If you get an in
device is dead or has been isolated, there is
The current policy is to turn this into a plat
That is, the recovery API only requires that:

 - There is no guarantee that interrupt delive
   device on the segment starting from the err
   slot_reset callback is called, at which poi
   to be fully operational.

 - There is no guarantee that interrupt delive
   a driver that gets an interrupt after detec
   an error within the interrupt handler such
   ack'ing of the interrupt (and thus removal
   return IRQ_NOTHANDLED. It's up to the platf
   condition, typically by masking the IRQ sou
   the error handling. It is expected that the
   interrupts are routed to error-management c
   with temporarily disabling that IRQ number
   isn't terribly complex). That means some IR
   sharing the interrupt, but there is simply
   platforms aren't supposed to share interrup
   anyway :)

.. note::

   Implementation details for the powerpc plat
   the file Documentation/arch/powerpc/eeh-pci

   As of this writing, there is a growing list
   patches implementing error recovery. Not al
   mainline yet. These may be used as "example

   - drivers/scsi/ipr
   - drivers/scsi/sym53c8xx_2
   - drivers/scsi/qla2xxx
   - drivers/scsi/lpfc
   - drivers/net/bnx2.c
   - drivers/net/e100.c
   - drivers/net/e1000
   - drivers/net/e1000e
   - drivers/net/ixgbe
   - drivers/net/cxgb3
   - drivers/net/s2io.c

   The cor_error_detected() callback is invoke
   the error severity is "correctable". The ca
   additional logging to be done if desired. S

   - drivers/cxl/pci.c

The End
-------