Linux/Documentation/PCI/pcieaer-howto.rst

Differences between /Documentation/PCI/pcieaer-howto.rst (Version linux-6.11.5) and /Documentation/PCI/pcieaer-howto.rst (Version linux-3.10.108)


.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==============================================
The PCI Express Advanced Error Reporting Drive
==============================================

:Authors: - T. Long Nguyen <tom.l.nguyen@intel.
          - Yanmin Zhang <yanmin.zhang@intel.co

:Copyright: |copy| 2006 Intel Corporation

Overview
===========

About this guide
----------------

This guide describes the basics of the PCI Exp
Reporting (AER) driver and provides informatio
well as how to enable the drivers of Endpoint
the PCIe AER driver.


What is the PCIe AER Driver?
----------------------------

PCIe error signaling can occur on the PCIe lin
or on behalf of transactions initiated on the
defines two error reporting paradigms: the bas
the Advanced Error Reporting capability. The b
required of all PCIe components providing a mi
set of error reporting requirements. Advanced
capability is implemented with a PCIe Advanced
extended capability structure providing more r

The PCIe AER driver provides the infrastructur
Error Reporting capability. The PCIe AER drive
functions:

  - Gathers the comprehensive error informatio
  - Reports errors to the users.
  - Performs error recovery actions.

The AER driver only attaches to Root Ports and
AER capability.


User Guide
==========

Include the PCIe AER Root Driver into the Linu
----------------------------------------------

The PCIe AER driver is a Root Port service dri
via the PCIe Port Bus driver. If a user wants
must be compiled. It is enabled with CONFIG_PC
depends on CONFIG_PCIEPORTBUS.

Load PCIe AER Root Driver
-------------------------

Some systems have AER support in firmware. Ena
the same time the firmware handles AER would r
behavior. Therefore, Linux does not handle AER
grants AER control to the OS via the ACPI _OSC
Specification for details regarding _OSC usage

AER error output
----------------

When a PCIe AER error is captured, an error me
console. If it's a correctable error, it is ou
Otherwise, it is printed as an error. So users
log level to filter out correctable error mess

An example is shown below::

  0000:50:00.0: PCIe Bus Error: severity=Uncor
  0000:50:00.0:   device [8086:0329] error sta
  0000:50:00.0:    [20] Unsupported Request
  0000:50:00.0:   TLP Header: 04000001 00200a0

In the example, 'Requester ID' means the ID of
the error message to the Root Port. Please ref
fields.

AER Statistics / Counters
-------------------------

When PCIe AER errors are captured, the counter
in the form of sysfs attributes which are docu
Documentation/ABI/testing/sysfs-bus-pci-device
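
As a rough illustration of how such sysfs counters could be consumed from
user space, the following Python sketch parses counter text into a dictionary.
It rests on several assumptions that do not come from this guide: that each
attribute holds one "name count" pair per line with the count as the last
field, that an attribute named aer_dev_correctable exists under
/sys/bus/pci/devices/<device>/, and that the sample text below resembles real
output. The helper names are hypothetical. Check the ABI document referenced
above before relying on any of this.

```python
from pathlib import Path


def parse_aer_counters(text):
    """Parse 'name count' lines into a dict; the count is the last field.

    The line format is an assumption, not something this guide specifies.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        # Skip blank lines and anything that does not end in an integer.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_aer_counters(device, attribute="aer_dev_correctable",
                      sysfs_root="/sys/bus/pci/devices"):
    """Read one counter attribute of a PCI device (paths are assumed)."""
    path = Path(sysfs_root) / device / attribute
    return parse_aer_counters(path.read_text())


# Hypothetical sample text, used only to exercise the parser.
SAMPLE = """Receiver Error 2
Bad TLP 0
TOTAL_ERR_COR 2
"""

if __name__ == "__main__":
    print(parse_aer_counters(SAMPLE))
```

On a real system, read_aer_counters("0000:50:00.0") would attempt to read the
attribute for that device and raise an exception if the file is absent.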

Developer Guide
===============

To enable error recovery, a software driver mu

To support AER better, developers need to unde

PCIe errors are classified into two types: cor
and uncorrectable errors. This classification
of those errors, which may result in degraded
failure.

Correctable errors have no impact on the func
interface. The PCIe protocol can recover witho
intervention or any loss of data. These errors
corrected by hardware.

Unlike correctable errors, uncorrectable
errors impact functionality of the interface.
can cause a particular transaction or a partic
to be unreliable. Depending on those error con
errors are further classified into non-fatal e
Non-fatal errors cause the particular transact
but the PCIe link itself is fully functional.
the other hand, cause the link to be unreliabl

When PCIe error reporting is enabled, a device
error message to the Root Port above it when i
an error. The Root Port, upon receiving an err
internally processes and logs the error messag
Capability structure. Error information being
the error reporting agent's requester ID into
Identification Registers and setting the error
Status Register accordingly. If AER error repo
Error Command Register, the Root Port generate
error is detected.

Note that the errors as described above are re
hierarchy and links. These errors do not inclu
errors because device specific errors will sti
the device driver.

Provide callbacks
-----------------

callback reset_link to reset PCIe link
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This callback is used to reset the PCIe physic
fatal error happens. The Root Port AER service
default reset_link function, but different Ups
have different specifications to reset the PCI
Upstream Port drivers may provide their own re

Section 3.2.2.2 provides more detailed info on
reset_link.

PCI error-recovery callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCIe AER Root driver uses error callbacks
with downstream device drivers associated with
when performing error recovery actions.

Data struct pci_driver has a pointer, err_hand
pci_error_handlers which consists of a couple of
pointers. The AER driver follows the rules def
pci-error-recovery.rst except PCIe-specific pa
reset_link). Please refer to pci-error-recover
definitions of the callbacks.

The sections below specify when to call the er

Correctable errors
~~~~~~~~~~~~~~~~~~

Correctable errors have no impact on the func
the interface. The PCIe protocol can recover w
software intervention or any loss of data. The
require any recovery actions. The AER driver c
correctable error status register accordingly

Non-correctable (non-fatal and fatal) errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If an error message indicates a non-fatal erro
at upstream is not required. The AER driver ca
pci_channel_io_normal) to all drivers associat
question. For example::

  Endpoint <==> Downstream Port B <==> Upstrea

If Upstream Port A captures an AER error, the
Downstream Port B and Endpoint.

A driver may return PCI_ERS_RESULT_CAN_RECOVER
PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_N
whether it can recover or the AER driver calls

If an error message indicates a fatal error, k
error_detected(dev, pci_channel_io_frozen) to
a hierarchy in question. Then, performing link
necessary. As different kinds of devices might
to reset link, AER port service driver is requ
function to reset link via callback parameter
function. If reset_link is not NULL, recovery
to reset the link. If error_detected returns P
and reset_link returns PCI_ERS_RESULT_RECOVERE
to mmio_enabled.
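
The severity-dependent behavior described in the sections above can be
summarized with a small user-space model. The Python sketch below is not
kernel code: the Severity enum and the recovery_plan helper are hypothetical
names introduced only for illustration. It encodes just what the text states:
correctable errors need no recovery action, non-fatal errors lead to
error_detected being called with pci_channel_io_normal and no link reset, and
fatal errors lead to error_detected being called with pci_channel_io_frozen
followed by a link reset.

```python
from enum import Enum


class Severity(Enum):
    """Error classes named in this guide (hypothetical Python model)."""
    CORRECTABLE = "correctable"
    NONFATAL = "non-fatal"
    FATAL = "fatal"


def recovery_plan(severity):
    """Return (channel state passed to error_detected, link reset needed).

    A channel state of None means error_detected is not called at all.
    """
    if severity is Severity.CORRECTABLE:
        # Hardware has already corrected the error; nothing to recover.
        return (None, False)
    if severity is Severity.NONFATAL:
        # The link is still functional, so no reset is required.
        return ("pci_channel_io_normal", False)
    # Fatal: the link is unreliable, so I/O is frozen and the link is reset.
    return ("pci_channel_io_frozen", True)


if __name__ == "__main__":
    for sev in Severity:
        print(sev.value, recovery_plan(sev))
```

The model deliberately ignores the driver return values
(PCI_ERS_RESULT_CAN_RECOVER and the others), which decide what happens after
error_detected has been called.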

Frequently Asked Questions
--------------------------

Q:
  What happens if a PCIe device driver does no
  error recovery handler (pci_driver->err_hand

A:
  The devices attached with the driver won't b
  error is fatal, kernel will print out warnin
  to section 3 for more information.

Q:
  What happens if an upstream port service dri
  callback reset_link?

A:
  Fatal error recovery will fail if the errors
  upstream ports that are attached by the servi


Software error injection
========================

Debugging PCIe AER error recovery code is quit
is hard to trigger real hardware errors. Softw
injection can be used to fake various kinds of

First you should enable PCIe AER software erro
configuration, that is, the following item should

CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJE

After rebooting with the new kernel or inserting the mod
/dev/aer_inject should be created.
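
Before attempting injection, it can help to confirm that the running kernel
was built with the option named above. The Python sketch below checks kernel
configuration text for CONFIG_PCIEAER_INJECT set to y or m. The helper names
are hypothetical, and where the configuration text comes from (for example a
saved .config file) is an assumption left to the caller.

```python
def option_state(config_text, option):
    """Return the value of a kernel config option, or None if it is unset."""
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" start with '#' and are skipped.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def aer_inject_available(config_text):
    """True if AER software injection is built in (y) or built as a module (m)."""
    return option_state(config_text, "CONFIG_PCIEAER_INJECT") in ("y", "m")


if __name__ == "__main__":
    sample = "CONFIG_PCIEPORTBUS=y\nCONFIG_PCIEAER_INJECT=m\n"
    print(aer_inject_available(sample))
```

A True result only says the option was selected at build time; when it is
built as a module, the module still has to be inserted before /dev/aer_inject
appears.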

Then, you need a user space tool named aer-inj
from:

    https://github.com/intel/aer-inject.git

More information about aer-inject can be found
its source code.
                                                      
