TOMOYO Linux Cross Reference
Linux/Documentation/gpu/amdgpu/debugging.rst

Differences between /Documentation/gpu/amdgpu/debugging.rst (Version linux-6.12-rc7) and /Documentation/gpu/amdgpu/debugging.rst (Version linux-6.11.7)


===============
 GPU Debugging
===============

GPUVM Debugging
===============

To aid in debugging GPU virtual memory related problems, the driver supports a
number of optional module parameters:

`vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault.

`vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than
the GPU.


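To check which values the two parameters above currently hold, a small
script can read them back.  The following is only a sketch: it assumes the
parameters are exposed read-only under ``/sys/module/amdgpu/parameters``
(the usual sysfs location for module parameters), and the helper name
``read_vm_debug_params`` is hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the amdgpu module parameters; adjust if needed.
DEFAULT_PARAM_DIR = Path("/sys/module/amdgpu/parameters")


def read_vm_debug_params(param_dir=DEFAULT_PARAM_DIR):
    """Return {name: int or None} for vm_fault_stop and vm_update_mode."""
    values = {}
    for name in ("vm_fault_stop", "vm_update_mode"):
        try:
            values[name] = int((Path(param_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # Module not loaded, parameter not exposed, or unexpected content.
            values[name] = None
    return values
```

Calling ``read_vm_debug_params()`` with no argument reads the assumed sysfs
directory; a value of ``None`` means the parameter could not be read.
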
Decoding a GPUVM Page Fault
===========================

If you see a GPU page fault in the kernel log, you can decode it to figure
out what is going wrong in your application.  A page fault in your kernel
log may look something like this:

::

 [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for process glxinfo pid 2424 thread glxinfo:cs0 pid 2425)
   in page starting at address 0x0000800102800000 from IH client 0x1b (UTCL2)
 VM_L2_PROTECTION_FAULT_STATUS:0x00301030
        Faulty UTCL2 client ID: TCP (0x8)
        MORE_FAULTS: 0x0
        WALKER_ERROR: 0x0
        PERMISSION_FAULTS: 0x3
        MAPPING_ERROR: 0x0
        RW: 0x0

First comes the memory hub, either gfxhub or mmhub.  gfxhub is the memory
hub used for graphics, compute, and sdma on some chips.  mmhub is the
memory hub used for multi-media and sdma on some chips.

Next you have the vmid and pasid.  If the vmid is 0, this fault was likely
caused by the kernel driver or firmware.  If the vmid is non-0, it is generally
a fault in a user application.  The pasid is used to link a vmid to a system
process id.  If the process is active when the fault happens, the process
information will be printed.

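The hub, vmid and pasid can be pulled out of the first line of the message
with a regular expression.  This is a minimal sketch written against the
sample message shown above; the pattern, the function name
``parse_fault_header`` and the ``likely_origin`` field are illustrative
assumptions, and other kernel versions may format the line differently.

```python
import re

# Pattern modelled on the first line of the sample message; an assumption,
# not a stable log format.
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\].*?page fault \(src_id:(?P<src_id>\d+) "
    r"ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)"
    r"(?:, for process (?P<process>\S+) pid (?P<pid>\d+))?"
)


def parse_fault_header(line):
    """Extract hub, vmid, pasid and process info, or return None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    vmid = int(m["vmid"])
    return {
        "hub": m["hub"],
        "vmid": vmid,
        "pasid": int(m["pasid"]),
        # Process information is only printed if the process was active.
        "process": m["process"],
        "pid": int(m["pid"]) if m["pid"] else None,
        # vmid 0 points at the kernel driver or firmware, non-0 at userspace.
        "likely_origin": (
            "kernel driver or firmware" if vmid == 0 else "user application"
        ),
    }
```

For the sample line, the result reports hub ``gfxhub0``, vmid 3, pasid 32777
and process ``glxinfo`` (pid 2424), with a likely origin of a user
application.
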
The GPU virtual address that caused the fault comes next.

The client ID indicates the GPU block that caused the fault.
Some common client IDs:

- CB/DB: The color/depth backend of the graphics pipe
- CPF: Command Processor Frontend
- CPC: Command Processor Compute
- CPG: Command Processor Graphics
- TCP/SQC/SQG: Shaders
- SDMA: SDMA engines
- VCN: Video encode/decode engines
- JPEG: JPEG engines

PERMISSION_FAULTS describes what faults were encountered:

- bit 0: the PTE was not valid
- bit 1: the PTE read bit was not set
- bit 2: the PTE write bit was not set
- bit 3: the PTE execute bit was not set

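The PERMISSION_FAULTS bit meanings listed above can be applied mechanically.
The sketch below simply tests each of the four bits; the names
``PERMISSION_FAULT_BITS`` and ``decode_permission_faults`` are hypothetical
and not part of the driver.

```python
# Bit meanings as listed in the text above.
PERMISSION_FAULT_BITS = (
    (0, "the PTE was not valid"),
    (1, "the PTE read bit was not set"),
    (2, "the PTE write bit was not set"),
    (3, "the PTE execute bit was not set"),
)


def decode_permission_faults(value):
    """Return the description of every bit set in a PERMISSION_FAULTS value."""
    return [text for bit, text in PERMISSION_FAULT_BITS if value & (1 << bit)]
```

For example, ``decode_permission_faults(0x3)`` reports that the PTE was not
valid and that the PTE read bit was not set.
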
Finally, RW indicates whether the access was a read (0) or a write (1).

In the example above, a shader (client ID = TCP) generated a read (RW = 0x0) to
an invalid page (PERMISSION_FAULTS = 0x3, i.e. bits 0 and 1: the PTE was not
valid and its read bit was not set) at GPU virtual address
0x0000800102800000.  The user can then inspect their shader code and resource
descriptor state to determine what caused the GPU page fault.

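The reading of the example can be reproduced from the log text itself.  The
following sketch only parses the fields as they are printed in the sample
message; it does not decode the raw VM_L2_PROTECTION_FAULT_STATUS register,
whose bit layout is not described here.  The function name
``summarize_fault`` and the regular expressions are assumptions based on
that sample.

```python
import re

SAMPLE = """\
 [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for process glxinfo pid 2424 thread glxinfo:cs0 pid 2425)
   in page starting at address 0x0000800102800000 from IH client 0x1b (UTCL2)
 VM_L2_PROTECTION_FAULT_STATUS:0x00301030
        Faulty UTCL2 client ID: TCP (0x8)
        MORE_FAULTS: 0x0
        WALKER_ERROR: 0x0
        PERMISSION_FAULTS: 0x3
        MAPPING_ERROR: 0x0
        RW: 0x0
"""


def summarize_fault(log_text):
    """Collect address, client ID, access type and PERMISSION_FAULTS."""
    addr = re.search(r"at address (0x[0-9a-fA-F]+)", log_text)
    client = re.search(r"client ID: (\w+) \(0x[0-9a-fA-F]+\)", log_text)
    # Lines of the form "NAME: 0x..." (the space after the colon is optional).
    fields = {
        name: int(value, 16)
        for name, value in re.findall(
            r"^[ \t]*([A-Z0-9_]+):[ \t]*(0x[0-9a-fA-F]+)[ \t]*$", log_text, re.M
        )
    }
    rw = fields.get("RW")
    return {
        "address": int(addr.group(1), 16) if addr else None,
        "client": client.group(1) if client else None,
        "access": None if rw is None else ("write" if rw else "read"),
        "permission_faults": fields.get("PERMISSION_FAULTS"),
    }
```

Running ``summarize_fault(SAMPLE)`` yields client ``TCP``, access ``read``,
PERMISSION_FAULTS 3 and the faulting address, matching the interpretation
given above.
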
UMR
===

`umr <https://gitlab.freedesktop.org/tomstdenis/umr>`_ is a general purpose
GPU debugging and diagnostics tool.  Please see the umr
`documentation <https://umr.readthedocs.io/en/main/>`_ for more information
about its capabilities.
