===============
 GPU Debugging
===============

GPUVM Debugging
===============

To aid in debugging GPU virtual memory related problems, the driver supports a
number of module parameters:

`vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault.

`vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than
the GPU.


Decoding a GPUVM Page Fault
===========================

If you see a GPU page fault in the kernel log, you can decode it to figure
out what is going wrong in your application.  A page fault in your kernel
log may look something like this:

::

 [gfxhub0] no-retry page fault (src_id:0 ring:...
   in page starting at address 0x0000800102800000 ...
 VM_L2_PROTECTION_FAULT_STATUS:0x00301030
     Faulty UTCL2 client ID: TCP (0x8)
     MORE_FAULTS: 0x0
     WALKER_ERROR: 0x0
     PERMISSION_FAULTS: 0x3
     MAPPING_ERROR: 0x0
     RW: 0x0

First you have the memory hub, gfxhub or mmhub.  gfxhub is the memory
hub used for graphics, compute, and sdma on some chips.  mmhub is the
memory hub used for multi-media and sdma on some chips.

Next you have the vmid and pasid.  If the vmid is 0, the fault was likely
caused by the kernel driver or firmware.  If the vmid is non-0, it is generally
a fault in a user application.  The pasid is used to link a vmid to a system
process id.  If the process is active when the fault happens, the process
information will be printed.

The GPU virtual address that caused the fault comes next.

The client ID indicates the GPU block that caused the fault.
Some common client IDs:

- CB/DB: The color/depth backend of the graphics pipe
- CPF: Command Processor Frontend
- CPC: Command Processor Compute
- CPG: Command Processor Graphics
- TCP/SQC/SQG: Shaders
- SDMA: SDMA engines
- VCN: Video encode/decode engines
- JPEG: JPEG engines

PERMISSION_FAULTS describes which faults were encountered:

- bit 0: the PTE was not valid
- bit 1: the PTE read bit was not set
- bit 2: the PTE write bit was not set
- bit 3: the PTE execute bit was not set

Finally, RW indicates whether the access was a read (0) or a write (1).

In the example above, a shader (client ID = TCP) generated a read (RW = 0x0) to
an invalid page (PERMISSION_FAULTS = 0x3) at GPU virtual address
0x0000800102800000.  The user can then inspect their shader code and resource
descriptor state to determine what caused the GPU page fault.

UMR
===

`umr <https://gitlab.freedesktop.org/tomstdeni...>`_ is a general purpose
GPU debugging and diagnostics tool.  Please see the umr
`documentation <https://umr.readthedocs.io/en/...>`_ for more information
about its capabilities.
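
The bit meanings listed under "Decoding a GPUVM Page Fault" can also be applied
mechanically.  The following Python sketch is illustrative only and is not part
of the driver or of umr.  The function names ``decode_permission_faults``,
``access_type``, and ``fault_origin`` are hypothetical, and the sketch assumes
the vmid, PERMISSION_FAULTS, and RW values have already been read off the
kernel log by hand.  The vmid of 3 in the usage lines is an assumed value chosen
for illustration.

```python
# Bit meanings for PERMISSION_FAULTS, as listed in the page fault section.
PERMISSION_FAULT_BITS = {
    0: "the PTE was not valid",
    1: "the PTE read bit was not set",
    2: "the PTE write bit was not set",
    3: "the PTE execute bit was not set",
}


def decode_permission_faults(value):
    """Return one description per bit that is set in PERMISSION_FAULTS."""
    return [
        desc
        for bit, desc in PERMISSION_FAULT_BITS.items()
        if value & (1 << bit)
    ]


def access_type(rw):
    """Map the RW field to the kind of access: 0 is a read, 1 is a write."""
    return "write" if rw else "read"


def fault_origin(vmid):
    """Guess where the fault came from, following the vmid rule of thumb."""
    if vmid == 0:
        return "kernel driver or firmware"
    return "user application"


if __name__ == "__main__":
    # PERMISSION_FAULTS and RW come from the example log;
    # the vmid value is an assumption made for this illustration.
    print("origin:", fault_origin(3))
    print("access:", access_type(0x0))
    for reason in decode_permission_faults(0x3):
        print("fault: ", reason)
```

With the example values, the helper reports bits 0 and 1 of PERMISSION_FAULTS,
which matches the reading of the example as a read to an invalid page.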