1 ============================== 1 ============================== 2 Using the tracer for debugging 2 Using the tracer for debugging 3 ============================== 3 ============================== 4 4 5 Copyright 2024 Google LLC. 5 Copyright 2024 Google LLC. 6 6 7 :Author: Steven Rostedt <rostedt@goodmis.org> 7 :Author: Steven Rostedt <rostedt@goodmis.org> 8 :License: The GNU Free Documentation License, 8 :License: The GNU Free Documentation License, Version 1.2 9 (dual licensed under the GPL v2) 9 (dual licensed under the GPL v2) 10 10 11 - Written for: 6.12 11 - Written for: 6.12 12 12 13 Introduction 13 Introduction 14 ------------ 14 ------------ 15 The tracing infrastructure can be very useful 15 The tracing infrastructure can be very useful for debugging the Linux 16 kernel. This document is a place to add variou 16 kernel. This document is a place to add various methods of using the tracer 17 for debugging. 17 for debugging. 18 18 19 First, make sure that the tracefs file system 19 First, make sure that the tracefs file system is mounted:: 20 20 21 $ sudo mount -t tracefs tracefs /sys/kernel/t 21 $ sudo mount -t tracefs tracefs /sys/kernel/tracing 22 22 23 23 24 Using trace_printk() 24 Using trace_printk() 25 -------------------- 25 -------------------- 26 26 27 trace_printk() is a very lightweight utility t 27 trace_printk() is a very lightweight utility that can be used in any context 28 inside the kernel, with the exception of "noin 28 inside the kernel, with the exception of "noinstr" sections. It can be used 29 in normal, softirq, interrupt and even NMI con 29 in normal, softirq, interrupt and even NMI context. The trace data is 30 written to the tracing ring buffer in a lockle 30 written to the tracing ring buffer in a lockless way. To make it even 31 lighter weight, when possible, it will only re 31 lighter weight, when possible, it will only record the pointer to the format 32 string, and save the raw arguments into the bu 32 string, and save the raw arguments into the buffer. The format and the 33 arguments will be post processed when the ring 33 arguments will be post processed when the ring buffer is read. This way the 34 trace_printk() format conversions are not done 34 trace_printk() format conversions are not done during the hot path, where 35 the trace is being recorded. 35 the trace is being recorded. 36 36 37 trace_printk() is meant only for debugging, an 37 trace_printk() is meant only for debugging, and should never be added into 38 a subsystem of the kernel. If you need debuggi 38 a subsystem of the kernel. If you need debugging traces, add trace events 39 instead. If a trace_printk() is found in the k 39 instead. If a trace_printk() is found in the kernel, the following will 40 appear in the dmesg:: 40 appear in the dmesg:: 41 41 42 ******************************************** 42 ********************************************************** 43 ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTI 43 ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ** 44 ** 44 ** ** 45 ** trace_printk() being used. Allocating ext 45 ** trace_printk() being used. Allocating extra memory. ** 46 ** 46 ** ** 47 ** This means that this is a DEBUG kernel an 47 ** This means that this is a DEBUG kernel and it is ** 48 ** unsafe for production use. 48 ** unsafe for production use. ** 49 ** 49 ** ** 50 ** If you see this message and you are not d 50 ** If you see this message and you are not debugging ** 51 ** the kernel, report this immediately to yo 51 ** the kernel, report this immediately to your vendor! ** 52 ** 52 ** ** 53 ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTI 53 ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ** 54 ******************************************** 54 ********************************************************** 55 55 56 Debugging kernel crashes 56 Debugging kernel crashes 57 ------------------------ 57 ------------------------ 58 There is various methods of acquiring the stat 58 There is various methods of acquiring the state of the system when a kernel 59 crash occurs. This could be from the oops mess 59 crash occurs. This could be from the oops message in printk, or one could 60 use kexec/kdump. But these just show what happ 60 use kexec/kdump. But these just show what happened at the time of the crash. 61 It can be very useful in knowing what happened 61 It can be very useful in knowing what happened up to the point of the crash. 62 The tracing ring buffer, by default, is a circ 62 The tracing ring buffer, by default, is a circular buffer than will 63 overwrite older events with newer ones. When a 63 overwrite older events with newer ones. When a crash happens, the content of 64 the ring buffer will be all the events that le 64 the ring buffer will be all the events that lead up to the crash. 65 65 66 There are several kernel command line paramete 66 There are several kernel command line parameters that can be used to help in 67 this. The first is "ftrace_dump_on_oops". This 67 this. The first is "ftrace_dump_on_oops". This will dump the tracing ring 68 buffer when a oops occurs to the console. This 68 buffer when a oops occurs to the console. This can be useful if the console 69 is being logged somewhere. If a serial console 69 is being logged somewhere. If a serial console is used, it may be prudent to 70 make sure the ring buffer is relatively small, 70 make sure the ring buffer is relatively small, otherwise the dumping of the 71 ring buffer may take several minutes to hours 71 ring buffer may take several minutes to hours to finish. Here's an example 72 of the kernel command line:: 72 of the kernel command line:: 73 73 74 ftrace_dump_on_oops trace_buf_size=50K 74 ftrace_dump_on_oops trace_buf_size=50K 75 75 76 Note, the tracing buffer is made up of per CPU 76 Note, the tracing buffer is made up of per CPU buffers where each of these 77 buffers is broken up into sub-buffers that are 77 buffers is broken up into sub-buffers that are by default PAGE_SIZE. The 78 above trace_buf_size option above sets each of 78 above trace_buf_size option above sets each of the per CPU buffers to 50K, 79 so, on a machine with 8 CPUs, that's actually 79 so, on a machine with 8 CPUs, that's actually 400K total. 80 80 81 Persistent buffers across boots 81 Persistent buffers across boots 82 ------------------------------- 82 ------------------------------- 83 If the system memory allows it, the tracing ri 83 If the system memory allows it, the tracing ring buffer can be specified at 84 a specific location in memory. If the location 84 a specific location in memory. If the location is the same across boots and 85 the memory is not modified, the tracing buffer 85 the memory is not modified, the tracing buffer can be retrieved from the 86 following boot. There's two ways to reserve me 86 following boot. There's two ways to reserve memory for the use of the ring 87 buffer. 87 buffer. 88 88 89 The more reliable way (on x86) is to reserve m 89 The more reliable way (on x86) is to reserve memory with the "memmap" kernel 90 command line option and then use that memory f 90 command line option and then use that memory for the trace_instance. This 91 requires a bit of knowledge of the physical me 91 requires a bit of knowledge of the physical memory layout of the system. The 92 advantage of using this method, is that the me 92 advantage of using this method, is that the memory for the ring buffer will 93 always be the same:: 93 always be the same:: 94 94 95 memmap==12M$0x284500000 trace_instance=boot_ 95 memmap==12M$0x284500000 trace_instance=boot_map@0x284500000:12M 96 96 97 The memmap above reserves 12 megabytes of memo 97 The memmap above reserves 12 megabytes of memory at the physical memory 98 location 0x284500000. Then the trace_instance 98 location 0x284500000. Then the trace_instance option will create a trace 99 instance "boot_map" at that same location with 99 instance "boot_map" at that same location with the same amount of memory 100 reserved. As the ring buffer is broke up into 100 reserved. As the ring buffer is broke up into per CPU buffers, the 12 101 megabytes will be broken up evenly between tho 101 megabytes will be broken up evenly between those CPUs. If you have 8 CPUs, 102 each per CPU ring buffer will be 1.5 megabytes 102 each per CPU ring buffer will be 1.5 megabytes in size. Note, that also 103 includes meta data, so the amount of memory ac 103 includes meta data, so the amount of memory actually used by the ring buffer 104 will be slightly smaller. 104 will be slightly smaller. 105 105 106 Another more generic but less robust way to al 106 Another more generic but less robust way to allocate a ring buffer mapping 107 at boot is with the "reserve_mem" option:: 107 at boot is with the "reserve_mem" option:: 108 108 109 reserve_mem=12M:4096:trace trace_instance=bo 109 reserve_mem=12M:4096:trace trace_instance=boot_map@trace 110 110 111 The reserve_mem option above will find 12 mega 111 The reserve_mem option above will find 12 megabytes that are available at 112 boot up, and align it by 4096 bytes. It will l 112 boot up, and align it by 4096 bytes. It will label this memory as "trace" 113 that can be used by later command line options 113 that can be used by later command line options. 114 114 115 The trace_instance option creates a "boot_map" 115 The trace_instance option creates a "boot_map" instance and will use the 116 memory reserved by reserve_mem that was labele 116 memory reserved by reserve_mem that was labeled as "trace". This method is 117 more generic but may not be as reliable. Due t 117 more generic but may not be as reliable. Due to KASLR, the memory reserved 118 by reserve_mem may not be located at the same 118 by reserve_mem may not be located at the same location. If this happens, 119 then the ring buffer will not be from the prev 119 then the ring buffer will not be from the previous boot and will be reset. 120 120 121 Sometimes, by using a larger alignment, it can 121 Sometimes, by using a larger alignment, it can keep KASLR from moving things 122 around in such a way that it will move the loc 122 around in such a way that it will move the location of the reserve_mem. By 123 using a larger alignment, you may find better 123 using a larger alignment, you may find better that the buffer is more 124 consistent to where it is placed:: 124 consistent to where it is placed:: 125 125 126 reserve_mem=12M:0x2000000:trace trace_instan 126 reserve_mem=12M:0x2000000:trace trace_instance=boot_map@trace 127 127 128 On boot up, the memory reserved for the ring b 128 On boot up, the memory reserved for the ring buffer is validated. It will go 129 through a series of tests to make sure that th 129 through a series of tests to make sure that the ring buffer contains valid 130 data. If it is, it will then set it up to be a 130 data. If it is, it will then set it up to be available to read from the 131 instance. If it fails any of the tests, it wil 131 instance. If it fails any of the tests, it will clear the entire ring buffer 132 and initialize it as new. 132 and initialize it as new. 133 133 134 The layout of this mapped memory may not be co 134 The layout of this mapped memory may not be consistent from kernel to 135 kernel, so only the same kernel is guaranteed 135 kernel, so only the same kernel is guaranteed to work if the mapping is 136 preserved. Switching to a different kernel ver 136 preserved. Switching to a different kernel version may find a different 137 layout and mark the buffer as invalid. 137 layout and mark the buffer as invalid. 138 138 139 Using trace_printk() in the boot instance 139 Using trace_printk() in the boot instance 140 ----------------------------------------- 140 ----------------------------------------- 141 By default, the content of trace_printk() goes 141 By default, the content of trace_printk() goes into the top level tracing 142 instance. But this instance is never preserved 142 instance. But this instance is never preserved across boots. To have the 143 trace_printk() content, and some other interna 143 trace_printk() content, and some other internal tracing go to the preserved 144 buffer (like dump stacks), either set the inst 144 buffer (like dump stacks), either set the instance to be the trace_printk() 145 destination from the kernel command line, or s 145 destination from the kernel command line, or set it after boot up via the 146 trace_printk_dest option. 146 trace_printk_dest option. 147 147 148 After boot up:: 148 After boot up:: 149 149 150 echo 1 > /sys/kernel/tracing/instances/boot_ 150 echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest 151 151 152 From the kernel command line:: 152 From the kernel command line:: 153 153 154 reserve_mem=12M:4096:trace trace_instance=bo 154 reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace 155 155 156 If setting it from the kernel command line, it 156 If setting it from the kernel command line, it is recommended to also 157 disable tracing with the "traceoff" flag, and 157 disable tracing with the "traceoff" flag, and enable tracing after boot up. 158 Otherwise the trace from the most recent boot 158 Otherwise the trace from the most recent boot will be mixed with the trace 159 from the previous boot, and may make it confus 159 from the previous boot, and may make it confusing to read.
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.