~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/Documentation/trace/tracepoint-analysis.rst

Version: ~ [ linux-6.11.5 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.58 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.114 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.169 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.228 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.284 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.322 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.9 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

Diff markup

Differences between /Documentation/trace/tracepoint-analysis.rst (Version linux-6.11.5) and /Documentation/trace/tracepoint-analysis.rst (Version linux-4.9.337)


  1 ==============================================    
  2 Notes on Analysing Behaviour Using Events and     
  3 ==============================================    
  4 :Author: Mel Gorman (PCL information heavily b    
  5                                                   
  6 1. Introduction                                   
  7 ===============                                   
  8                                                   
  9 Tracepoints (see Documentation/trace/tracepoin    
 10 creating custom kernel modules to register pro    
 11 tracing infrastructure.                           
 12                                                   
 13 Simplistically, tracepoints represent importan    
 14 taken in conjunction with other tracepoints to    
 15 what is going on within the system. There are     
 16 gathering and interpreting these events. Lacki    
 17 this document describes some of the methods th    
 18                                                   
 19 This document assumes that debugfs is mounted     
 20 the appropriate tracing options have been conf    
 21 assumed that the PCL tool tools/perf has been     
 22                                                   
 23 2. Listing Available Events                       
 24 ===========================                       
 25                                                   
 26 2.1 Standard Utilities                            
 27 ----------------------                            
 28                                                   
 29 All possible events are visible from /sys/kern    
 30 calling::                                         
 31                                                   
 32   $ find /sys/kernel/tracing/events -type d       
 33                                                   
 34 will give a fair indication of the number of e    
 35                                                   
 36 2.2 PCL (Performance Counters for Linux)          
 37 ----------------------------------------          
 38                                                   
 39 Discovery and enumeration of all counters and     
 40 are available with the perf tool. Getting a li    
 41 simple case of::                                  
 42                                                   
 43   $ perf list 2>&1 | grep Tracepoint              
 44   ext4:ext4_free_inode                     [Tr    
 45   ext4:ext4_request_inode                  [Tr    
 46   ext4:ext4_allocate_inode                 [Tr    
 47   ext4:ext4_write_begin                    [Tr    
 48   ext4:ext4_ordered_write_end              [Tr    
 49   [ .... remaining output snipped .... ]          
 50                                                   
 51                                                   
 52 3. Enabling Events                                
 53 ==================                                
 54                                                   
 55 3.1 System-Wide Event Enabling                    
 56 ------------------------------                    
 57                                                   
 58 See Documentation/trace/events.rst for a prope    
 59 can be enabled system-wide. A short example of    
 60 to page allocation would look something like::    
 61                                                   
 62   $ for i in `find /sys/kernel/tracing/events     
 63                                                   
 64 3.2 System-Wide Event Enabling with SystemTap     
 65 ---------------------------------------------     
 66                                                   
 67 In SystemTap, tracepoints are accessible using    
 68 call. The following is an example that reports    
 69 were allocating the pages.                        
 70 ::                                                
 71                                                   
 72   global page_allocs                              
 73                                                   
 74   probe kernel.trace("mm_page_alloc") {           
 75         page_allocs[execname()]++                 
 76   }                                               
 77                                                   
 78   function print_count() {                        
 79         printf ("%-25s %-s\n", "#Pages Allocat    
 80         foreach (proc in page_allocs-)            
 81                 printf("%-25d %s\n", page_allo    
 82         printf ("\n")                             
 83         delete page_allocs                        
 84   }                                               
 85                                                   
 86   probe timer.s(5) {                              
 87           print_count()                           
 88   }                                               
 89                                                   
 90 3.3 System-Wide Event Enabling with PCL           
 91 ---------------------------------------           
 92                                                   
 93 By specifying the -a switch and analysing slee    
 94 for a duration of time can be examined.           
 95 ::                                                
 96                                                   
 97  $ perf stat -a \                                 
 98         -e kmem:mm_page_alloc -e kmem:mm_page_    
 99         -e kmem:mm_page_free_batched \            
100         sleep 10                                  
101  Performance counter stats for 'sleep 10':        
102                                                   
103            9630  kmem:mm_page_alloc               
104            2143  kmem:mm_page_free                
105            7424  kmem:mm_page_free_batched        
106                                                   
107    10.002577764  seconds time elapsed             
108                                                   
109 Similarly, one could execute a shell and exit     
110 at that point.                                    
111                                                   
112 3.4 Local Event Enabling                          
113 ------------------------                          
114                                                   
115 Documentation/trace/ftrace.rst describes how t    
116 basis using set_ftrace_pid.                       
117                                                   
118 3.5 Local Event Enablement with PCL               
119 -----------------------------------               
120                                                   
121 Events can be activated and tracked for the du    
122 basis using PCL such as follows.                  
123 ::                                                
124                                                   
125   $ perf stat -e kmem:mm_page_alloc -e kmem:mm    
126                  -e kmem:mm_page_free_batched     
127   Time: 0.909                                     
128                                                   
129     Performance counter stats for './hackbench    
130                                                   
131           17803  kmem:mm_page_alloc               
132           12398  kmem:mm_page_free                
133            4827  kmem:mm_page_free_batched        
134                                                   
135     0.973913387  seconds time elapsed             
136                                                   
137 4. Event Filtering                                
138 ==================                                
139                                                   
140 Documentation/trace/ftrace.rst covers in-depth    
141 ftrace.  Obviously using grep and awk of trace    
142 as any script reading trace_pipe.                 
143                                                   
144 5. Analysing Event Variances with PCL             
145 =====================================             
146                                                   
147 Any workload can exhibit variances between run    
148 to know what the standard deviation is. By and    
149 performance analyst to do it by hand. In the e    
150 occurrences are useful to the performance anal    
151 ::                                                
152                                                   
153   $ perf stat --repeat 5 -e kmem:mm_page_alloc    
154                         -e kmem:mm_page_free_b    
155   Time: 0.890                                     
156   Time: 0.895                                     
157   Time: 0.915                                     
158   Time: 1.001                                     
159   Time: 0.899                                     
160                                                   
161    Performance counter stats for './hackbench     
162                                                   
163           16630  kmem:mm_page_alloc         (     
164           11486  kmem:mm_page_free          (     
165            4730  kmem:mm_page_free_batched  (     
166                                                   
167     0.982653002  seconds time elapsed   ( +-      
168                                                   
169 In the event that some higher-level event is r    
170 aggregation of discrete events, then a script     
171                                                   
172 Using --repeat, it is also possible to view ho    
173 time on a system-wide basis using -a and sleep    
174 ::                                                
175                                                   
176   $ perf stat -e kmem:mm_page_alloc -e kmem:mm    
177                 -e kmem:mm_page_free_batched \    
178                 -a --repeat 10 \                  
179                 sleep 1                           
180   Performance counter stats for 'sleep 1' (10     
181                                                   
182            1066  kmem:mm_page_alloc         (     
183             182  kmem:mm_page_free          (     
184             890  kmem:mm_page_free_batched  (     
185                                                   
186     1.002251757  seconds time elapsed   ( +-      
187                                                   
188 6. Higher-Level Analysis with Helper Scripts      
189 ============================================      
190                                                   
191 When events are enabled the events that are tr    
192 /sys/kernel/tracing/trace_pipe in human-readab    
193 options exist as well. By post-processing the     
194 be gathered on-line as appropriate. Examples o    
195                                                   
196   - Reading information from /proc for the PID    
197   - Deriving a higher-level event from a serie    
198   - Calculating latencies between two events      
199                                                   
200 Documentation/trace/postprocess/trace-pageallo    
201 script that can read trace_pipe from STDIN or     
202 on-line, it can be interrupted once to generat    
203 and twice to exit.                                
204                                                   
205 Simplistically, the script just reads STDIN an    
206 also can do more such as                          
207                                                   
208   - Derive high-level events from many low-lev    
209     are freed to the main allocator from the p    
210     that as one per-CPU drain even though ther    
211     for that event                                
212   - It can aggregate based on PID or individua    
213   - In the event memory is getting externally     
214     on whether the fragmentation event was sev    
215   - When receiving an event about a PID, it ca    
216     that if large numbers of events are coming    
217     processes, the parent process responsible     
218     can be identified                             
219                                                   
220 7. Lower-Level Analysis with PCL                  
221 ================================                  
222                                                   
223 There may also be a requirement to identify wh    
224 were generating events within the kernel. To b    
225 data must be recorded. At the time of writing,    
226 ::                                                
227                                                   
228   $ perf record -c 1 \                            
229         -e kmem:mm_page_alloc -e kmem:mm_page_    
230         -e kmem:mm_page_free_batched \            
231         ./hackbench 10                            
232   Time: 0.894                                     
233   [ perf record: Captured and wrote 0.733 MB p    
234                                                   
235 Note the use of '-c 1' to set the event period    
236 period is quite high to minimise overhead but     
237 very coarse as a result.                          
238                                                   
239 This record outputted a file called perf.data     
240 perf report.                                      
241 ::                                                
242                                                   
243   $ perf report                                   
244   # Samples: 30922                                
245   #                                               
246   # Overhead    Command                     Sh    
247   # ........  .........  .....................    
248   #                                               
249       87.27%  hackbench  [vdso]                   
250        6.85%  hackbench  /lib/i686/cmov/libc-2    
251        2.62%  hackbench  /lib/ld-2.9.so           
252        1.52%       perf  [vdso]                   
253        1.22%  hackbench  ./hackbench              
254        0.48%  hackbench  [kernel]                 
255        0.02%       perf  /lib/i686/cmov/libc-2    
256        0.01%       perf  /usr/bin/perf            
257        0.01%       perf  /lib/ld-2.9.so           
258        0.00%  hackbench  /lib/i686/cmov/libpth    
259   #                                               
260   # (For more details, try: perf report --sort    
261   #                                               
262                                                   
263 According to this, the vast majority of events    
264 within the VDSO. With simple binaries, this wi    
265 take a slightly different example. In the cour    
266 noticed that X was generating an insane amount    
267 at it:                                            
268 ::                                                
269                                                   
270   $ perf record -c 1 -f \                         
271                 -e kmem:mm_page_alloc -e kmem:    
272                 -e kmem:mm_page_free_batched \    
273                 -p `pidof X`                      
274                                                   
275 This was interrupted after a few seconds and      
276 ::                                                
277                                                   
278   $ perf report                                   
279   # Samples: 27666                                
280   #                                               
281   # Overhead  Command                             
282   # ........  .......  .......................    
283   #                                               
284       51.95%     Xorg  [vdso]                     
285       47.95%     Xorg  /opt/gfx-test/lib/libpi    
286        0.09%     Xorg  /lib/i686/cmov/libc-2.9    
287        0.01%     Xorg  [kernel]                   
288   #                                               
289   # (For more details, try: perf report --sort    
290   #                                               
291                                                   
292 So, almost half of the events are occurring in    
293 symbol:                                           
294 ::                                                
295                                                   
296   $ perf report --sort comm,dso,symbol            
297   # Samples: 27666                                
298   #                                               
299   # Overhead  Command                             
300   # ........  .......  .......................    
301   #                                               
302       51.95%     Xorg  [vdso]                     
303       47.93%     Xorg  /opt/gfx-test/lib/libpi    
304        0.09%     Xorg  /lib/i686/cmov/libc-2.9    
305        0.01%     Xorg  /opt/gfx-test/lib/libpi    
306        0.01%     Xorg  [kernel]                   
307        0.01%     Xorg  /opt/gfx-test/lib/libpi    
308        0.00%     Xorg  [kernel]                   
309                                                   
310 To see where within the function pixmanFillsse    
311 ::                                                
312                                                   
313   $ perf annotate pixmanFillsse2                  
314   [ ... ]                                         
315     0.00 :         34eeb:       0f 18 08          
316          :      }                                 
317          :                                        
318          :      extern __inline void __attribu    
319          :      _mm_store_si128 (__m128i *__P,    
320          :        *__P = __B;                     
321    12.40 :         34eee:       66 0f 7f 80 40    
322     0.00 :         34ef5:       ff                
323    12.40 :         34ef6:       66 0f 7f 80 50    
324     0.00 :         34efd:       ff                
325    12.39 :         34efe:       66 0f 7f 80 60    
326     0.00 :         34f05:       ff                
327    12.67 :         34f06:       66 0f 7f 80 70    
328     0.00 :         34f0d:       ff                
329    12.58 :         34f0e:       66 0f 7f 40 80    
330    12.31 :         34f13:       66 0f 7f 40 90    
331    12.40 :         34f18:       66 0f 7f 40 a0    
332    12.31 :         34f1d:       66 0f 7f 40 b0    
333                                                   
334 At a glance, it looks like the time is being s    
335 the card.  Further investigation would be need    
336 are being copied around so much but a starting    
337 ancient build of libpixmap out of the library     
338 forgotten about from months ago!                  
                                                      

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php