Linux/tools/perf/Documentation/perf-c2c.txt


Differences between /tools/perf/Documentation/perf-c2c.txt (Version linux-6.12-rc7) and /tools/perf/Documentation/perf-c2c.txt (Version policy-sample)


  1 perf-c2c(1)                                       
  2 ===========                                       
  3                                                   
  4 NAME                                              
  5 ----                                              
  6 perf-c2c - Shared Data C2C/HITM Analyzer.         
  7                                                   
  8 SYNOPSIS                                          
  9 --------                                          
 10 [verse]                                           
 11 'perf c2c record' [<options>] <command>           
 12 'perf c2c record' [<options>] \-- [<record com    
 13 'perf c2c report' [<options>]                     
 14                                                   
 15 DESCRIPTION                                       
 16 -----------                                       
 17 C2C stands for Cache To Cache.                    
 18                                                   
 19 The perf c2c tool provides means for Shared Da    
 20 you to track down the cacheline contentions.      
 21                                                   
 22 On Intel, the tool is based on load latency an    
 23 provided by Intel CPUs. On PowerPC, the tool u    
 24 with thresholding feature. On AMD, the tool us    
 25 limitations, perf c2c is not supported on Zen3    
 26 sample load and store operations, therefore ha    
 27 required. See linkperf:perf-arm-spe[1] for a s    
 28 statistical nature of Arm SPE sampling, not ev    
 29 sampled.                                          
 30                                                   
 31 These events provide:                             
 32   - memory address of the access                  
 33   - type of the access (load and store details    
 34   - latency (in cycles) of the load access        
 35                                                   
 36 The c2c tool provides means to record this data
 37 for cachelines with highest contention - highe    
 38                                                   
 39 The basic workflow with this tool follows the     
 40 User uses the record command to record events     
 41 display it.                                       
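
The record/report workflow described above can be sketched as a small dry-run helper. This is only an illustration: `c2c_workflow_cmds` is a hypothetical name, the `sleep` workload and its duration are assumptions, and the helper prints the commands instead of running them, so it works on machines without perf.

```shell
#!/bin/sh
# Dry-run sketch: print the two-step perf c2c workflow instead of running it.
# c2c_workflow_cmds is a hypothetical helper, not part of perf.
c2c_workflow_cmds() {
    seconds=${1:-5}   # sampling duration in seconds (assumed default: 5)
    # Step 1: record. Options after "--" are passed on to perf record;
    # "sleep" is only a placeholder workload that bounds the sampling time.
    echo "perf c2c record -- -g -a sleep $seconds"
    # Step 2: report from the default perf.data file in stdio mode.
    echo "perf c2c report -i perf.data --stdio"
}

c2c_workflow_cmds 10
```

Piping the output to `sh` on a machine with perf installed and sufficient privileges would run both steps in order.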
 42                                                   
 43                                                   
 44 RECORD OPTIONS                                    
 45 --------------                                    
 46 -e::                                              
 47 --event=::                                        
 48         Select the PMU event. Use 'perf c2c re    
 49         to list available events.                 
 50                                                   
 51 -v::                                              
 52 --verbose::                                       
 53         Be more verbose (show counter open err    
 54                                                   
 55 -l::                                              
 56 --ldlat::                                         
 57         Configure mem-loads latency. Supported    
 58         only. Ignored on other archs.             
 59                                                   
 60 -k::                                              
 61 --all-kernel::                                    
 62         Configure all used events to run in ke    
 63                                                   
 64 -u::                                              
 65 --all-user::                                      
 66         Configure all used events to run in us    
 67                                                   
 68 REPORT OPTIONS                                    
 69 --------------                                    
 70 -k::                                              
 71 --vmlinux=<file>::                                
 72         vmlinux pathname                          
 73                                                   
 74 -v::                                              
 75 --verbose::                                       
 76         Be more verbose (show counter open err    
 77                                                   
 78 -i::                                              
 79 --input::                                         
 80         Specify the input file to process.        
 81                                                   
 82 -N::                                              
 83 --node-info::                                     
 84         Show extra node info in report (see NO    
 85                                                   
 86 -c::                                              
 87 --coalesce::                                      
 88         Specify sorting fields for single cach    
 89         Following fields are available: tid,pi    
 90         (see COALESCE)                            
 91                                                   
 92 -g::                                              
 93 --call-graph::                                    
 94         Set up callchain parameters.
 95         Please refer to perf-report man page f    
 96                                                   
 97 --stdio::                                         
 98         Force the stdio output (see STDIO OUTP    
 99                                                   
100 --stats::                                         
101         Display only statistic tables and forc    
102                                                   
103 --full-symbols::                                  
104         Display full length of symbols.           
105                                                   
106 --no-source::                                     
107         Do not display Source:Line column.        
108                                                   
109 --show-all::                                      
110         Show all captured HITM lines, with no     
111                                                   
112 -f::                                              
113 --force::                                         
114         Don't do ownership validation.            
115                                                   
116 -d::                                              
117 --display::                                       
118         Switch to HITM type (rmt, lcl) or peer    
119         and sort on. Total HITMs (tot) as defa    
120         as default.                               
121                                                   
122 --stitch-lbr::                                    
123         Show callgraph with stitched LBRs, whi    
124         callgraph. The perf.data file must hav    
125         perf c2c record --call-graph lbr.         
126         Disabled by default. In common cases w    
127         it can recreate better call stacks tha    
128         output. But this approach is not foolp    
129         where it creates incorrect call stacks    
130         The known limitations include exceptio    
131         setjmp/longjmp will have calls/returns    
132                                                   
133 --double-cl::                                     
134         Group the detection of shared cachelin    
135         granularity. Some architectures have a    
136         feature, which causes cacheline sharin    
137         size is doubled.                          
138                                                   
139 C2C RECORD                                        
140 ----------                                        
141 The perf c2c record command sets up options rela
142 and calls standard perf record command.           
143                                                   
144 The following perf record options are configured b
145 (check perf record man page for details)          
146                                                   
147   -W,-d,--phys-data,--sample-cpu                  
148                                                   
149 Unless specified otherwise with '-e' option, f    
150 default on Intel:                                 
151                                                   
152   cpu/mem-loads,ldlat=30/P                        
153   cpu/mem-stores/P                                
154                                                   
155 following on AMD:                                 
156                                                   
157   ibs_op//                                        
158                                                   
159 and following on PowerPC:                         
160                                                   
161   cpu/mem-loads/                                  
162   cpu/mem-stores/                                 
163                                                   
164 The user can pass any 'perf record' option after
165 callchains and system wide monitoring):           
166                                                   
167   $ perf c2c record -- -g -a                      
168                                                   
169 Please check the RECORD OPTIONS section for specif
170                                                   
171 C2C REPORT                                        
172 ----------                                        
173 The perf c2c report command displays shared da    
174 display modes: stdio and tui (default).           
175                                                   
176 The report command workflow is as follows:
177   - sort all the data based on the cacheline a    
178   - store access details for each cacheline       
179   - sort all cachelines based on user settings    
180   - display data                                  
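
The first step of this workflow groups samples by cacheline. A minimal sketch of that mapping, assuming a 64-byte cacheline (the real size depends on the CPU) and using the hypothetical helper names `cacheline_of` and `offset_in_cacheline`:

```shell
#!/bin/sh
# Hypothetical helpers; the 64-byte cacheline size is an assumption.
CL_SIZE=64

# Data address -> cacheline base address (low bits cleared).
cacheline_of() {
    printf '0x%x\n' $(( $1 & ~(CL_SIZE - 1) ))
}

# Data address -> byte offset of the access within its cacheline.
offset_in_cacheline() {
    printf '0x%x\n' $(( $1 & (CL_SIZE - 1) ))
}

cacheline_of 0x7ffd1234abcd         # prints 0x7ffd1234abc0
offset_in_cacheline 0x7ffd1234abcd  # prints 0xd
```

Two accesses that yield the same cacheline base but different offsets touch the same line, which is the pattern the per-offset part of the report helps to spot.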
181                                                   
182 In general perf report output consists of 2 bas
183   1) most expensive cachelines list               
184   2) offsets details for each cacheline           
185                                                   
186 For each cacheline in the 1) list we display f    
187 (Both stdio and TUI modes follow the same fiel    
188                                                   
189   Index                                           
190   - zero based index to identify the cacheline    
191                                                   
192   Cacheline                                       
193   - cacheline address (hex number)                
194                                                   
195   Rmt/Lcl Hitm (Display with HITM types)          
196   - cacheline percentage of all Remote/Local H    
197                                                   
198   Peer Snoop (Display with peer type)             
199   - cacheline percentage of all peer accesses     
200                                                   
201   LLC Load Hitm - Total, LclHitm, RmtHitm (For    
202   - count of Total/Local/Remote load HITMs        
203                                                   
204   Load Peer - Total, Local, Remote (For displa    
205   - count of Total/Local/Remote load from peer    
206                                                   
207   Total records                                   
208   - sum of all cacheline accesses
209                                                   
210   Total loads                                     
211   - sum of all load accesses                      
212                                                   
213   Total stores                                    
214   - sum of all store accesses                     
215                                                   
216   Store Reference - L1Hit, L1Miss, N/A            
217     L1Hit - store accesses that hit L1            
218     L1Miss - store accesses that missed L1        
219     N/A - store accesses with memory level is     
220                                                   
221   Core Load Hit - FB, L1, L2                      
222   - count of load hits in FB (Fill Buffer), L1    
223                                                   
224   LLC Load Hit - LlcHit, LclHitm                  
225   - count of LLC load accesses, includes LLC h    
226                                                   
227   RMT Load Hit - RmtHit, RmtHitm                  
228   - count of remote load accesses, includes re    
229     on Arm neoverse cores, RmtHit is used to a    
230     includes remote DRAM or any upward cache l    
231                                                   
232   Load Dram - Lcl, Rmt                            
233   - count of local and remote DRAM accesses       
234                                                   
235 For each offset in the 2) list we display foll    
236                                                   
237   HITM - Rmt, Lcl (Display with HITM types)       
238   - % of Remote/Local HITM accesses for given     
239                                                   
240   Peer Snoop - Rmt, Lcl (Display with peer typ    
241   - % of Remote/Local peer accesses for given     
242                                                   
243   Store Refs - L1 Hit, L1 Miss, N/A               
244   - % of store accesses that hit L1, missed L1    
245     level for given offset within cacheline       
246                                                   
247   Data address - Offset                           
248   - offset address                                
249                                                   
250   Pid                                             
251   - pid of the process responsible for the acc    
252                                                   
253   Tid                                             
254   - tid of the process responsible for the acc    
255                                                   
256   Code address                                    
257   - code address responsible for the accesses     
258                                                   
259   cycles - rmt hitm, lcl hitm, load (Display w    
260     - sum of cycles for given accesses - Remot    
261                                                   
262   cycles - rmt peer, lcl peer, load (Display w    
263     - sum of cycles for given accesses - Remot    
264                                                   
265   cpu cnt                                         
266     - number of CPUs that participated in the
267                                                   
268   Symbol                                          
269     - code symbol related to the 'Code address    
270                                                   
271   Shared Object                                   
272     - shared object name related to the 'Code     
273                                                   
274   Source:Line                                     
275     - source information related to the 'Code     
276                                                   
277   Node                                            
278     - nodes participating in the access (see N
279                                                   
280 NODE INFO                                         
281 ---------                                         
282 The 'Node' field displays nodes that access
283 offset. Its output comes in 3 flavors:            
284   - node IDs separated by ','                     
285   - node IDs with stats for each ID, in follow    
286       Node{cpus %hitms %stores} (Display with     
287       Node{cpus %peers %stores} (Display with     
288   - node IDs with list of affected CPUs in fol    
289       Node{cpu list}                              
290                                                   
291 The user can switch between the above flavors with -N
292 use 'n' key to interactively switch in TUI mod    
293                                                   
294 COALESCE                                          
295 --------                                          
296 User can specify how to sort offsets for cache    
297                                                   
298 The following fields are available and govern the
299 output fields set for cacheline offsets output    
300                                                   
301   tid   - coalesced by process TIDs               
302   pid   - coalesced by process PIDs               
303   iaddr - coalesced by code address, following    
304              Code address, Code symbol, Shared    
305   dso   - coalesced by shared object              
306                                                   
307 By default the coalescing is set up with 'pid,i
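
As an illustration of the `-c` option, the sketch below checks a comma-separated field list against the four fields named above and prints the matching report command. `c2c_coalesce_cmd` is a hypothetical helper and the added `--stdio` is an arbitrary choice. It is a dry run and never invokes perf.

```shell
#!/bin/sh
# Hypothetical dry-run helper: validate a coalesce field list and print the
# matching report command. Accepted names are the fields listed above.
c2c_coalesce_cmd() {
    fields=$1
    [ -n "$fields" ] || { echo "no coalesce fields given" >&2; return 1; }
    old_ifs=$IFS
    IFS=,                      # split the list on commas
    for f in $fields; do
        case $f in
            tid|pid|iaddr|dso) ;;   # known field, keep going
            *)
                IFS=$old_ifs
                echo "unknown coalesce field: $f" >&2
                return 1
                ;;
        esac
    done
    IFS=$old_ifs
    echo "perf c2c report -c $fields --stdio"
}

c2c_coalesce_cmd tid,iaddr
```

Coalescing by `tid,iaddr` instead of the default keeps each thread on its own line in the offsets output, which can help when several threads of one process contend for a cacheline.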
308                                                   
309 STDIO OUTPUT                                      
310 ------------                                      
311 The stdio output displays data on standard out    
312                                                   
313 The following tables are displayed:
314   Trace Event Information                         
315   - overall statistics of memory accesses         
316                                                   
317   Global Shared Cache Line Event Information      
318   - overall statistics on shared cachelines       
319                                                   
320   Shared Data Cache Line Table                    
321   - list of most expensive cachelines             
322                                                   
323   Shared Cache Line Distribution Pareto           
324   - list of all accessed offsets for each cach    
325                                                   
326 TUI OUTPUT                                        
327 ----------                                        
328 The TUI output provides an interactive interface
329 through cachelines list and to display offset     
330                                                   
331 For details please refer to the help window by    
332                                                   
333 CREDITS                                           
334 -------                                           
335 Although Don Zickus, Dick Fowles and Joe Mario    
336 to get this implemented, we got lots of early     
337 Carvalho de Melo, Stephane Eranian, Jiri Olsa     
338                                                   
339 C2C BLOG                                          
340 --------                                          
341 Check Joe's blog on the c2c tool for detailed use
342   https://joemario.github.io/blog/2016/09/01/c    
343                                                   
344 SEE ALSO                                          
345 --------                                          
346 linkperf:perf-record[1], linkperf:perf-mem[1],    
                                                      
