Linux/tools/perf/Documentation/perf-amd-ibs.txt

perf-amd-ibs(1)
===============

NAME
----
perf-amd-ibs - Support for AMD Instruction-Based Sampling (IBS) with perf tool

SYNOPSIS
--------
[verse]
'perf record' -e ibs_op//
'perf record' -e ibs_fetch//

DESCRIPTION
-----------

Instruction-Based Sampling (IBS) provides precise Instruction Pointer (IP)
profiling support on AMD platforms. IBS has two independent components: IBS
Op and IBS Fetch. IBS Op sampling provides information about instruction
execution (micro-op execution, to be precise) with details such as d-cache
hit/miss, d-TLB hit/miss, cache miss latency, load/store data source, and
branch behavior. IBS Fetch sampling provides information about instruction
fetch with details such as i-cache hit/miss, i-TLB hit/miss, and fetch
latency. IBS is per-SMT-thread, i.e. each SMT hardware thread contains
standalone IBS units.

Both IBS Op and IBS Fetch are exposed as PMUs by Linux and can be used with
the Linux perf utility. The following directories are created at boot time
if IBS is supported by the hardware and the kernel:

  /sys/bus/event_source/devices/ibs_op/
  /sys/bus/event_source/devices/ibs_fetch/

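As a quick check before recording, the presence of these sysfs directories can
be tested from a shell. The following is a minimal sketch, not part of perf:
the helper name ibs_pmu_present and its optional second argument (an
alternative sysfs root, useful only for trying the check on a test tree) are
assumptions made for illustration.

```shell
# Hypothetical helper: succeeds if the sysfs directory for an IBS PMU exists.
# $1 = PMU name (ibs_op or ibs_fetch)
# $2 = sysfs root, defaults to /sys (illustrative parameter, not a perf option)
ibs_pmu_present() {
    [ -d "${2:-/sys}/bus/event_source/devices/$1" ]
}

# Report the state of both PMUs on the running system.
for pmu in ibs_op ibs_fetch; do
    if ibs_pmu_present "$pmu"; then
        echo "$pmu: available"
    else
        echo "$pmu: not available"
    fi
done
```

If a PMU is reported as not available, either the hardware or the running
kernel lacks IBS support, and the corresponding perf record commands are
expected to fail.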
IBS Op PMU supports two events: cycles and micro ops. IBS Fetch PMU supports
one event: fetch ops.

IBS PMUs do not have user/kernel filtering capability and thus require
CAP_SYS_ADMIN or CAP_PERFMON privilege.

IBS VS. REGULAR CORE PMU
------------------------

IBS gives samples with precise IP, i.e. the IP recorded with an IBS sample has
no skid, whereas the IP recorded by the regular core PMU will have some skid
(the sample was generated at IP X but perf would record it at IP X+n). Hence,
the regular core PMU might not help for profiling with instruction-level
precision. Further, IBS provides additional information about the sample in
question. On the other hand, the regular core PMU has its own advantages, such
as a plethora of events, counting mode (less interference), up to 6 parallel
counters, event grouping support, filtering capabilities etc.

Three regular core PMU events are internally forwarded to the IBS Op PMU when
the precise_ip attribute is set:

        -e cpu-cycles:p becomes -e ibs_op//
        -e r076:p becomes -e ibs_op//
        -e r0C1:p becomes -e ibs_op/cnt_ctl=1/

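The forwarding rules above can be written down as a small lookup. This is only
an illustrative sketch of the three listed mappings: the function name
ibs_forwarded_event is hypothetical, the real forwarding happens inside the
kernel, and any event not listed above is simply echoed back unchanged.

```shell
# Hypothetical helper: prints the IBS Op event that a precise core PMU
# event is forwarded to, following only the three rules listed above.
ibs_forwarded_event() {
    case "$1" in
        cpu-cycles:p|r076:p) echo 'ibs_op//' ;;
        r0C1:p)              echo 'ibs_op/cnt_ctl=1/' ;;
        *)                   echo "$1" ;;   # not forwarded by these rules
    esac
}

ibs_forwarded_event r0C1:p
```

The two cycles-style events map to the default IBS Op event, while r0C1:p maps
to the micro-op counting variant selected with cnt_ctl=1.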
EXAMPLES
--------

IBS Op PMU
~~~~~~~~~~

System-wide profile, cycles event, sampling period: 100000

        # perf record -e ibs_op// -c 100000 -a

Per-cpu profile (cpu10), cycles event, sampling period: 100000

        # perf record -e ibs_op// -c 100000 -C 10

Per-cpu profile (cpu10), cycles event, sampling freq: 1000

        # perf record -e ibs_op// -F 1000 -C 10

System-wide profile, uOps event, sampling period: 100000

        # perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a

Same command, but also capture IBS register raw dump along with perf sample:

        # perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a --raw-samples

System-wide profile, uOps event, sampling period: 100000, L3MissOnly (Zen4 onward)

        # perf record -e ibs_op/cnt_ctl=1,l3missonly=1/ -c 100000 -a

Per process (upstream v6.2 onward), uOps event, sampling period: 100000,
attaching to an existing process (PID 1234)

        # perf record -e ibs_op/cnt_ctl=1/ -c 100000 -p 1234

Per process (upstream v6.2 onward), uOps event, sampling period: 100000,
profiling a newly launched command (ls)

        # perf record -e ibs_op/cnt_ctl=1/ -c 100000 -- ls

To analyse recorded profile in aggregate mode

        # perf report
        /* Select a line and press 'a' to drill down at instruction level. */

To go over each sample

        # perf script

Raw dump of IBS registers when profiled with --raw-samples

        # perf report -D
        /* Look for PERF_RECORD_SAMPLE */

        Example register raw dump:

        ibs_op_ctl:     000002c30006186a MaxCnt    100000 L3MissOnly 0 En 1
                Val 1 CntCtl 0=cycles CurCnt       707
        IbsOpRip:       ffffffff8204aea7
        ibs_op_data:    0000010002550001 CompToRetCtr     1 TagToRetCtr   597
                BrnRet 0  RipInvalid 0 BrnFuse 0 Microcode 1
        ibs_op_data2:   0000000000000013 RmtNode 1 DataSrc 3=DRAM
        ibs_op_data3:   0000000031960092 LdOp 0 StOp 1 DcL1TlbMiss 0
                DcL2TlbMiss 0 DcL1TlbHit2M 1 DcL1TlbHit1G 0 DcL2TlbHit2M 0
                DcMiss 1 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0
                DcMissNoMabAlloc 0 DcLinAddrValid 1 DcPhyAddrValid 1
                DcL2TlbHit1G 0 L2Miss 1 SwPf 0 OpMemWidth 32 bytes
                OpDcMissOpenMemReqs 12 DcMissLat     0 TlbRefillLat     0
        IbsDCLinAd:     ff110008a5398920
        IbsDCPhysAd:    00000008a5398920

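To illustrate how the decoded fields relate to the raw ibs_op_ctl value in the
dump above, the sketch below extracts them with shell arithmetic. The bit
positions used here are an assumption chosen so that the result agrees with
the example dump (MaxCnt 100000, CurCnt 707); they should be checked against
the processor documentation before being relied on. The function name
decode_ibs_op_ctl is hypothetical.

```shell
# Hypothetical helper: decodes a raw ibs_op_ctl value given as hex digits.
# Assumed layout (consistent with the example dump, not authoritative):
#   bits 15:0   MaxCnt[19:4]     bit 16  L3MissOnly   bit 17  En
#   bit 18      Val              bit 19  CntCtl
#   bits 26:20  MaxCnt[26:20]    bits 58:32  CurCnt
decode_ibs_op_ctl() {
    local v=$(( 0x$1 ))
    local max_cnt=$(( ((v & 0xffff) << 4) | (v & 0x7f00000) ))
    local l3=$(( (v >> 16) & 1 ))
    local en=$(( (v >> 17) & 1 ))
    local val=$(( (v >> 18) & 1 ))
    local cnt_ctl=$(( (v >> 19) & 1 ))
    local cur_cnt=$(( (v >> 32) & 0x7ffffff ))
    echo "MaxCnt $max_cnt L3MissOnly $l3 En $en Val $val CntCtl $cnt_ctl CurCnt $cur_cnt"
}

decode_ibs_op_ctl 000002c30006186a
```

For the value from the dump, this prints the same MaxCnt, L3MissOnly, En, Val,
CntCtl and CurCnt numbers that perf report -D shows next to the raw register.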
IBS applied in a real world usecase

        A ~90% regression was observed in tbench with a specific scheduler
        hint, which was counterintuitive. IBS profiles of the good and bad
        runs, captured using perf, helped in identifying the exact cause of
        the problem:

        https://lore.kernel.org/r/20220921063638.2489-1-kprateek.nayak@amd.com

IBS Fetch PMU
~~~~~~~~~~~~~

Similar commands can be used with Fetch PMU as well.

System-wide profile, fetch ops event, sampling period: 100000

        # perf record -e ibs_fetch// -c 100000 -a

System-wide profile, fetch ops event, sampling period: 100000, Random enable

        # perf record -e ibs_fetch/rand_en=1/ -c 100000 -a

        Random enable adds a small degree of variability to the sample period.
        This helps in cases like long running loops where the PMU keeps tagging
        the same instruction over and over because of a fixed sample period.

etc.

PERF MEM AND PERF C2C
---------------------

perf mem is a memory access profiler tool and perf c2c is a shared data
cacheline analyser tool. Both of them internally use IBS Op PMU on AMD.
Below is a simple example of the perf mem tool.

        # perf mem record -c 100000 -- make
        # perf mem report

A normal perf mem report output will provide a detailed memory access profile.
However, it can also be aggregated based on output fields. For example:

        # perf mem report -F mem,sample,snoop
        Samples: 3M of event 'ibs_op//', Event count (approx.): 23524876
        Memory access                                 Samples  Snoop
        N/A                                           1903343  N/A
        L1 hit                                        1056754  N/A
        L2 hit                                          75231  N/A
        L3 hit                                           9496  HitM
        L3 hit                                           2270  N/A
        RAM hit                                          8710  N/A
        Remote node, same socket RAM hit                 3241  N/A
        Remote core, same node Any cache hit             1572  HitM
        Remote core, same node Any cache hit              514  N/A
        Remote node, same socket Any cache hit           1216  HitM
        Remote node, same socket Any cache hit            350  N/A
        Uncached hit                                       18  N/A

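An aggregated report like the one above can be post-processed with standard
text tools. The sketch below sums the Samples column of such rows; it is not a
perf feature, the helper name sum_mem_samples is hypothetical, and it assumes
that Samples is always the second-to-last whitespace-separated field, as in
the output shown above.

```shell
# Hypothetical helper: sums the Samples column of rows read from stdin.
# Rows whose second-to-last field is not a plain number (headers) are skipped.
sum_mem_samples() {
    awk 'NF >= 2 && $(NF-1) ~ /^[0-9]+$/ { total += $(NF-1) }
         END { printf "%d\n", total }'
}

# Two rows copied from the report above: 1056754 + 75231 = 1131985
printf '%s\n' \
    'L1 hit                                        1056754  N/A' \
    'L2 hit                                          75231  N/A' | sum_mem_samples
```

Dividing a single row's sample count by such a total gives that access type's
share of the sampled memory operations.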
Please refer to their man pages for more detail.

SEE ALSO
--------

linkperf:perf-record[1], linkperf:perf-script[1], linkperf:perf-report[1],
linkperf:perf-mem[1], linkperf:perf-c2c[1]
                                                      
