perf-amd-ibs(1)
===============

NAME
----
perf-amd-ibs - Support for AMD Instruction-Bas

SYNOPSIS
--------
[verse]
'perf record' -e ibs_op//
'perf record' -e ibs_fetch//

DESCRIPTION
-----------

Instruction-Based Sampling (IBS) provides prec
profiling support on AMD platforms. IBS has tw
Op and IBS Fetch. IBS Op sampling provides inf
execution (micro-op execution to be precise) w
hit/miss, d-TLB hit/miss, cache miss latency,
behavior etc. IBS Fetch sampling provides info
with details like i-cache hit/miss, i-TLB hit/
per-smt-thread i.e. each SMT hardware thread c

Both, IBS Op and IBS Fetch, are exposed as PMU
using the Linux perf utility. The following fi
if IBS is supported by the hardware and kernel

	/sys/bus/event_source/devices/ibs_op/
	/sys/bus/event_source/devices/ibs_fetch/

IBS Op PMU supports two events: cycles and mic
one event: fetch ops.

IBS PMUs do not have user/kernel filtering cap
CAP_SYS_ADMIN or CAP_PERFMON privilege.

IBS VS. REGULAR CORE PMU
------------------------

IBS gives samples with precise IP, i.e. the IP
no skid. Whereas the IP recorded by regular co
(sample was generated at IP X but perf would r
regular core PMU might not help for profiling
precision. Further, IBS provides additional in
question.
On the other hand, regular core PMU
plethora of events, counting mode (less interf
counters, event grouping support, filtering ca

Three regular core PMU events are internally f
precise_ip attribute is set:

	-e cpu-cycles:p becomes -e ibs_op//
	-e r076:p becomes -e ibs_op//
	-e r0C1:p becomes -e ibs_op/cnt_ctl=1/

EXAMPLES
--------

IBS Op PMU
~~~~~~~~~~

System-wide profile, cycles event, sampling pe

	# perf record -e ibs_op// -c 100000 -a

Per-cpu profile (cpu10), cycles event, samplin

	# perf record -e ibs_op// -c 100000 -C

Per-cpu profile (cpu10), cycles event, samplin

	# perf record -e ibs_op// -F 1000 -C 1

System-wide profile, uOps event, sampling peri

	# perf record -e ibs_op/cnt_ctl=1/ -c

Same command, but also capture IBS register ra

	# perf record -e ibs_op/cnt_ctl=1/ -c

System-wide profile, uOps event, sampling peri

	# perf record -e ibs_op/cnt_ctl=1,l3mi

Per process (upstream v6.2 onward), uOps event,

	# perf record -e ibs_op/cnt_ctl=1/ -c

Per process (upstream v6.2 onward), uOps event,

	# perf record -e ibs_op/cnt_ctl=1/ -c

To analyse recorded profile in aggregate mode

	# perf report
	/* Select a line and press 'a' to dril

To go over each sample

	# perf script

Raw dump of IBS registers when profiled with -

	# perf report -D
	/* Look for PERF_RECORD_SAMPLE */

Example register raw dump:

	ibs_op_ctl: 000002c30006186a MaxCn
		Val 1 CntCtl 0=cycles CurCnt
	IbsOpRip: ffffffff8204aea7
	ibs_op_data: 0000010002550001 CompT
		BrnRet 0 RipInvalid 0 BrnFuse
	ibs_op_data2: 0000000000000013 RmtNo
	ibs_op_data3: 0000000031960092 LdOp
		DcL2TlbMiss 0 DcL1TlbHit2M 1 D
		DcMiss 1 DcMisAcc 0 DcWcMemAcc
		DcMissNoMabAlloc 0 DcLinAddrVa
		DcL2TlbHit1G 0 L2Miss 1 SwPf 0
		OpDcMissOpenMemReqs 12 DcMissL
	IbsDCLinAd: ff110008a5398920
	IbsDCPhysAd: 00000008a5398920
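
As a rough illustration of how the ibs_op_ctl line in the example dump relates
to its raw value, the sketch below extracts a few fields using plain shell
arithmetic. The bit positions used here (MaxCnt[19:4] in bits 15:0, L3MissOnly
in bit 16, En in bit 17, Val in bit 18, CntCtl in bit 19, MaxCnt[26:20] in
bits 26:20, CurCnt in bits 58:32) are an assumption based on the kernel's IBS
register definitions and should be verified against the AMD Processor
Programming Reference for the target CPU. The variable names are illustrative
only and are not part of perf.

```shell
#!/bin/sh
# Sketch: decode the raw ibs_op_ctl value from the example dump.
# ASSUMPTION: the field layout below follows the kernel's IBS op control
# register definition; verify it against the AMD PPR for your processor.
raw=0x000002c30006186a

max_lo=$(( raw & 0xffff ))              # MaxCnt[19:4]
l3_miss_only=$(( (raw >> 16) & 1 ))     # L3MissOnly
op_en=$(( (raw >> 17) & 1 ))            # En
op_val=$(( (raw >> 18) & 1 ))           # Val
cnt_ctl=$(( (raw >> 19) & 1 ))          # CntCtl: 0=cycles, 1=uOps
max_ext=$(( (raw >> 20) & 0x7f ))       # MaxCnt[26:20]
cur_cnt=$(( (raw >> 32) & 0x7ffffff ))  # CurCnt

# The low 4 bits of MaxCnt are not stored, so shift the combined value left.
max_cnt=$(( ((max_ext << 16) | max_lo) << 4 ))

printf 'MaxCnt %d L3MissOnly %d En %d Val %d CntCtl %d CurCnt %d\n' \
	"$max_cnt" "$l3_miss_only" "$op_en" "$op_val" "$cnt_ctl" "$cur_cnt"
```

Under these assumptions, the value 000002c30006186a decodes to Val 1 and
CntCtl 0 (cycles), which agrees with the fields visible in the dump. In
practice 'perf report -D' already performs this decoding, so a script like
this is mainly useful for cross-checking a raw value by hand.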

IBS applied in a real world use case

	~90% regression was observed in tbench
	which was counter intuitive. IBS profi
	using perf helped in identifying exact

	https://lore.kernel.org/r/202209210636

IBS Fetch PMU
~~~~~~~~~~~~~

Similar commands can be used with Fetch PMU as

System-wide profile, fetch ops event, sampling

	# perf record -e ibs_fetch// -c 100000

System-wide profile, fetch ops event, sampling

	# perf record -e ibs_fetch/rand_en=1/

	Random enable adds small degree of var
	helps in cases like long running loops
	instruction over and over because of f

etc.

PERF MEM AND PERF C2C
---------------------

perf mem is a memory access profiler tool and
cacheline analyser tool. Both of them internal
Below is a simple example of the perf mem tool

	# perf mem record -c 100000 -- make
	# perf mem report

A normal perf mem report output will provide d
However, it can also be aggregated based on ou

	# perf mem report -F mem,sample,snoop
	Samples: 3M of event 'ibs_op//', Event
	Memory access
	N/A
	L1 hit
	L2 hit
	L3 hit
	L3 hit
	RAM hit
	Remote node, same socket RAM hit
	Remote core, same node Any cache hit
	Remote core, same node Any cache hit
	Remote node, same socket Any cache hit
	Remote node, same socket Any cache hit
	Uncached hit

Please refer to their man page for more detail

SEE ALSO
--------

linkperf:perf-record[1], linkperf:perf-script[
linkperf:perf-mem[1], linkperf:perf-c2c[1]