perf-arm-spe(1)
================

NAME
----
perf-arm-spe - Support for Arm Statistical Pro

SYNOPSIS
--------
[verse]
'perf record' -e arm_spe//

DESCRIPTION
-----------

The SPE (Statistical Profiling Extension) feat
events down to individual instructions. Rathe
instruction to sample and then captures data f
in cycles. For loads and stores it also includ

The sampling has 5 stages:

  1. Choose an operation
  2. Collect data about the operation
  3. Optionally discard the record based on a filter
  4. Write the record to memory
  5. Interrupt when the buffer is full

Choose an operation
~~~~~~~~~~~~~~~~~~~

This is chosen from a sample population, for S
architectural instructions or all micro-ops. S
architecture provides a mechanism for the SPE
sample. This minimum interval is used by the d
perturbation is also added to the sampling int

Collect data about the operation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Program counter, PMU events, timings and data
Sampling ensures there is only one sampled ope

Optionally discard the record based on a filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Based on programmable criteria, choose whether
discarded then the flow stops here for this sa

Write the record to memory
~~~~~~~~~~~~~~~~~~~~~~~~~~

The record is appended to a memory buffer

Interrupt when the buffer is full
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the buffer fills, an interrupt is sent an
Perf saves the raw data in the perf.data file.

Opening the file
----------------

Up until this point no decoding of the SPE dat
recorded file is opened with 'perf report' or
the data, Perf generates "synthetic samples" a
recording. These samples are the same as if no
although they may have more attributes associa
just the instruction pointer, but an SPE sampl

Why Sampling?
-------------

- Sampling, rather than tracing, cuts down th
  hardware. Only one sampled operation is in fl

- Allows precise attribution data, including:
  addresses.

- Allows correlation between an instruction a
  indicates which particular cache was hit, but
  different implementations can have different

However, SPE does not provide any call-graph i

Collisions
----------

When an operation is sampled while a previous
occurs. The new sample is dropped. Collisions
should be set to avoid collisions.

The 'sample_collision' PMU event can be used t
count is based on collisions _before_ filterin
number for samples dropped that would have mad
guide.

The effect of microarchitectural sampling
-----------------------------------------

If an implementation samples micro-operations
be weighted accordingly.

For example, if a given instruction A is alway
becomes twice as likely to appear in the sampl

The coarse effect of conversions, and, if appl
estimated from the 'sample_pop' and 'inst_reti

Kernel Requirements
-------------------

The ARM_SPE_PMU config must be set to build as

Depending on CPU model, the kernel may need to
(kpti=off). If KPTI needs to be disabled, this
inaccessible. Try passing 'kpti=off' on the ke

For the full criteria that determine whether K
unmap_kernel_at_el0() in the kernel sources. C
are on the CPUs in kpti_safe_list, or on Arm v

The SPE interrupt must also be described by th
disabled (or isn't required to be disabled) bu
/sys/bus/event_source/devices/, then it's poss
ACPI or DT.
In this case no warning will be pr

Capturing SPE with perf command-line tools
------------------------------------------

You can record a session with SPE samples:

  perf record -e arm_spe// -- ./mybench

The sample period is set from the -c option, a
it's recommended to set this to a higher value

Config parameters
~~~~~~~~~~~~~~~~~

These are placed between the // in the event a
arm_spe/load_filter=1,min_latency=10/'

  branch_filter=1     - collect branches only
  event_filter=<mask> - filter on specific eve
  jitter=1            - use jitter to avoid re
  load_filter=1       - collect loads only (PM
  min_latency=<n>     - collect only samples w
  pa_enable=1         - collect physical addre
  pct_enable=1        - collect physical times
  store_filter=1      - collect stores only (P
  ts_enable=1         - enable timestamping wi

+++*+++ Latency is the total latency from the
than only the execution latency.

Only some events can be filtered on; these inc

  bit 1     - instruction retired (i.e. omit s
  bit 3     - L1D refill
  bit 5     - TLB refill
  bit 7     - mispredict
  bit 11    - misaligned access

So to sample just retired instructions:

  perf record -e arm_spe/event_filter=2/ -- ./

or just mispredicted branches:

  perf record -e arm_spe/event_filter=0x80/ --

Viewing the data
~~~~~~~~~~~~~~~~~

By default perf report and perf script will as
attributes/events of the SPE record. Because i
them, the samples in these groups are not nece
groups:

  Available samples
  0 arm_spe//
  0 dummy:u
  21 l1d-miss
  897 l1d-access
  5 llc-miss
  7 llc-access
  2 tlb-miss
  1K tlb-access
  36 branch-miss
  0 remote-access
  900 memory

The arm_spe// and dummy:u events are implement

To get a full list of unique samples that are
generate 'instruction' samples.
The period opt
instruction unless you want to further downsam

  perf report --itrace=i1i

Memory access details are also stored on the s

  perf report --mem-mode

Common errors
~~~~~~~~~~~~~

 - "Cannot find PMU `arm_spe'. Missing kernel

   Module not built or loaded, KPTI not disabl
   or running on a VM. See 'Kernel Requirement

 - "Arm SPE CONTEXT packets not found in the t

   Root privilege is required to collect conte
   assigning PIDs to kernel samples. For users

 - Excessively large perf.data file size

   Increase sampling interval (see above)


SEE ALSO
--------

linkperf:perf-record[1], linkperf:perf-script[
linkperf:perf-inject[1]