
TOMOYO Linux Cross Reference
Linux/tools/perf/Documentation/perf-arm-spe.txt



Differences between /tools/perf/Documentation/perf-arm-spe.txt (Version linux-6.12-rc7) and /tools/perf/Documentation/perf-arm-spe.txt (Version linux-6.4.16)


  1 perf-arm-spe(1)                                     1 perf-arm-spe(1)
  2 ================                                    2 ================
  3                                                     3 
  4 NAME                                                4 NAME
  5 ----                                                5 ----
  6 perf-arm-spe - Support for Arm Statistical Pro      6 perf-arm-spe - Support for Arm Statistical Profiling Extension within Perf tools
  7                                                     7 
  8 SYNOPSIS                                            8 SYNOPSIS
  9 --------                                            9 --------
 10 [verse]                                            10 [verse]
 11 'perf record' -e arm_spe//                         11 'perf record' -e arm_spe//
 12                                                    12 
 13 DESCRIPTION                                        13 DESCRIPTION
 14 -----------                                        14 -----------
 15                                                    15 
 16 The SPE (Statistical Profiling Extension) feat     16 The SPE (Statistical Profiling Extension) feature provides accurate attribution of latencies and
 17  events down to individual instructions. Rathe     17  events down to individual instructions. Rather than being interrupt-driven, it picks an
 18 instruction to sample and then captures data f     18 instruction to sample and then captures data for it during execution. Data includes execution time
 19 in cycles. For loads and stores it also includ     19 in cycles. For loads and stores it also includes data address, cache miss events, and data origin.
 20                                                    20 
 21 The sampling has 5 stages:                         21 The sampling has 5 stages:
 22                                                    22 
 23   1. Choose an operation                           23   1. Choose an operation
 24   2. Collect data about the operation              24   2. Collect data about the operation
 25   3. Optionally discard the record based on a      25   3. Optionally discard the record based on a filter
 26   4. Write the record to memory                    26   4. Write the record to memory
 27   5. Interrupt when the buffer is full             27   5. Interrupt when the buffer is full
 28                                                    28 
 29 Choose an operation                                29 Choose an operation
 30 ~~~~~~~~~~~~~~~~~~~                                30 ~~~~~~~~~~~~~~~~~~~
 31                                                    31 
 32 This is chosen from a sample population, for S     32 This is chosen from a sample population, for SPE this is an IMPLEMENTATION DEFINED choice of all
 33 architectural instructions or all micro-ops. S     33 architectural instructions or all micro-ops. Sampling happens at a programmable interval. The
 34 architecture provides a mechanism for the SPE      34 architecture provides a mechanism for the SPE driver to infer the minimum interval at which it should
 35 sample. This minimum interval is used by the d     35 sample. This minimum interval is used by the driver if no interval is specified. A pseudo-random
 36 perturbation is also added to the sampling int     36 perturbation is also added to the sampling interval by default.
 37                                                    37 
 38 Collect data about the operation                   38 Collect data about the operation
 39 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                   39 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 40                                                    40 
 41 Program counter, PMU events, timings and data      41 Program counter, PMU events, timings and data addresses related to the operation are recorded.
 42 Sampling ensures there is only one sampled ope     42 Sampling ensures there is only one sampled operation in flight.
 43                                                    43 
 44 Optionally discard the record based on a filte     44 Optionally discard the record based on a filter
 45 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~     45 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 46                                                    46 
 47 Based on programmable criteria, choose whether     47 Based on programmable criteria, choose whether to keep the record or discard it. If the record is
 48 discarded then the flow stops here for this sa     48 discarded then the flow stops here for this sample.
 49                                                    49 
 50 Write the record to memory                         50 Write the record to memory
 51 ~~~~~~~~~~~~~~~~~~~~~~~~~~                         51 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 52                                                    52 
 53 The record is appended to a memory buffer.         53 The record is appended to a memory buffer.
 54                                                    54 
 55 Interrupt when the buffer is full                  55 Interrupt when the buffer is full
 56 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                  56 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 57                                                    57 
 58 When the buffer fills, an interrupt is sent an     58 When the buffer fills, an interrupt is sent and the driver signals Perf to collect the records.
 59 Perf saves the raw data in the perf.data file.     59 Perf saves the raw data in the perf.data file.
 60                                                    60 
 61 Opening the file                                   61 Opening the file
 62 ----------------                                   62 ----------------
 63                                                    63 
 64 Up until this point no decoding of the SPE dat     64 Up until this point no decoding of the SPE data was done by either the kernel or Perf. Only when the
 65 recorded file is opened with 'perf report' or      65 recorded file is opened with 'perf report' or 'perf script' does the decoding happen. When decoding
 66 the data, Perf generates "synthetic samples" a     66 the data, Perf generates "synthetic samples" as if these were generated at the time of the
 67 recording. These samples are the same as if no     67 recording. These samples are the same as if normal sampling was done by Perf without using SPE,
 68 although they may have more attributes associa     68 although they may have more attributes associated with them. For example a normal sample may have
 69 just the instruction pointer, but an SPE sampl     69 just the instruction pointer, but an SPE sample can have data addresses and latency attributes.
 70                                                    70 
 71 Why Sampling?                                      71 Why Sampling?
 72 -------------                                      72 -------------
 73                                                    73 
 74  - Sampling, rather than tracing, cuts down th     74  - Sampling, rather than tracing, cuts down the profiling problem to something more manageable for
 75  hardware. Only one sampled operation is in fl     75  hardware. Only one sampled operation is in flight at a time.
 76                                                    76 
 77  - Allows precise attribution data, including:     77  - Allows precise attribution data, including: Full PC of instruction, data virtual and physical
 78  addresses.                                        78  addresses.
 79                                                    79 
 80  - Allows correlation between an instruction a     80  - Allows correlation between an instruction and events, such as TLB and cache miss. (Data source
 81  indicates which particular cache was hit, but     81  indicates which particular cache was hit, but the meaning is implementation defined because
 82  different implementations can have different      82  different implementations can have different cache configurations.)
 83                                                    83 
 84 However, SPE does not provide any call-graph i     84 However, SPE does not provide any call-graph information, and relies on statistical methods.
 85                                                    85 
 86 Collisions                                         86 Collisions
 87 ----------                                         87 ----------
 88                                                    88 
 89 When an operation is sampled while a previous      89 When an operation is sampled while a previous sampled operation has not finished, a collision
 90 occurs. The new sample is dropped. Collisions      90 occurs. The new sample is dropped. Collisions affect the integrity of the data, so the sample rate
 91 should be set to avoid collisions.                 91 should be set to avoid collisions.
 92                                                    92 
 93 The 'sample_collision' PMU event can be used t     93 The 'sample_collision' PMU event can be used to determine the number of lost samples. However, this
 94 count is based on collisions _before_ filterin     94 count is based on collisions _before_ filtering occurs, so it cannot be used as an exact
 95 number for samples dropped that would have mad     95 number for samples dropped that would have made it through the filter, only as a rough
 96 guide.                                             96 guide.
 97                                                    97 
 98 The effect of microarchitectural sampling          98 The effect of microarchitectural sampling
 99 -----------------------------------------          99 -----------------------------------------
100                                                   100 
101 If an implementation samples micro-operations     101 If an implementation samples micro-operations instead of instructions, the results of sampling must
102 be weighted accordingly.                          102 be weighted accordingly.
103                                                   103 
104 For example, if a given instruction A is alway    104 For example, if a given instruction A is always converted into two micro-operations, A0 and A1, it
105 becomes twice as likely to appear in the sampl    105 becomes twice as likely to appear in the sample population.
106                                                   106 
107 The coarse effect of conversions, and, if appl    107 The coarse effect of conversions, and, if applicable, sampling of speculative operations, can be
108 estimated from the 'sample_pop' and 'inst_reti    108 estimated from the 'sample_pop' and 'inst_retired' PMU events.
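
As a rough sketch of that estimate, the ratio of the two counts gives the average number of
sample-population operations per retired instruction. The helper name spe_pop_ratio and the counts
are hypothetical, collecting the counts is not shown, and treating the plain ratio as the weighting
factor is an assumption rather than something this document prescribes.

```shell
# Hypothetical helper: ratio of a sample_pop count to an inst_retired count.
# Usage: spe_pop_ratio <sample_pop_count> <inst_retired_count>
spe_pop_ratio() {
    awk -v pop="$1" -v ret="$2" 'BEGIN {
        if (ret <= 0) { exit 1 }   # refuse to divide by zero
        printf "%.2f\n", pop / ret
    }'
}

# Example counts only, not measured data.
spe_pop_ratio 3000000 2000000
```

A ratio well above 1 would suggest that instructions are commonly split into several operations, or
that speculative operations are part of the sample population.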
109                                                   109 
110 Kernel Requirements                               110 Kernel Requirements
111 -------------------                               111 -------------------
112                                                   112 
113 The ARM_SPE_PMU config must be set to build as    113 The ARM_SPE_PMU config must be set to build as either a module or statically.
114                                                   114 
115 Depending on CPU model, the kernel may need to    115 Depending on CPU model, the kernel may need to be booted with page table isolation disabled
116 (kpti=off). If KPTI needs to be disabled, this    116 (kpti=off). If KPTI needs to be disabled, this will fail with a console message "profiling buffer
117 inaccessible. Try passing 'kpti=off' on the ke    117 inaccessible. Try passing 'kpti=off' on the kernel command line".
118                                                   118 
119 For the full criteria that determine whether K << 
120 unmap_kernel_at_el0() in the kernel sources. C << 
121 are on the CPUs in kpti_safe_list, or on Arm v << 
122                                                << 
123 The SPE interrupt must also be described by th << 
124 disabled (or isn't required to be disabled) bu << 
125 /sys/bus/event_source/devices/, then it's poss << 
126 ACPI or DT. In this case no warning will be pr << 
127                                                << 
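
Because no warning is printed in that case, a quick check of the event source directory can help. The
following is a minimal sketch: the function name spe_pmu_present is hypothetical, and it assumes that
an SPE PMU shows up as an entry whose name starts with "arm_spe".

```shell
# Sketch: succeed if any entry whose name starts with "arm_spe" exists in
# the event source directory. The "arm_spe*" pattern is an assumption.
spe_pmu_present() {
    dir="${1:-/sys/bus/event_source/devices}"
    for entry in "$dir"/arm_spe*; do
        # An unmatched glob stays literal, so test that the path exists.
        [ -e "$entry" ] && return 0
    done
    return 1
}

if spe_pmu_present; then
    echo "SPE PMU found"
else
    echo "SPE PMU not found"
fi
```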
128 Capturing SPE with perf command-line tools        119 Capturing SPE with perf command-line tools
129 ------------------------------------------        120 ------------------------------------------
130                                                   121 
131 You can record a session with SPE samples:        122 You can record a session with SPE samples:
132                                                   123 
133   perf record -e arm_spe// -- ./mybench           124   perf record -e arm_spe// -- ./mybench
134                                                   125 
135 The sample period is set from the -c option, a    126 The sample period is set from the -c option, and because the minimum interval is used by default
136 it's recommended to set this to a higher value    127 it's recommended to set this to a higher value. The value is written to PMSIRR.INTERVAL.
137                                                   128 
138 Config parameters                                 129 Config parameters
139 ~~~~~~~~~~~~~~~~~                                 130 ~~~~~~~~~~~~~~~~~
140                                                   131 
141 These are placed between the // in the event a    132 These are placed between the // in the event and comma separated. For example '-e
142 arm_spe/load_filter=1,min_latency=10/'            133 arm_spe/load_filter=1,min_latency=10/'
143                                                   134 
144   branch_filter=1     - collect branches only     135   branch_filter=1     - collect branches only (PMSFCR.B)
145   event_filter=<mask> - filter on specific eve    136   event_filter=<mask> - filter on specific events (PMSEVFR) - see bitfield description below
146   jitter=1            - use jitter to avoid re    137   jitter=1            - use jitter to avoid resonance when sampling (PMSIRR.RND)
147   load_filter=1       - collect loads only (PM    138   load_filter=1       - collect loads only (PMSFCR.LD)
148   min_latency=<n>     - collect only samples w    139   min_latency=<n>     - collect only samples with this latency or higher* (PMSLATFR)
149   pa_enable=1         - collect physical addre    140   pa_enable=1         - collect physical address (as well as VA) of loads/stores (PMSCR.PA) - requires privilege
150   pct_enable=1        - collect physical times    141   pct_enable=1        - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
151   store_filter=1      - collect stores only (P    142   store_filter=1      - collect stores only (PMSFCR.ST)
152   ts_enable=1         - enable timestamping wi    143   ts_enable=1         - enable timestamping with value of generic timer (PMSCR.TS)
153                                                   144 
154 +++*+++ Latency is the total latency from the     145 +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
155 than only the execution latency.                  146 than only the execution latency.
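
When several parameters are combined, the event string can be assembled by a small helper. This is
only a sketch: the function name spe_event is hypothetical, and it simply joins its arguments with
commas between the slashes without validating the parameter names.

```shell
# Hypothetical helper: build an arm_spe event string from key=value arguments.
spe_event() {
    params=""
    for kv in "$@"; do
        # Prepend a comma only when params already holds an entry.
        params="${params:+$params,}$kv"
    done
    printf 'arm_spe/%s/\n' "$params"
}

spe_event load_filter=1 min_latency=10
```

The result can then be passed to the -e option, for example
perf record -e "$(spe_event load_filter=1 min_latency=10)" -- ./mybench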
156                                                   147 
157 Only some events can be filtered on; these inc    148 Only some events can be filtered on; these include:
158                                                   149 
159   bit 1     - instruction retired (i.e. omit s    150   bit 1     - instruction retired (i.e. omit speculative instructions)
160   bit 3     - L1D refill                          151   bit 3     - L1D refill
161   bit 5     - TLB refill                          152   bit 5     - TLB refill
162   bit 7     - mispredict                          153   bit 7     - mispredict
163   bit 11    - misaligned access                   154   bit 11    - misaligned access
164                                                   155 
165 So to sample just retired instructions:           156 So to sample just retired instructions:
166                                                   157 
167   perf record -e arm_spe/event_filter=2/ -- ./    158   perf record -e arm_spe/event_filter=2/ -- ./mybench
168                                                   159 
169 or just mispredicted branches:                    160 or just mispredicted branches:
170                                                   161 
171   perf record -e arm_spe/event_filter=0x80/ --    162   perf record -e arm_spe/event_filter=0x80/ -- ./mybench
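
The mask values in these examples follow directly from the bit numbers listed above: bit 1 is 2 and
bit 7 is 0x80. A minimal sketch of that arithmetic is shown below. The function name
spe_event_filter is hypothetical, and how the hardware treats a mask with several bits set is not
covered here.

```shell
# Hypothetical helper: turn a list of bit numbers into an event_filter mask.
spe_event_filter() {
    mask=0
    for bit in "$@"; do
        mask=$(( mask | (1 << bit) ))
    done
    printf '0x%x\n' "$mask"
}

spe_event_filter 1   # instruction retired
spe_event_filter 7   # mispredict
```

The printed value can be substituted into the event, for example
perf record -e arm_spe/event_filter=$(spe_event_filter 7)/ -- ./mybench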
172                                                   163 
173 Viewing the data                                  164 Viewing the data
174 ~~~~~~~~~~~~~~~~~                                 165 ~~~~~~~~~~~~~~~~~
175                                                   166 
176 By default perf report and perf script will as    167 By default perf report and perf script will assign samples to separate groups depending on the
177 attributes/events of the SPE record. Because i    168 attributes/events of the SPE record. Because instructions can have multiple events associated with
178 them, the samples in these groups are not nece    169 them, the samples in these groups are not necessarily unique. For example perf report shows these
179 groups:                                           170 groups:
180                                                   171 
181   Available samples                               172   Available samples
182   0 arm_spe//                                     173   0 arm_spe//
183   0 dummy:u                                       174   0 dummy:u
184   21 l1d-miss                                     175   21 l1d-miss
185   897 l1d-access                                  176   897 l1d-access
186   5 llc-miss                                      177   5 llc-miss
187   7 llc-access                                    178   7 llc-access
188   2 tlb-miss                                      179   2 tlb-miss
189   1K tlb-access                                   180   1K tlb-access
190   36 branch-miss                                  181   36 branch-miss
191   0 remote-access                                 182   0 remote-access
192   900 memory                                      183   900 memory
193                                                   184 
194 The arm_spe// and dummy:u events are implement    185 The arm_spe// and dummy:u events are implementation details and are expected to be empty.
195                                                   186 
196 To get a full list of unique samples that are     187 To get a full list of unique samples that are not sorted into groups, set the itrace option to
197 generate 'instruction' samples. The period opt    188 generate 'instruction' samples. The period option is also taken into account, so set it to 1
198 instruction unless you want to further downsam    189 instruction unless you want to further downsample the already sampled SPE data:
199                                                   190 
200   perf report --itrace=i1i                        191   perf report --itrace=i1i
201                                                   192 
202 Memory access details are also stored on the s    193 Memory access details are also stored on the samples and this can be viewed with:
203                                                   194 
204   perf report --mem-mode                          195   perf report --mem-mode
205                                                   196 
206 Common errors                                     197 Common errors
207 ~~~~~~~~~~~~~                                     198 ~~~~~~~~~~~~~
208                                                   199 
209  - "Cannot find PMU `arm_spe'. Missing kernel     200  - "Cannot find PMU `arm_spe'. Missing kernel support?"
210                                                   201 
211    Module not built or loaded, KPTI not disabl !! 202    Module not built or loaded, KPTI not disabled (see above), or running on a VM
212    or running on a VM. See 'Kernel Requirement << 
213                                                   203 
214  - "Arm SPE CONTEXT packets not found in the t    204  - "Arm SPE CONTEXT packets not found in the traces."
215                                                   205 
216    Root privilege is required to collect conte    206    Root privilege is required to collect context packets. But these only increase the accuracy of
217    assigning PIDs to kernel samples. For users    207    assigning PIDs to kernel samples. For userspace sampling this can be ignored.
218                                                   208 
219  - Excessively large perf.data file size          209  - Excessively large perf.data file size
220                                                   210 
221    Increase sampling interval (see above)         211    Increase sampling interval (see above)
222                                                   212 
223                                                   213 
224 SEE ALSO                                          214 SEE ALSO
225 --------                                          215 --------
226                                                   216 
227 linkperf:perf-record[1], linkperf:perf-script[    217 linkperf:perf-record[1], linkperf:perf-script[1], linkperf:perf-report[1],
228 linkperf:perf-inject[1]                           218 linkperf:perf-inject[1]
                                                      
