.. SPDX-License-Identifier: GPL-2.0

.. _perf_index:

====
Perf
====

Perf Event Attributes
=====================

:Author: Andrew Murray <andrew.murray@arm.com>
:Date: 2019-03-06

exclude_user
------------

This attribute excludes userspace.

Userspace always runs at EL0 and thus this attribute will exclude EL0.


exclude_kernel
--------------

This attribute excludes the kernel.

The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
at EL1.

For the host this attribute will exclude EL1 and additionally EL2 on a VHE
system.

For the guest this attribute will exclude EL1. Please note that EL2 is
never counted within a guest.


exclude_hv
----------

This attribute excludes the hypervisor.

For a VHE host this attribute is ignored as we consider the host kernel to
be the hypervisor.

For a non-VHE host this attribute will exclude EL2 as we consider the
hypervisor to be any code that runs at EL2 which is predominantly used for
guest/host transitions.

For the guest this attribute has no effect. Please note that EL2 is
never counted within a guest.


exclude_host / exclude_guest
----------------------------

These attributes exclude the KVM host and guest, respectively.

The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE
kernel or non-VHE hypervisor).

The KVM guest may run at EL0 (userspace) and EL1 (kernel).

Due to the overlapping exception levels between host and guests we cannot
exclusively rely on the PMU's hardware exception filtering - therefore we
must enable/disable counting on the entry and exit to the guest. This is
performed differently on VHE and non-VHE systems.

For non-VHE systems we exclude EL2 for exclude_host - upon entering and
exiting the guest we disable/enable the event as appropriate based on the
exclude_host and exclude_guest attributes.

For VHE systems we exclude EL1 for exclude_guest and exclude both EL0 and
EL2 for exclude_host. Upon entering and exiting the guest we modify the event
to include/exclude EL0 as appropriate based on the exclude_host and
exclude_guest attributes.

The statements above also apply when these attributes are used within a
non-VHE guest; however, please note that EL2 is never counted within a guest.


Accuracy
--------

On non-VHE hosts we enable/disable counters on the entry/exit of host/guest
transition at EL2 - however there is a period of time between
enabling/disabling the counters and entering/exiting the guest. We are
able to eliminate counters counting host events on the boundaries of guest
entry/exit when counting guest events by filtering out EL2 for
exclude_host. However, when using !exclude_hv there is a small blackout
window at the guest entry/exit where host events are not captured.

On VHE systems there are no blackout windows.

Perf Userspace PMU Hardware Counter Access
==========================================

Overview
--------
The perf userspace tool relies on the PMU to monitor events. It offers an
abstraction layer over the hardware counters since the underlying
implementation is CPU-dependent.
Arm64 allows userspace tools to have access to the registers storing the
hardware counters' values directly.

This targets specifically self-monitoring tasks in order to reduce the overhead
by directly accessing the registers without having to go through the kernel.

How-to
------
The focus is set on the Armv8 PMUv3 which makes sure that the access to the PMU
registers is enabled and that the userspace has access to the relevant
information in order to use them.

In order to have access to the hardware counters, the global sysctl
kernel/perf_user_access must first be enabled:

.. code-block:: sh

  echo 1 > /proc/sys/kernel/perf_user_access

It is necessary to open the event using the perf tool interface with config1:1
attr bit set: the sys_perf_event_open syscall returns a fd which can
subsequently be used with the mmap syscall in order to retrieve a page of memory
containing information about the event. The PMU driver uses this page to expose
to the user the hardware counter's index and other necessary data. Using this
index enables the user to access the PMU registers using the `mrs` instruction.
Access to the PMU registers is only valid while the sequence lock is unchanged.
In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is
changed.

The userspace access is supported in libperf using the perf_evsel__mmap()
and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for
an example.

About heterogeneous systems
---------------------------
On heterogeneous systems such as big.LITTLE, userspace PMU counter access can
only be enabled when the tasks are pinned to a homogeneous subset of cores and
the corresponding PMU instance is opened by specifying the 'type' attribute.
The use of generic event types is not supported in this case.

Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It
can be run using the perf tool to check that the access to the registers works
correctly from userspace:

.. code-block:: sh

  perf test -v user

About chained events and counter sizes
--------------------------------------
The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1)
counter along with userspace access. The sys_perf_event_open syscall will fail
if a 64-bit counter is requested and the hardware doesn't support 64-bit
counters. Chained events are not supported in conjunction with userspace counter
access. If a 32-bit counter is requested on hardware with 64-bit counters, then
userspace must treat the upper 32-bits read from the counter as UNKNOWN. The
'pmc_width' field in the user page will indicate the valid width of the counter
and should be used to mask the upper bits as needed.

.. Links
.. _tools/perf/arch/arm64/tests/user-events.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
.. _tools/lib/perf/tests/test-evsel.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c

Event Counting Threshold
========================

Overview
--------

FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
events whose count meets a specified threshold condition. For example if
threshold_compare is set to 2 ('Greater than or equal'), and the
threshold is set to 2, then the PMU counter will now only increment
when an event would have previously incremented the PMU counter by 2 or
more on a single processor cycle.

To increment by 1 after passing the threshold condition instead of the
number of events on that cycle, add the 'threshold_count' option to the
command line.

How-to
------

These are the parameters for controlling the feature:

.. list-table::
   :header-rows: 1

   * - Parameter
     - Description
   * - threshold
     - Value to threshold the event by. A value of 0 means that
       thresholding is disabled and the other parameters have no effect.
   * - threshold_compare
     - | Comparison function to use, with the following values supported:
       |
       | 0: Not-equal
       | 1: Equals
       | 2: Greater-than-or-equal
       | 3: Less-than
   * - threshold_count
     - If this is set, count by 1 after passing the threshold condition
       instead of the value of the event on this cycle.

The threshold, threshold_compare and threshold_count values can be
provided per event, for example:

.. code-block:: sh

  perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
            -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/

In this example the stall_slot event will count by 2 or more on every
cycle where 2 or more stalls happen. And dtlb_walk will count by 1 on
every cycle where the number of DTLB walks was less than 10.

The maximum supported threshold value can be read from the caps of each
PMU, for example:

.. code-block:: sh

  cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max

  0x000000ff

If a value higher than this is given, then opening the event will result
in an error. The highest possible maximum is 4095, as the config field
for threshold is limited to 12 bits, and the Perf tool will refuse to
parse higher values.

If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read
0, and attempting to set a threshold value will also result in an error.
threshold_max will also read as 0 on aarch32 guests, even if the host
is running on hardware with the feature.
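
The counting rules described in this section can be illustrated with a
small software model. The sketch below is not the PMU driver or the
hardware implementation: ``counted_events`` and its argument names are
hypothetical and exist only for this illustration. It assumes that a cycle
in which no events occur still takes part in the comparison, which is how
the dtlb_walk example above is worded.

.. code-block:: python

  from typing import Iterable

  # threshold_compare codes, as listed in the parameter table.
  NOT_EQUAL, EQUALS, GREATER_EQUAL, LESS_THAN = 0, 1, 2, 3


  def threshold_met(events: int, threshold: int, compare: int) -> bool:
      """Return True if this cycle's event count passes the comparison."""
      if compare == NOT_EQUAL:
          return events != threshold
      if compare == EQUALS:
          return events == threshold
      if compare == GREATER_EQUAL:
          return events >= threshold
      if compare == LESS_THAN:
          return events < threshold
      raise ValueError(f"unsupported threshold_compare value: {compare}")


  def counted_events(per_cycle: Iterable[int], threshold: int = 0,
                     compare: int = NOT_EQUAL,
                     threshold_count: bool = False) -> int:
      """Model the final counter value for a sequence of per-cycle counts."""
      total = 0
      for events in per_cycle:
          if threshold == 0:
              # Thresholding disabled: the other parameters have no effect.
              total += events
          elif threshold_met(events, threshold, compare):
              # Count by 1 with threshold_count, else by the event value.
              total += 1 if threshold_count else events
      return total


  # stall_slot/threshold=2,threshold_compare=2/
  stalls = counted_events([0, 1, 2, 3, 5], threshold=2,
                          compare=GREATER_EQUAL)

  # dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
  walks = counted_events([0, 3, 10, 12, 9], threshold=10,
                         compare=LESS_THAN, threshold_count=True)

With these made-up per-cycle sequences, ``stalls`` is 10 (2 + 3 + 5 from the
three cycles with two or more stalls) and ``walks`` is 3 (one for each of the
three cycles with fewer than 10 walks).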