Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as AlderLake, are hybrid platforms that consist
of atom cpus and core cpus. Each cpu type has a dedicated event list.
Some events are available only on core cpus, some only on atom cpus,
and some on both.

The kernel exports two new cpu pmus via sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

A 'cpus' file is created under each directory. For example,

  cat /sys/devices/cpu_core/cpus
  0-15

  cat /sys/devices/cpu_atom/cpus
  16-23

This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.

Quickstart

List hybrid event
-----------------

As before, use perf-list to list the symbolic events.

  perf list

  inst_retired.any
        [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
  inst_retired.any
        [Number of instructions retired. Fixed Counter - architectural event.
        Unit: cpu_core]

'Unit: xxx' is added to the brief description to indicate which pmu
the event belongs to. The same event name can be supported on
different pmus.

Enable hybrid event with a specific pmu
---------------------------------------

To enable a core-only or atom-only event, the following syntax is supported:

        cpu_core/<event name>/
or
        cpu_atom/<event name>/

For example, count the 'cycles' event on core cpus.

        perf stat -e cpu_core/cycles/

Create two events for one hardware event automatically
------------------------------------------------------

When an event is created and it is available on both atom and core,
two events are created automatically: one for atom, the other for
core. Most hardware events and cache events are available on both
cpu_core and cpu_atom.

Hardware events have pre-defined configs (e.g. 0 for cycles).
But on a hybrid platform, the kernel needs to know where the event comes
from (atom or core). The original perf event type PERF_TYPE_HARDWARE
can't carry pmu information.
So this type is now extended to be a PMU-aware
type. The PMU type ID is stored at attr.config[63:32].

The PMU type ID is retrieved from sysfs.
/sys/devices/cpu_atom/type
/sys/devices/cpu_core/type

The new attr.config layout for PERF_TYPE_HARDWARE:

PERF_TYPE_HARDWARE:                 0xEEEEEEEE000000AA
                                    AA: hardware event ID
                                    EEEEEEEE: PMU type ID

Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
a PMU-aware type. The PMU type ID is stored at attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

PERF_TYPE_HW_CACHE:                 0xEEEEEEEE00DDCCBB
                                    BB: hardware cache ID
                                    CC: hardware cache op ID
                                    DD: hardware cache op result ID
                                    EEEEEEEE: PMU type ID

When a hardware event is enabled without a specified pmu, such as
perf stat -e cycles -a (system-wide in this example), two events
are created automatically.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

type 0 is PERF_TYPE_HARDWARE.
0x4 in 0x400000000 indicates it's the cpu_core pmu.
0x8 in 0x800000000 indicates it's the cpu_atom pmu (the atom pmu type ID
is not fixed; read it from sysfs).

The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
and creates 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus).
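
The attr.config layout described above can be sketched in a few lines of
Python. This is only an illustration of the bit layout, not code from perf
or the kernel: the helper names (pack_hw_config, pack_cache_config,
unpack_config) are hypothetical, and the PMU type IDs 4 and 8 are taken
from the example dumps above. On a real system the IDs would be read from
the sysfs 'type' files.

```python
# Sketch of the extended attr.config layout. Helper names are hypothetical;
# only the bit positions come from the text above.

PMU_SHIFT = 32          # PMU type ID lives in attr.config[63:32]
LOW_MASK = 0xFFFFFFFF   # event-specific bits live in attr.config[31:0]


def pack_hw_config(pmu_type: int, hw_event_id: int) -> int:
    """Build a PERF_TYPE_HARDWARE config: 0xEEEEEEEE000000AA."""
    return (pmu_type << PMU_SHIFT) | (hw_event_id & 0xFF)


def pack_cache_config(pmu_type: int, cache_id: int, op_id: int,
                      result_id: int) -> int:
    """Build a PERF_TYPE_HW_CACHE config: 0xEEEEEEEE00DDCCBB."""
    low = ((result_id & 0xFF) << 16) | ((op_id & 0xFF) << 8) | (cache_id & 0xFF)
    return (pmu_type << PMU_SHIFT) | low


def unpack_config(config: int) -> tuple:
    """Split a config into (PMU type ID, low 32 bits)."""
    return config >> PMU_SHIFT, config & LOW_MASK


if __name__ == "__main__":
    # 'cycles' has hardware event ID 0; PMU type IDs as in the dumps above.
    print(hex(pack_hw_config(4, 0)))    # 0x400000000
    print(hex(pack_hw_config(8, 0)))    # 0x800000000
    print(unpack_config(0x800000000))   # (8, 0)
```

Keeping the PMU type ID in the upper 32 bits leaves the low bits with their
pre-hybrid meaning, so a config with a zero upper half still reads as a
plain hardware event ID.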

The perf-stat result displays two events:

 Performance counter stats for 'system wide':

           6,744,979      cpu_core/cycles/
           1,965,552      cpu_atom/cycles/

The first 'cycles' is the core event, the second 'cycles' is the atom event.

Thread mode example:
--------------------

perf-stat reports the scaled counts for hybrid events, with a percentage
displayed. The percentage is the event's running time / enabled time.

For example, 'triad_loop' runs on cpu16 (an atom cpu), yet perf-stat still
reports a scaled value for core cycles: 233,066,666, with a percentage
of 0.43%.

  perf stat -e cycles \-- taskset -c 16 ./triad_loop

As before, two events are created.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------

 Performance counter stats for 'taskset -c 16 ./triad_loop':

       233,066,666      cpu_core/cycles/                                              (0.43%)
       604,097,080      cpu_atom/cycles/                                              (99.57%)

perf-record:
------------

If no '-e' is specified in perf record on a hybrid platform,
it creates two default 'cycles' events and adds them to the event list.
One is for core, the other is for atom.

perf-stat:
----------

If no '-e' is specified in perf stat on a hybrid platform,
then besides the software events, the following events are created and
added to the event list in order.

  cpu_core/cycles/,
  cpu_atom/cycles/,
  cpu_core/instructions/,
  cpu_atom/instructions/,
  cpu_core/branches/,
  cpu_atom/branches/,
  cpu_core/branch-misses/,
  cpu_atom/branch-misses/

Both perf-stat and perf-record also support enabling a
hybrid event with a specific pmu.

e.g.
  perf stat -e cpu_core/cycles/
  perf stat -e cpu_atom/cycles/
  perf stat -e cpu_core/r1a/
  perf stat -e cpu_atom/L1-icache-loads/
  perf stat -e cpu_core/cycles/,cpu_atom/instructions/
  perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

But '{cpu_core/cycles/,cpu_atom/instructions/}' produces a
warning and disables grouping, because the pmus in the group do
not match (cpu_core vs. cpu_atom).
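
The grouping rule above (all events in a '{...}' group must use the same
pmu) can be illustrated with a small Python check. This is a sketch under
stated assumptions, not perf's actual event parser: the helper names
group_pmus and can_group are hypothetical, and the regular expression only
handles the simple 'pmu/event/' terms used in the examples above.

```python
import re

# Hypothetical pattern: captures the pmu name of each 'pmu/event/' term.
# It does not cover the full perf event syntax.
TERM = re.compile(r"([A-Za-z0-9_]+)/[^/]*/")


def group_pmus(group: str) -> set:
    """Return the set of pmu names used inside a '{...}' event group."""
    body = group.strip().strip("{}")
    return {match.group(1) for match in TERM.finditer(body)}


def can_group(group: str) -> bool:
    """True when every event in the group refers to the same pmu."""
    return len(group_pmus(group)) <= 1


if __name__ == "__main__":
    print(can_group("{cpu_core/cycles/,cpu_core/instructions/}"))  # True
    print(can_group("{cpu_core/cycles/,cpu_atom/instructions/}"))  # False
```

A check like this can be run on an event string before passing it to
perf stat, to anticipate whether grouping would be disabled.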