1 2 ------------------------------ 3 ****** perf by examples ****** 4 ------------------------------ 5 6 [ From an e-mail by Ingo Molnar, https://lore. 7 8 9 First, discovery/enumeration of available coun 10 'perf list': 11 12 titan:~> perf list 13 [...] 14 kmem:kmalloc [Tr 15 kmem:kmem_cache_alloc [Tr 16 kmem:kmalloc_node [Tr 17 kmem:kmem_cache_alloc_node [Tr 18 kmem:kfree [Tr 19 kmem:kmem_cache_free [Tr 20 kmem:mm_page_free [Tr 21 kmem:mm_page_free_batched [Tr 22 kmem:mm_page_alloc [Tr 23 kmem:mm_page_alloc_zone_locked [Tr 24 kmem:mm_page_pcpu_drain [Tr 25 kmem:mm_page_alloc_extfrag [Tr 26 27 Then any (or all) of the above event sources c 28 measured. For example the page alloc/free prop 29 run' are: 30 31 titan:~> perf stat -e kmem:mm_page_pcpu_drain 32 -e kmem:mm_page_free_batched -e kmem:mm_page_ 33 Time: 0.575 34 35 Performance counter stats for './hackbench 10 36 37 13857 kmem:mm_page_pcpu_drain 38 27576 kmem:mm_page_alloc 39 6025 kmem:mm_page_free_batched 40 20934 kmem:mm_page_free 41 42 0.613972165 seconds time elapsed 43 44 You can observe the statistical properties as 45 'repeat the workload N times' feature of perf 46 47 titan:~> perf stat --repeat 5 -e kmem:mm_page 48 kmem:mm_page_alloc -e kmem:mm_page_free_bat 49 kmem:mm_page_free ./hackbench 10 50 Time: 0.627 51 Time: 0.644 52 Time: 0.564 53 Time: 0.559 54 Time: 0.626 55 56 Performance counter stats for './hackbench 10 57 58 12920 kmem:mm_page_pcpu_drain ( 59 25035 kmem:mm_page_alloc ( 60 6104 kmem:mm_page_free_batched ( 61 18376 kmem:mm_page_free ( 62 63 0.643954516 seconds time elapsed ( +- 64 65 Furthermore, these tracepoints can be used to 66 well. For example the page allocations done by 67 captured the following way: 68 69 titan:~/git> perf record -e kmem:mm_page_allo 70 Counting objects: 1148, done. 71 Delta compression using up to 2 threads. 72 Compressing objects: 100% (450/450), done. 73 Writing objects: 100% (1148/1148), done. 74 Total 1148 (delta 690), reused 1148 (delta 69 75 [ perf record: Captured and wrote 0.267 MB pe 76 77 To check which functions generated page alloca 78 79 titan:~/git> perf report 80 # Samples: 10646 81 # 82 # Overhead Command Sha 83 # ........ ............... ................ 84 # 85 23.57% git-repack /lib64/libc-2.5.s 86 21.81% git /lib64/libc-2.5.s 87 14.59% git ./git 88 11.79% git-repack ./git 89 7.12% git /lib64/ld-2.5.so 90 3.16% git-repack /lib64/libpthread 91 2.09% git-repack /bin/bash 92 1.97% rm /lib64/libc-2.5.s 93 1.39% mv /lib64/ld-2.5.so 94 1.37% mv /lib64/libc-2.5.s 95 1.12% git-repack /lib64/ld-2.5.so 96 0.95% rm /lib64/ld-2.5.so 97 0.90% git-update-serv /lib64/libc-2.5.s 98 0.73% git-update-serv /lib64/ld-2.5.so 99 0.68% perf /lib64/libpthread 100 0.64% git-repack /usr/lib64/libz.s 101 102 Or to see it on a more finegrained level: 103 104 titan:~/git> perf report --sort comm,dso,symbo 105 # Samples: 10646 106 # 107 # Overhead Command Shar 108 # ........ ............... ................. 109 # 110 9.35% git-repack ./git 111 9.12% git ./git 112 7.31% git /lib64/libc-2.5.s 113 6.34% git-repack /lib64/libc-2.5.s 114 6.24% git-repack /lib64/libc-2.5.s 115 5.82% git-repack /lib64/libc-2.5.s 116 5.47% git /lib64/libc-2.5.s 117 2.99% git /lib64/libc-2.5.s 118 119 Furthermore, call-graph sampling can be done t 120 allocations - to see precisely what kind of pa 121 are: 122 123 titan:~/git> perf record -g -e kmem:mm_page_a 124 Counting objects: 1148, done. 125 Delta compression using up to 2 threads. 126 Compressing objects: 100% (450/450), done. 127 Writing objects: 100% (1148/1148), done. 128 Total 1148 (delta 690), reused 1148 (delta 69 129 [ perf record: Captured and wrote 0.963 MB pe 130 131 titan:~/git> perf report -g 132 # Samples: 10686 133 # 134 # Overhead Command Sha 135 # ........ ............... ................ 136 # 137 23.25% git-repack /lib64/libc-2.5.s 138 | 139 |--50.00%-- _int_free 140 | 141 |--37.50%-- __GI___fork 142 | make_child 143 | 144 |--12.50%-- ptmalloc_unlock_al 145 | make_child 146 | 147 --6.25%-- __GI_strcpy 148 21.61% git /lib64/libc-2.5.s 149 | 150 |--30.00%-- __GI_read 151 | | 152 | --83.33%-- git_con 153 | git_conf 154 | | 155 [...] 156 157 Or you can observe the whole system's page all 158 seconds: 159 160 titan:~/git> perf stat -a -e kmem:mm_page_pcpu 161 kmem:mm_page_alloc -e kmem:mm_page_free_batche 162 kmem:mm_page_free sleep 10 163 164 Performance counter stats for 'sleep 10': 165 166 171585 kmem:mm_page_pcpu_drain 167 322114 kmem:mm_page_alloc 168 73623 kmem:mm_page_free_batched 169 254115 kmem:mm_page_free 170 171 10.000591410 seconds time elapsed 172 173 Or observe how fluctuating the page allocation 174 analysis done over ten 1-second intervals: 175 176 titan:~/git> perf stat --repeat 10 -a -e kmem 177 kmem:mm_page_alloc -e kmem:mm_page_free_bat 178 kmem:mm_page_free sleep 1 179 180 Performance counter stats for 'sleep 1' (10 r 181 182 17254 kmem:mm_page_pcpu_drain ( 183 34394 kmem:mm_page_alloc ( 184 7509 kmem:mm_page_free_batched ( 185 25653 kmem:mm_page_free ( 186 187 1.058135029 seconds time elapsed ( +- 188 189 Or you can annotate the recorded 'git gc' run 190 and check which instructions/source-code gener 191 192 titan:~/git> perf annotate __GI___fork 193 --------------------------------------------- 194 Percent | Source code & Disassembly of 195 --------------------------------------------- 196 : 197 : 198 : Disassembly of section .plt: 199 : Disassembly of section .text: 200 : 201 : 00000031a2e95560 <__fork>: 202 [...] 203 0.00 : 31a2e95602: b8 38 00 00 0 204 0.00 : 31a2e95607: 0f 05 205 83.42 : 31a2e95609: 48 3d 00 f0 f 206 0.00 : 31a2e9560f: 0f 87 4d 01 0 207 0.00 : 31a2e95615: 85 c0 208 209 ( this shows that 83.42% of __GI___fork's page 210 the 0x38 system call it performs. ) 211 212 etc. etc. - a lot more is possible. I could li 213 other different usecases straight away - neith 214 possible via /proc/vmstat. 215 216 /proc/vmstat is not in the same league really, 217 expressive power of system analysis and perfor 218 analysis. 219 220 All that the above results needed were those n 221 in include/tracing/events/kmem.h. 222 223 Ingo 224 225
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.