Linux/Documentation/admin-guide/perf/alibaba_pmu.rst

=============================================================
Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU)
=============================================================

The Yitian 710, custom-built by Alibaba Group's chip development business,
T-Head, implements an uncore PMU for performance and functional debugging to
facilitate system maintenance.

DDR Sub-System Driveway (DRW) PMU Driver
=========================================

Yitian 710 employs eight DDR5/4 channels, four on each die. Each DDR5 channel
services system memory requests independently of the others and is split into
two independent sub-channels. The DDR Sub-System Driveway implements a separate
PMU for each sub-channel to monitor various performance metrics.

The Driveway PMU devices appear in perf as ali_drw_<sys_base_addr>.
For example, ali_drw_21000 and ali_drw_21080 are the PMU devices for the two
sub-channels of the same channel in die 0. PMU devices in die 1 are
prefixed with ali_drw_400XXXXX, e.g. ali_drw_40021000.
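
The naming scheme above can be sketched in a few lines of Python. This is only
an illustration: the helper ``drw_pmu_name`` and the two stride constants are
hypothetical, and the strides are inferred from the example names above rather
than taken from the driver.

```python
# Hypothetical helper; the address layout is inferred from the example names.
DIE_STRIDE = 0x40000000     # assumption: die 1 names look like 400XXXXX
SUB_CHANNEL_STRIDE = 0x80   # assumption: 21000 vs 21080


def drw_pmu_name(channel_base: int, sub_channel: int = 0, die: int = 0) -> str:
    """Return the perf device name for one Driveway sub-channel PMU."""
    if sub_channel not in (0, 1):
        raise ValueError("a DDR5 channel has two sub-channels: 0 or 1")
    if die not in (0, 1):
        raise ValueError("die must be 0 or 1")
    addr = die * DIE_STRIDE + channel_base + sub_channel * SUB_CHANNEL_STRIDE
    # The device name carries the base address in lowercase hex, no prefix.
    return f"ali_drw_{addr:x}"


print(drw_pmu_name(0x21000))                 # ali_drw_21000
print(drw_pmu_name(0x21000, sub_channel=1))  # ali_drw_21080
print(drw_pmu_name(0x21000, die=1))          # ali_drw_40021000
```

On a real system the authoritative list of device names is whatever perf
reports, so treat computed names only as a convenience.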

Each sub-channel has 36 PMU counters in total, which are classified into
four groups:

- Group 0: PMU Cycle Counter. This group has one pair of counters,
  pmu_cycle_cnt_low and pmu_cycle_cnt_high, that is used as the cycle count
  based on the DDRC core clock.

- Group 1: PMU Bandwidth Counters. This group has 8 counters that are used
  to count the total access number of either the eight bank groups in a
  selected rank, or four ranks separately in the first 4 counters. The base
  transfer unit is 64B.

- Group 2: PMU Retry Counters. This group has 10 counters that are intended
  to count the total retry number of each type of uncorrectable error.

- Group 3: PMU Common Counters. This group has 16 counters that are used
  to count the common events.
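
As a quick sanity check of the grouping above, the per-group sizes can be
written down and summed. The table and function names below are illustrative
only. The sketch assumes that the counter pair in group 0 counts as two
counters, which is what makes the total come out to 36.

```python
# Counter groups as described in the list above: (description, counter count).
COUNTER_GROUPS = {
    0: ("PMU Cycle Counter", 2),       # pmu_cycle_cnt_low + pmu_cycle_cnt_high
    1: ("PMU Bandwidth Counters", 8),
    2: ("PMU Retry Counters", 10),
    3: ("PMU Common Counters", 16),
}


def total_counters(groups: dict) -> int:
    """Sum the number of counters across all groups."""
    return sum(count for _name, count in groups.values())


print(total_counters(COUNTER_GROUPS))  # 36
```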

For now, the Driveway PMU driver only uses counters in group 0 and group 3.

The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution
for connecting an SoC application bus to DDR memory devices. The DDRCTL
receives transactions through the Host Interface (HIF), which is custom-defined
by Synopsys. These transactions are queued internally and scheduled for access
while satisfying the SDRAM protocol timing requirements, transaction
priorities, and dependencies between the transactions. The DDRCTL in turn
issues commands on the DDR PHY Interface (DFI) to the PHY module, which
launches and captures data to and from the SDRAM. The Driveway PMUs have
hardware logic to gather statistics and performance logging signals on HIF,
DFI, etc.

By counting the READ, WRITE and RMW commands sent to the DDRC through the HIF
interface, the bandwidth can be calculated. Example usage of counting memory
data bandwidth::

  perf stat \
    -e ali_drw_21000/hif_wr/ \
    -e ali_drw_21000/hif_rd/ \
    -e ali_drw_21000/hif_rmw/ \
    -e ali_drw_21000/cycle/ \
    -e ali_drw_21080/hif_wr/ \
    -e ali_drw_21080/hif_rd/ \
    -e ali_drw_21080/hif_rmw/ \
    -e ali_drw_21080/cycle/ \
    -e ali_drw_23000/hif_wr/ \
    -e ali_drw_23000/hif_rd/ \
    -e ali_drw_23000/hif_rmw/ \
    -e ali_drw_23000/cycle/ \
    -e ali_drw_23080/hif_wr/ \
    -e ali_drw_23080/hif_rd/ \
    -e ali_drw_23080/hif_rmw/ \
    -e ali_drw_23080/cycle/ \
    -e ali_drw_25000/hif_wr/ \
    -e ali_drw_25000/hif_rd/ \
    -e ali_drw_25000/hif_rmw/ \
    -e ali_drw_25000/cycle/ \
    -e ali_drw_25080/hif_wr/ \
    -e ali_drw_25080/hif_rd/ \
    -e ali_drw_25080/hif_rmw/ \
    -e ali_drw_25080/cycle/ \
    -e ali_drw_27000/hif_wr/ \
    -e ali_drw_27000/hif_rd/ \
    -e ali_drw_27000/hif_rmw/ \
    -e ali_drw_27000/cycle/ \
    -e ali_drw_27080/hif_wr/ \
    -e ali_drw_27080/hif_rd/ \
    -e ali_drw_27080/hif_rmw/ \
    -e ali_drw_27080/cycle/ -- sleep 10
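
Because the command above repeats the same four events for every device, it
can be generated rather than typed. The following is a minimal sketch; the
function name ``perf_stat_argv`` is hypothetical, and the device and event
lists are copied from the example above, so they need adjusting to the devices
actually present on a given system.

```python
import shlex

# Devices and events copied from the example command above.
DEVICES = [
    "ali_drw_21000", "ali_drw_21080",
    "ali_drw_23000", "ali_drw_23080",
    "ali_drw_25000", "ali_drw_25080",
    "ali_drw_27000", "ali_drw_27080",
]
EVENTS = ["hif_wr", "hif_rd", "hif_rmw", "cycle"]


def perf_stat_argv(devices, events, workload=("sleep", "10")):
    """Build a 'perf stat' argument vector with one -e per device/event pair."""
    argv = ["perf", "stat"]
    for dev in devices:
        for ev in events:
            argv += ["-e", f"{dev}/{ev}/"]
    # Everything after "--" is the workload that perf runs while counting.
    return argv + ["--", *workload]


# Print the command line; run it manually or pass the list to subprocess.run().
print(shlex.join(perf_stat_argv(DEVICES, EVENTS)))
```

Building an argument list instead of a single string avoids shell quoting
issues if the list is later handed to ``subprocess.run``.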

Example usage of counting all memory read/write bandwidth by metric::

  perf stat -M ddr_read_bandwidth.all -- sleep 10
  perf stat -M ddr_write_bandwidth.all -- sleep 10

The average DRAM bandwidth can be calculated as follows:

- Read Bandwidth =  perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle
- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle

Here, DDRC_WIDTH = 64 bytes.
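
The two formulas can be applied to perf output as in the sketch below. It
assumes that DDRC_Cycle is the ``cycle`` event count from the same device over
the same interval and that DDRC_Freq is the DDRC core clock in Hz; neither is
spelled out above, and the function names and sample numbers are made up for
illustration.

```python
DDRC_WIDTH = 64  # bytes per HIF command, as stated above


def read_bandwidth(hif_rd: int, cycle: int, ddrc_freq_hz: float) -> float:
    """Average read bandwidth in bytes per second."""
    if cycle <= 0:
        raise ValueError("cycle count must be positive")
    return hif_rd * DDRC_WIDTH * ddrc_freq_hz / cycle


def write_bandwidth(hif_wr: int, hif_rmw: int, cycle: int,
                    ddrc_freq_hz: float) -> float:
    """Average write bandwidth in bytes per second (RMW counts as a write)."""
    if cycle <= 0:
        raise ValueError("cycle count must be positive")
    return (hif_wr + hif_rmw) * DDRC_WIDTH * ddrc_freq_hz / cycle


# Made-up counts: 1000 reads over 2000 cycles at a 1000 Hz clock.
print(read_bandwidth(hif_rd=1000, cycle=2000, ddrc_freq_hz=1000))  # 32000.0
print(write_bandwidth(hif_wr=300, hif_rmw=200, cycle=2000,
                      ddrc_freq_hz=1000))                          # 16000.0
```

Dividing the cycle count by the clock frequency gives the elapsed time, so both
functions amount to bytes transferred divided by seconds elapsed.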

The current driver does not support sampling, so "perf record" is
unsupported. Attaching to a task is also unsupported because all of the
events are uncore.
                                                      
