.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
``amd-pstate`` CPU Performance Scaling Driver
===============================================

:Copyright: |copy| 2021 Advanced Micro Devices, Inc.

:Author: Huang Rui <ray.huang@amd.com>


Introduction
===================

``amd-pstate`` is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism on modern AMD APU and CPU series in the
Linux kernel. The new mechanism is based on Collaborative Processor
Performance Control (CPPC), which provides finer-grained frequency management
than legacy ACPI hardware P-States. Current AMD CPU/APU platforms use
the ACPI P-states driver to manage CPU frequency and clocks, switching
among only 3 P-states. CPPC replaces the ACPI P-states controls and allows a
flexible, low-latency interface for the Linux kernel to directly
communicate performance hints to the hardware.

``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``,
``ondemand``, etc.
to manage the performance hints which are provided by
CPPC hardware functionality that internally follows the hardware
specification (for details refer to AMD64 Architecture Programmer's Manual
Volume 2: System Programming [1]_). Currently, ``amd-pstate`` supports basic
frequency control function according to kernel governors on some of the
Zen2 and Zen3 processors, and we will implement more AMD specific functions
in future after we verify them on the hardware and SBIOS.


AMD CPPC Overview
=======================

The Collaborative Processor Performance Control (CPPC) interface enumerates a
continuous, abstract, and unit-less performance value in a scale that is
not tied to a specific performance state / frequency. This is an ACPI
standard [2]_ with which software can specify application performance goals and
hints as a relative target to the infrastructure limits. AMD processors
provide the low latency register model (MSR) instead of an AML code
interpreter for performance adjustments. ``amd-pstate`` will initialize a
``struct cpufreq_driver`` instance, ``amd_pstate_driver``, with the callbacks
to manage each performance update behavior. ::

 Highest Perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |          Max Perf  ---->|                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
 Nominal Perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |      Desired Perf  ---->|                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
  Lowest non-        |                       |                         |                       |
  linear perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |       Lowest perf  ---->|                       |
                     |                       |                         |                       |
  Lowest perf ------>+-----------------------+                         +-----------------------+
                     |                       |                         |                       |
                     |                       |                         |                       |
                     |                       |                         |                       |
          0   ------>+-----------------------+                         +-----------------------+

                                     AMD P-States Performance Scale


.. _perf_cap:

AMD CPPC Performance Capability
--------------------------------

Highest Performance (RO)
.........................

This is the absolute maximum performance an individual processor may reach,
assuming ideal conditions.
This performance level may not be sustainable
for long durations and may only be achievable if other platform components
are in a specific state; for example, it may require other processors to be in
an idle state. This would be equivalent to the highest frequencies
supported by the processor.

Nominal (Guaranteed) Performance (RO)
......................................

This is the maximum sustained performance level of the processor, assuming
ideal operating conditions. In the absence of an external constraint (power,
thermal, etc.), this is the performance level the processor is expected to
be able to maintain continuously. All cores/processors are expected to be
able to sustain their nominal performance state simultaneously.

Lowest non-linear Performance (RO)
...................................

This is the lowest performance level at which nonlinear power savings are
achieved, for example, due to the combined effects of voltage and frequency
scaling. Above this threshold, lower performance levels should be generally
more energy efficient than higher performance levels.
This register
effectively conveys the most efficient performance level to ``amd-pstate``.

Lowest Performance (RO)
........................

This is the absolute lowest performance level of the processor. Selecting a
performance level lower than the lowest nonlinear performance level may
cause an efficiency penalty but should reduce the instantaneous power
consumption of the processor.

AMD CPPC Performance Control
------------------------------

``amd-pstate`` passes performance goals through these registers. The
registers drive the behavior of the desired performance target.

Minimum requested performance (RW)
...................................

``amd-pstate`` specifies the minimum allowed performance level.

Maximum requested performance (RW)
...................................

``amd-pstate`` specifies a limit on the maximum performance that is expected
to be supplied by the hardware.

Desired performance target (RW)
...................................

``amd-pstate`` specifies a desired target in the CPPC performance scale as
a relative number. This can be expressed as a percentage of nominal
performance (infrastructure max). Below the nominal sustained performance
level, desired performance expresses the average performance level of the
processor subject to hardware. Above the nominal performance level,
the processor must provide at least the nominal performance requested and go
higher if current operating conditions allow.

Energy Performance Preference (EPP) (RW)
.........................................

This attribute provides a hint to the hardware if software wants to bias
toward performance (0x0) or energy efficiency (0xff).


Key Governors Support
=======================

``amd-pstate`` can be used with all the (generic) scaling governors listed
by the ``scaling_available_governors`` policy attribute in ``sysfs``.
Then,
it is responsible for the configuration of policy objects corresponding to
CPUs and provides the ``CPUFreq`` core (and the scaling governors attached
to the policy objects) with accurate information on the maximum and minimum
operating frequencies supported by the hardware. Users can check the
``scaling_cur_freq`` information, which comes from the ``CPUFreq`` core.

``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
frequency control. It is to fine tune the processor configuration on
``amd-pstate`` to the ``schedutil`` with CPU CFS scheduler. ``amd-pstate``
registers the ``adjust_perf`` callback to implement performance update behavior
similar to CPPC. It is initialized by ``sugov_start``, which then populates the
CPU's ``update_util_data`` pointer to assign ``sugov_update_single_perf`` as the
utilization update callback function in the CPU scheduler. The CPU scheduler
will call ``cpufreq_update_util`` and assign the target performance according
to the ``struct sugov_cpu`` that the utilization update belongs to.
Then, ``amd-pstate`` updates the desired performance according to the value
assigned by the CPU scheduler.
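The flow above ends with a governor-provided target being turned into a value
on the CPPC performance scale. The following is a minimal sketch of that idea
only, assuming the target is given as a fraction of some capacity and that the
minimum is floored at the lowest non-linear performance; the function and
parameter names are hypothetical and this is not the driver's actual code.

```python
def scale_to_perf(target, capacity, highest_perf):
    """Map a target in 0..capacity onto the perf scale, rounding up."""
    if target >= capacity:
        return highest_perf
    return (highest_perf * target + capacity - 1) // capacity


def pick_perf(target, min_target, capacity, highest_perf, lowest_nonlinear_perf):
    """Return (min_perf, desired_perf, max_perf) for one utilization update.

    Assumption (illustrative): the minimum is never set below the lowest
    non-linear performance, and the desired value is clamped into
    [min_perf, max_perf].
    """
    des_perf = scale_to_perf(target, capacity, highest_perf)
    min_perf = max(scale_to_perf(min_target, capacity, highest_perf),
                   lowest_nonlinear_perf)
    max_perf = max(highest_perf, min_perf)
    des_perf = min(max(des_perf, min_perf), max_perf)
    return min_perf, des_perf, max_perf


# Half-loaded CPU with highest perf 166 and lowest non-linear perf 39.
print(pick_perf(512, 0, 1024, 166, 39))
```

The three returned values correspond to the minimum, desired and maximum
requested performance described in the performance control section.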

.. _processor_support:

Processor Support
=======================

The ``amd-pstate`` initialization will fail if the ``_CPC`` entry in the ACPI
SBIOS does not exist in the detected processor. It uses ``acpi_cpc_valid``
to check the existence of ``_CPC``. All Zen based processors support the legacy
ACPI hardware P-States function, so when ``amd-pstate`` fails initialization,
the kernel will fall back to initialize the ``acpi-cpufreq`` driver.

There are two types of hardware implementations for ``amd-pstate``: one is
`Full MSR Support <perf_cap_>`_ and another is `Shared Memory Support
<perf_cap_>`_. It can use the :c:macro:`X86_FEATURE_CPPC` feature flag to
indicate the different types. (For details, refer to the Processor Programming
Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors [3]_.)
``amd-pstate`` is to register different ``static_call`` instances for different
hardware implementations.

Currently, some of the Zen2 and Zen3 processors support ``amd-pstate``. In the
future, it will be supported on more and more AMD processors.
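The two implementation types can be told apart from user space by looking at
the CPU feature flags. The sketch below assumes that
:c:macro:`X86_FEATURE_CPPC` shows up as the ``cppc`` token in the ``flags``
line of ``/proc/cpuinfo``; the helper names are hypothetical. A missing flag
only suggests the shared memory path, since the driver still needs a valid
``_CPC`` object to initialize.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the set of feature flags from the first 'flags' line."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def cppc_interface(flags):
    """Guess the amd-pstate hardware interface from a set of CPU flags.

    Whole tokens are compared, so a flag that merely contains the
    substring "cppc" is not mistaken for the feature itself.
    """
    return "Full MSR Support" if "cppc" in flags else "Shared Memory Support"
```

On a running system, ``cppc_interface(read_cpu_flags())`` would give the guess
for the local processor.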

Full MSR Support
-----------------

Some new Zen3 processors such as Cezanne provide the MSR registers directly
while the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set.
``amd-pstate`` can handle the MSR register to implement the fast switch
function in ``CPUFreq`` that can reduce the latency of frequency control in
interrupt context. The functions with a ``pstate_xxx`` prefix represent the
operations on MSR registers.

Shared Memory Support
----------------------

If the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, the
processor supports the shared memory solution. In this case, ``amd-pstate``
uses the ``cppc_acpi`` helper methods to implement the callback functions
that are defined on ``static_call``. The functions with a ``cppc_xxx`` prefix
represent the operations of ACPI CPPC helpers for the shared memory solution.


AMD P-States and ACPI hardware P-States always can be supported in one
processor.
But AMD P-States has the higher priority and if it is enabled
with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond
to the request from AMD P-States.


User Space Interface in ``sysfs`` - Per-policy
===============================================

``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level. They are located in the
``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::

 root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
 /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq


``amd_pstate_highest_perf / amd_pstate_max_freq``

Maximum CPPC performance and CPU frequency that the driver is allowed to
set, in percent of the maximum supported CPPC performance level (the highest
performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
In some ASICs, the highest CPPC performance is not the one in the ``_CPC``
table, so we need to expose it to sysfs. If boost is not active but
still supported, this maximum frequency will be larger than the one in
``cpuinfo``. On systems that support preferred cores, the values differ for
some cores compared to others and reflect the values advertised by the
platform at bootup.
This attribute is read-only.

``amd_pstate_lowest_nonlinear_freq``

The lowest non-linear CPPC CPU frequency that the driver is allowed to set,
in percent of the maximum supported CPPC performance level. (Please see the
lowest non-linear performance in `AMD CPPC Performance Capability
<perf_cap_>`_.)
This attribute is read-only.

``amd_pstate_hw_prefcore``

Whether the platform supports the preferred core feature and has it
enabled. This attribute is read-only.

``amd_pstate_prefcore_ranking``

The performance ranking of the core. Larger numbers are preferred at the
time of reading. The ranking can change at runtime based on platform
conditions. This attribute is read-only.

``energy_performance_available_preferences``

A list of all the supported EPP preferences that can be used for
``energy_performance_preference`` on this system.
These profiles represent different hints that are provided
to the low-level firmware about the user's desired energy/performance
tradeoff. ``default`` represents the EPP value set by the platform
firmware. This attribute is read-only.

``energy_performance_preference``

The current energy performance preference can be read from this attribute,
and the user can change the current preference according to their needs.
The list of supported profiles can be read from the
``energy_performance_available_preferences`` attribute. Profiles are
integer values between 0 and 255 when the EPP feature is enabled by the platform
firmware; if the EPP feature is disabled, the driver ignores the written value.
This attribute is read-write.

``boost``

The ``boost`` sysfs attribute provides control over the CPU core
performance boost, allowing users to manage the maximum frequency limitation
of the CPU. This attribute can be used to enable or disable the boost feature
on individual CPUs.

When the boost feature is enabled, the CPU can increase its frequency
beyond the base frequency, providing enhanced performance.
On the other hand, disabling the boost feature restricts the CPU to the
base frequency, which may be desirable in certain scenarios to prioritize power
efficiency or manage temperature.

To manipulate the ``boost`` attribute, users can write ``0`` to disable the
boost or ``1`` to enable it, for the respective CPU, using the sysfs path
``/sys/devices/system/cpu/cpuX/cpufreq/boost``.

Other performance and frequency values can be read back from
``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.


``amd-pstate`` vs ``acpi-cpufreq``
======================================

On the majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables
provided by the platform firmware are used for CPU performance scaling, but
only provide 3 P-states on AMD processors.
However, on modern AMD APU and CPU series, hardware provides the Collaborative
Processor Performance Control according to the ACPI protocol and customizes this
for AMD platforms. That is, a fine-grained and continuous frequency range
instead of the legacy hardware P-states. ``amd-pstate`` is the kernel
module which supports the new AMD P-States mechanism on most future AMD
platforms. The AMD P-States mechanism is the more performance and energy
efficient frequency management method on AMD processors.

Kernel Module Options for ``amd-pstate``
=========================================

``shared_mem``

Use a module param (``shared_mem``) to enable related processors manually with
**amd_pstate.shared_mem=1**.
Due to the performance issue on the processors with `Shared Memory Support
<perf_cap_>`_, we disable it for the moment and will enable this by default
once we address the performance issue on this solution.

The way to check whether the current processor has `Full MSR Support <perf_cap_>`_
or `Shared Memory Support <perf_cap_>`_ : ::

  ray@hr-test1:~$ lscpu | grep cppc
  Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

If the CPU flags include ``cppc``, then this processor supports `Full MSR Support
<perf_cap_>`_. Otherwise, it supports `Shared Memory Support <perf_cap_>`_.


``amd-pstate`` Driver Operation Modes
======================================

``amd_pstate`` CPPC has 3 operation modes: autonomous (active) mode,
non-autonomous (passive) mode and guided autonomous (guided) mode.
Active/passive/guided mode can be chosen by different kernel parameters.

- In autonomous mode, the platform ignores the desired performance level request
  and takes into account only the values set to the minimum, maximum and energy
  performance preference registers.
- In non-autonomous mode, the platform gets the desired performance level
  from the OS directly through the Desired Performance Register.
- In guided-autonomous mode, the platform sets the operating performance level
  autonomously according to the current workload and within the limits set by
  the OS through the min and max performance registers.

Active Mode
------------

``amd_pstate=active``

This is the low-level firmware control mode, implemented by the ``amd_pstate_epp``
driver when ``amd_pstate=active`` is passed to the kernel on the command line.
In this mode, the ``amd_pstate_epp`` driver provides a hint to the hardware if
software wants to bias toward performance (0x0) or energy efficiency (0xff).
The CPPC power algorithm then adjusts the cores' frequency according to the
power supply and other hardware conditions.

Passive Mode
------------

``amd_pstate=passive``

It will be enabled if ``amd_pstate=passive`` is passed to the kernel on the
command line. In this mode, the ``amd_pstate`` driver software specifies a
desired target in the CPPC performance scale as a relative number. This can be
expressed as a percentage of nominal performance (infrastructure max). Below the
nominal sustained performance level, desired performance expresses the average
performance level of the processor subject to the Performance Reduction
Tolerance register. Above the nominal performance level, the processor must
provide at least the nominal performance requested and go higher if current
operating conditions allow.

Guided Mode
-----------

``amd_pstate=guided``

If ``amd_pstate=guided`` is passed on the kernel command line, this mode
is activated. In this mode, the driver requests the minimum and maximum
performance level and the platform autonomously selects a performance level in
this range and appropriate to the current workload.

``amd-pstate`` Preferred Core
=================================

The core frequency is subject to process variation. Not all cores are able to
reach the maximum frequency while respecting the infrastructure limits.
Consequently, AMD has redefined the concept of the maximum frequency of a part.
This means that only a fraction of cores can reach the maximum frequency. To
find the best process scheduling policy for a given scenario, the OS needs to
know the core ordering, which is conveyed through the highest performance
capability register of the CPPC interface.

``amd-pstate`` preferred core enables the scheduler to prefer scheduling on
cores that can achieve a higher frequency. The preferred core rankings can
change dynamically based on conditions such as thermals and ageing.

The priority metric will be initialized by the ``amd-pstate`` driver. The
driver will also determine whether or not ``amd-pstate`` preferred core is
supported by the platform.

The ``amd-pstate`` driver will provide an initial core ordering. The platform
uses the CPPC interfaces to communicate the core ranking to the operating
system and scheduler to make sure that the cores with the highest performance
are chosen first for scheduling. When the driver receives a message with a
highest performance change, it will update the core ranking and set the CPU's
priority.

``amd-pstate`` Preferred Core Switch
=====================================

Kernel Parameters
-----------------

``amd-pstate`` preferred core has two states: enable and disable.
Enable/disable states can be chosen by different kernel parameters.
``amd-pstate`` preferred core is enabled by default.

``amd_prefcore=disable``

For systems that support ``amd-pstate`` preferred core, the core rankings will
always be advertised by the platform. But the OS can choose to ignore them via
the kernel parameter ``amd_prefcore=disable``.

User Space Interface in ``sysfs`` - General
===========================================

Global Attributes
-----------------

``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level. They are located in the
``/sys/devices/system/cpu/amd_pstate/`` directory and affect all CPUs.

``status``
        Operation mode of the driver: "active", "passive", "guided" or
        "disable".

        "active"
                The driver is functional and in active mode.

        "passive"
                The driver is functional and in passive mode.

        "guided"
                The driver is functional and in guided mode.

        "disable"
                The driver is unregistered and not functional.

        This attribute can be written to in order to change the driver's
        operation mode or to unregister it. The string written to it must be
        one of the possible values of it and, if successful, writing one of
        these values to the sysfs file will cause the driver to switch over
        to the operation mode represented by that string, or to be
        unregistered in the "disable" case.

``prefcore``
        Preferred core state of the driver: "enabled" or "disabled".

        "enabled"
                Enable the ``amd-pstate`` preferred core.

        "disabled"
                Disable the ``amd-pstate`` preferred core.

        This attribute is read-only to check the state of preferred core set
        by the kernel parameter.


``cpupower`` tool support for ``amd-pstate``
===============================================

``amd-pstate`` is supported by the ``cpupower`` tool, which can be used to dump
frequency information. Development is in progress to support more and more
operations for the new ``amd-pstate`` module with this tool.
::

 root@hr-test1:/home/ray# cpupower frequency-info
 analyzing CPU 0:
   driver: amd-pstate
   CPUs which run at the same hardware frequency: 0
   CPUs which need to have their frequency coordinated by software: 0
   maximum transition latency: 131 us
   hardware limits: 400 MHz - 4.68 GHz
   available cpufreq governors: ondemand conservative powersave userspace performance schedutil
   current policy: frequency should be within 400 MHz and 4.68 GHz.
                   The governor "schedutil" may decide which speed to use
                   within this range.
   current CPU frequency: Unable to call hardware
   current CPU frequency: 4.02 GHz (asserted by call to kernel)
   boost state support:
     Supported: yes
     Active: yes
     AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
     AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
     AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
     AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.
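The performance and frequency pairs in this output can be cross-checked with a
simple proportional mapping anchored at the nominal values. This is a sketch
under the assumption that frequency scales linearly with the unit-less CPPC
performance value; the helper name is hypothetical and the driver may compute
these values differently.

```python
def perf_to_freq_mhz(perf, nominal_perf, nominal_freq_mhz):
    """Scale a CPPC perf value to MHz, assuming a linear perf/frequency relation."""
    return perf * nominal_freq_mhz / nominal_perf


# Values taken from the cpupower output above: nominal perf 117 at 3.30 GHz.
for name, perf in (("highest", 166), ("lowest non-linear", 39), ("lowest", 15)):
    print(name, round(perf_to_freq_mhz(perf, 117, 3300)), "MHz")
```

Under this assumption the highest and lowest non-linear values land on the
reported 4.68 GHz and 1.10 GHz. The lowest performance value does not map onto
the reported 400 MHz, which suggests that the lowest frequency is obtained
separately rather than by this scaling.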

Diagnostics and Tuning
=======================

Trace Events
--------------

There are two static trace events that can be used for ``amd-pstate``
diagnostics. One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
specific to ``amd-pstate``. The following sequence of shell commands can
be used to enable them and see their output (if the kernel is generally
configured to support event tracing). ::

 root@hr-test1:/home/ray# cd /sys/kernel/tracing/
 root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
 root@hr-test1:/sys/kernel/tracing# cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 47827/42233061   #P:2
 #
 #                                _-----=> irqs-off
 #                               / _----=> need-resched
 #                              | / _---=> hardirq/softirq
 #                              || / _--=> preempt-depth
 #                              ||| /     delay
 #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
 #              | |         |   ||||      |         |
          <idle>-0       [015] dN...  4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true
          <idle>-0       [007] d.h..  4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
             cat-2161    [000] d....  4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true
            sshd-2125    [004] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true
          <idle>-0       [007] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
          <idle>-0       [003] d.s..  4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true
          <idle>-0       [011] d.s..  4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true

The ``cpu_frequency`` trace event will be triggered either by the ``schedutil`` scaling
governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the
policies with other scaling governors).

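
Each ``amd_pstate_perf`` line in the trace output above carries a set of
``key=value`` fields. A minimal sketch of turning such a line into a Python
dictionary follows. The helper name ``parse_amd_pstate_perf`` is hypothetical,
and the field layout is assumed from the sample output only; other kernel
versions may print different fields.

```python
import re

# Matches "key=value" pairs such as "amd_min_perf=85" or "changed=false".
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_amd_pstate_perf(line):
    """Return the key=value fields of one amd_pstate_perf trace line.

    Returns None for lines that are not amd_pstate_perf events (for
    example the "#" header lines). Integers and true/false are converted;
    anything else is kept as a string.
    """
    marker = "amd_pstate_perf:"
    if marker not in line:
        return None
    payload = line.split(marker, 1)[1]
    fields = {}
    for key, value in FIELD_RE.findall(payload):
        if value in ("true", "false"):
            fields[key] = value == "true"
        elif value.isdigit():
            fields[key] = int(value)
        else:
            fields[key] = value
    return fields


if __name__ == "__main__":
    # Sample line copied from the trace output above.
    sample = (
        "<idle>-0 [015] dN... 4995.979886: amd_pstate_perf: "
        "amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 "
        "cpu_id=15 changed=false fast_switch=true"
    )
    print(parse_amd_pstate_perf(sample))
```

Feeding every line of ``/sys/kernel/tracing/trace`` through such a helper and
skipping the ``None`` results would give one record per event, which can then
be grouped by ``cpu_id`` for inspection.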

Tracer Tool
-------------

``amd_pstate_tracer.py`` can record and parse...
generate performance plots. This utility can b...
performance of ``amd-pstate`` driver. The trac...
pstate tracer.

Tracer tool located in ``linux/tools/power/x86...``
used in two ways. If trace file is available, ...
with command ::

 ./amd_pstate_trace.py [-c cpus] -t <trace_fil...

Or generate trace file with root privilege, th... ::

 sudo ./amd_pstate_trace.py [-c cpus] -n <test...

The test result can be found in ``results/test...``
about part of the output. ::

 common_cpu  common_secs  common_usecs  min_pe...
 CPU_005     712          116384        39...
 CPU_006     712          116408        39...

Unit Tests for amd-pstate
-------------------------

``amd-pstate-ut`` is a test module for testing...

 * It can help all users to verify their proce...

 * Kernel can have a basic function test to av...

 * We can introduce more functional or perform...

1. Test case descriptions

    1). Basic tests

        Test prerequisite and basic functions...

        +---------+--------------------------------+
        | Index   | Functions                      |
        +=========+================================+
        | 1       | amd_pstate_ut_acpi_cpc_val...  |
        +---------+--------------------------------+
        | 2       | amd_pstate_ut_check_enable...  |
        +---------+--------------------------------+
        | 3       | amd_pstate_ut_check_perf       |
        +---------+--------------------------------+
        | 4       | amd_pstate_ut_check_freq       |
        +---------+--------------------------------+

    2). Tbench test

        Test and monitor the cpu changes when...
        These changes include desire performan...
        The specified governor is ondemand or...
        Tbench can also be tested on the ``acp...``

    3). Gitsource test

        Test and monitor the cpu changes when...
        These changes include desire performan...
        The specified governor is ondemand or...
        Gitsource can also be tested on the ...

#. How to execute the tests

   We use test module in the kselftest framewo...
   We create ``amd-pstate-ut`` module and tie...
   details refer to Linux Kernel Selftests [4]_...

    1). Build

        + open the ``CONFIG_X86_AMD_PS...``
        + set the ``CONFIG_X86_AMD_PST...``
        + make project
        + make selftest ::

            $ cd linux
            $ make -C tools/testing/selftests

        + make perf ::

            $ cd tools/perf/
            $ make

    2). Installation & Steps ::

        $ make -C tools/testing/selftests inst...
        $ cp tools/perf/perf /usr/bin/perf
        $ sudo ./kselftest/run_kselftest.sh -c...

    3). Specified test case ::

        $ cd ~/kselftest/amd-pstate
        $ sudo ./run.sh -t basic
        $ sudo ./run.sh -t tbench
        $ sudo ./run.sh -t tbench -m acpi-cpuf...
        $ sudo ./run.sh -t gitsource
        $ sudo ./run.sh -t gitsource -m acpi-c...
        $ ./run.sh --help
        ./run.sh: illegal option -- -
        Usage: ./run.sh [OPTION...]
                [-h <help>]
                [-o <output-file-for-dump>]
                [-c <all: All testing,
                     basic: Basic testing,
                     tbench: Tbench testing,
                     gitsource: Gitsource test...
                [-t <tbench time limit>]
                [-p <tbench process number>]
                [-l <loop times for tbench>]
                [-i <amd tracer interval>]
                [-m <comparative test: acpi-cp...

    4). Results

        + basic

         When you finish test, you will get th... ::

          $ dmesg | grep "amd_pstate_ut" | tee...
          [12977.570663] amd_pstate_ut: 1    a...
          [12977.570673] amd_pstate_ut: 2    a...
          [12977.571207] amd_pstate_ut: 3    a...
          [12977.571212] amd_pstate_ut: 4    a...

        + tbench

         When you finish test, you will get se...
         The selftest.tbench.csv file contains...
         The png images show the performance,...
         Open selftest.tbench.csv :

         Rows (``Governor`` column, with a ``Unit`` header row): four rows each
         for ``amd-pstate-ondemand``, ``amd-pstate-schedutil``,
         ``acpi-cpufreq-ondemand`` and ``acpi-cpufreq-schedutil``, followed by
         the comparison rows ``acpi-cpufreq-ondemand VS acpi-cpufr...``,
         ``amd-pstate-ondemand VS amd-pstate-s...``,
         ``acpi-cpufreq-ondemand VS amd-pstate...`` and
         ``acpi-cpufreq-schedutil VS amd-pstat...``

        + gitsource

         When you finish test, you will get se...
         The selftest.gitsource.csv file conta...
         The png images show the performance,...
         Open selftest.gitsource.csv :

         Rows (``Governor`` column, with a ``Unit`` header row): four rows each
         for ``amd-pstate-ondemand``, ``amd-pstate-schedutil``,
         ``acpi-cpufreq-ondemand`` and ``acpi-cpufreq-schedutil``, followed by
         the comparison rows ``acpi-cpufreq-ondemand VS acpi-cpufr...``,
         ``amd-pstate-ondemand VS amd-pstate-s...``,
         ``acpi-cpufreq-ondemand VS amd-pstate...`` and
         ``acpi-cpufreq-schedutil VS amd-pstat...``

Reference
===========

.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
       https://www.amd.com/system/files/TechDocs/24593.pdf

.. [2] Advanced Configuration and Power Interface Specification,
       https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf

.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors
       https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip

.. [4] Linux Kernel Selftests,
       https://www.kernel.org/doc/html/latest/...