.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==============================================
``intel_idle`` CPU Idle Time Management Driver
==============================================

:Copyright: |copy| 2020 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


General Information
===================

``intel_idle`` is a part of the
:doc:`CPU idle time management subsystem <cpuidle>` in the Linux kernel
(``CPUIdle``).  It is the default CPU idle time management driver for the
Nehalem and later generations of Intel processors, but the level of support for
a particular processor model in it depends on whether or not it recognizes that
processor model and may also depend on information coming from the platform
firmware.  [To understand ``intel_idle``, it is necessary to know how
``CPUIdle`` works in general, so this is the time to get familiar with
Documentation/admin-guide/pm/cpuidle.rst if you have not done that yet.]

``intel_idle`` uses the ``MWAIT`` instruction to inform the processor that the
logical CPU executing it is idle, so it may be possible to put some of the
processor's functional blocks into low-power states.  That instruction takes two
arguments (passed in the ``EAX`` and ``ECX`` registers of the target CPU), the
first of which, referred to as a *hint*, can be used by the processor to
determine what can be done (for details, refer to the Intel Software Developer’s
Manual [1]_).  Because it relies on that instruction, ``intel_idle`` refuses to
work with processors in which the support for the ``MWAIT`` instruction has been
disabled (for example, via the platform firmware configuration menu) or which do
not support that instruction at all.
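
To illustrate the *hint* argument, the sketch below splits a hint value into its
two fields.  It assumes the encoding given in the Intel SDM [1]_: bits 7:4 of
``EAX`` hold the target C-state minus one, with 1111b meaning C0, and bits 3:0
hold a sub-state.  ``decode_mwait_hint()`` is a hypothetical helper.  The C-state
numbers it returns are hardware encodings and need not match the idle state
names used by ``intel_idle``.

.. code-block:: python

    def decode_mwait_hint(eax):
        """Split an MWAIT hint into (target C-state number, sub-state).

        Assumed encoding: EAX[7:4] holds the target C-state minus one
        (0b1111 meaning C0) and EAX[3:0] holds the sub-state within it.
        """
        field = (eax >> 4) & 0xF
        substate = eax & 0xF
        cstate = 0 if field == 0xF else field + 1
        return cstate, substate


    for hint in (0x00, 0x01, 0x10, 0x20):
        cstate, sub = decode_mwait_hint(hint)
        print(f"hint {hint:#04x}: C{cstate}, sub-state {sub}")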

``intel_idle`` is not modular (it can only be built into the kernel image), so
it cannot be loaded or unloaded at run time, and the only way to pass
early-configuration-time parameters to it is via the kernel command line.


.. _intel-idle-enumeration-of-states:

Enumeration of Idle States
==========================

Each ``MWAIT`` hint value is interpreted by the processor as a license to
reconfigure itself in a certain way in order to save energy.  The processor
configurations (with reduced power draw) resulting from that are referred to
as C-states (in the ACPI terminology) or idle states.  The list of meaningful
``MWAIT`` hint values and idle states (i.e. low-power configurations of the
processor) corresponding to them depends on the processor model and it may also
depend on the configuration of the platform.

In order to create a list of available idle states required by the ``CPUIdle``
subsystem (see :ref:`idle-states-representation` in
Documentation/admin-guide/pm/cpuidle.rst),
``intel_idle`` can use two sources of information: static tables of idle states
for different processor models included in the driver itself and the ACPI tables
of the system.  The former are always used if the processor model at hand is
recognized by ``intel_idle``.  The latter are used if that is required for the
given processor model (which is the case for all server processor models
recognized by ``intel_idle``) or if the processor model is not recognized.
[There is a module parameter that can be used to make the driver use the ACPI
tables with any processor model recognized by it; see
`below <intel-idle-parameters_>`_.]
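
The rules above can be summarized as a small decision function.  This is a
hedged sketch, not the driver's code.  ``idle_state_sources()`` and its
arguments are hypothetical.  The ``no_acpi`` and ``use_acpi`` flags stand for the
module parameters described `below <intel-idle-parameters_>`_.

.. code-block:: python

    def idle_state_sources(model_recognized, model_requires_acpi,
                           no_acpi=False, use_acpi=False):
        """Return the set of sources used to build the idle states list.

        Hypothetical helper: the static table is used whenever the model is
        recognized; the ACPI tables are used if the model requires them, if
        use_acpi is set, or if the model is not recognized, unless no_acpi
        is set.  An empty set means no usable source of information.
        """
        sources = set()
        if model_recognized:
            sources.add("static")
        wants_acpi = model_requires_acpi or use_acpi or not model_recognized
        if wants_acpi and not no_acpi:
            sources.add("acpi")
        return sources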

If the ACPI tables are going to be used for building the list of available idle
states, ``intel_idle`` first looks for a ``_CST`` object under one of the ACPI
objects corresponding to the CPUs in the system (refer to the ACPI specification
[2]_ for the description of ``_CST`` and its output package).  Because the
``CPUIdle`` subsystem expects that the list of idle states supplied by the
driver will be suitable for all of the CPUs handled by it and ``intel_idle`` is
registered as the ``CPUIdle`` driver for all of the CPUs in the system, the
driver looks for the first ``_CST`` object returning at least one valid idle
state description and such that all of the idle states included in its return
package are of the FFH (Functional Fixed Hardware) type, which means that the
``MWAIT`` instruction is expected to be used to tell the processor that it can
enter one of them.  The return package of that ``_CST`` is then assumed to be
applicable to all of the other CPUs in the system and the idle state
descriptions extracted from it are stored in a preliminary list of idle states
coming from the ACPI tables.  [This step is skipped if ``intel_idle`` is
configured to ignore the ACPI tables; see `below <intel-idle-parameters_>`_.]
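
The ``_CST`` selection rule can be sketched as follows.  The ``CstEntry`` type and
the ``pick_cst()`` helper are hypothetical simplifications.  Each CPU's ``_CST``
result is modeled as a list of idle state descriptions that are assumed to have
been validated already, and only the entry method (FFH or not) is examined.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class CstEntry:
        """Simplified idle state description from a _CST return package."""
        ctype: int         # ACPI C-state type (1, 2 or 3)
        exit_latency: int  # worst-case exit latency in microseconds
        ffh: bool          # True for Functional Fixed Hardware (MWAIT) entries


    def pick_cst(per_cpu_cst):
        """Return the first _CST package usable for every CPU, or None.

        per_cpu_cst holds one list of already validated entries per CPU.  A
        package qualifies if it is not empty and all of its entries are FFH.
        """
        for package in per_cpu_cst:
            if package and all(entry.ffh for entry in package):
                return package
        return None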

Next, the first (index 0) entry in the list of available idle states is
initialized to represent a "polling idle state" (a pseudo-idle state in which
the target CPU continuously fetches and executes instructions), and the
subsequent (real) idle state entries are populated as follows.

If the processor model at hand is recognized by ``intel_idle``, there is a
(static) table of idle state descriptions for it in the driver.  In that case,
the "internal" table is the primary source of information on idle states and the
information from it is copied to the final list of available idle states.  If
using the ACPI tables for the enumeration of idle states is not required
(depending on the processor model), all of the listed idle states are enabled by
default (so all of them will be taken into consideration by ``CPUIdle``
governors during CPU idle state selection).  Otherwise, some of the listed idle
states may not be enabled by default if there are no matching entries in the
preliminary list of idle states coming from the ACPI tables.  In that case, user
space can still enable them later (on a per-CPU basis) with the help of
the ``disable`` idle state attribute in ``sysfs`` (see
:ref:`idle-states-representation` in
Documentation/admin-guide/pm/cpuidle.rst).  This means that the idle states
"known" to the driver may not be enabled by default if they have not been
exposed by the platform firmware (through the ACPI tables).
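
The default enabled or disabled status of the idle states taken from a static
table can be modeled as shown below.  ``default_enabled()`` is hypothetical.
Matching an internal idle state with an ACPI entry by its ``MWAIT`` hint value is
an assumption made for this sketch.

.. code-block:: python

    def default_enabled(internal_hints, acpi_hints, acpi_required):
        """Map each MWAIT hint from a static table to its default status.

        A state is enabled by default unless the ACPI tables must be
        consulted for this model and no ACPI entry carries the same MWAIT
        hint (matching by hint value is an assumption of this sketch).
        """
        acpi = set(acpi_hints)
        return {hint: (not acpi_required) or (hint in acpi)
                for hint in internal_hints}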

If the given processor model is not recognized by ``intel_idle``, but it
supports ``MWAIT``, the preliminary list of idle states coming from the ACPI
tables is used for building the final list that will be supplied to the
``CPUIdle`` core during driver registration.  For each idle state in that list,
the description, ``MWAIT`` hint and exit latency are copied to the corresponding
entry in the final list of idle states.  The name of the idle state represented
by it (to be returned by the ``name`` idle state attribute in ``sysfs``) is
"CX_ACPI", where X is the index of that idle state in the final list (note that
the minimum value of X is 1, because 0 is reserved for the "polling" state), and
its target residency is based on the exit latency value.  Specifically, for
C1-type idle states the exit latency value is also used as the target residency
(for compatibility with the majority of the "internal" tables of idle states for
various processor models recognized by ``intel_idle``) and for the other idle
state types (C2 and C3) the target residency value is 3 times the exit latency
(again, that is because it reflects the target residency to exit latency ratio
in the majority of cases for the processor models recognized by ``intel_idle``).
All of the idle states in the final list are enabled by default in this case.
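
The sketch below builds the final list from the preliminary ACPI list, applying
the naming and target residency rules described above.  The ``AcpiState`` type
and ``build_acpi_states()`` are hypothetical, and ``POLL`` is an assumed name
for the polling state.  All times are in microseconds.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class AcpiState:
        ctype: int         # ACPI C-state type: 1 (C1), 2 (C2) or 3 (C3)
        mwait_hint: int
        exit_latency: int  # microseconds
        desc: str


    def build_acpi_states(prelim):
        """Build the final idle states list from the preliminary ACPI list.

        Index 0 is the polling pseudo-state; real states are named CX_ACPI,
        X being their index in the final list.
        """
        states = [{"name": "POLL", "target_residency": 0}]
        for acpi in prelim:
            index = len(states)
            residency = acpi.exit_latency   # C1: same as the exit latency
            if acpi.ctype > 1:              # C2 and C3: 3x the exit latency
                residency *= 3
            states.append({
                "name": f"C{index}_ACPI",
                "desc": acpi.desc,
                "mwait_hint": acpi.mwait_hint,
                "exit_latency": acpi.exit_latency,
                "target_residency": residency,
                "disabled": False,          # all enabled by default here
            })
        return states

For instance, an ACPI C1 entry with an exit latency of 1 microsecond and a C3
entry with an exit latency of 100 microseconds become ``C1_ACPI`` (target
residency 1) and ``C2_ACPI`` (target residency 300).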


.. _intel-idle-initialization:

Initialization
==============

The initialization of ``intel_idle`` starts with checking if the kernel command
line options forbid the use of the ``MWAIT`` instruction.  If that is the case,
an error code is returned right away.

The next step is to check whether or not the processor model is known to the
driver, which determines the idle states enumeration method (see
`above <intel-idle-enumeration-of-states_>`_), and whether or not the processor
supports ``MWAIT`` (the initialization fails if that is not the case).  Then,
the ``MWAIT`` support in the processor is enumerated through ``CPUID`` and the
driver initialization fails if the level of support is not as expected (for
example, if the total number of ``MWAIT`` substates returned is 0).
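
The ``CPUID`` checks can be sketched in terms of the register values returned
for leaf 5 (the ``MONITOR``/``MWAIT`` leaf).  The layout assumed here follows the
``CPUID`` description in the Intel SDM:

* ``ECX`` bit 0 reports that the ``MWAIT`` extensions are enumerated.
* ``ECX`` bit 1 reports that interrupts can break out of ``MWAIT`` even when they
  are masked.
* Each 4-bit field of ``EDX`` gives the number of sub-states of one C-state.

``mwait_support_ok()`` is hypothetical, and the exact conditions checked by the
driver may vary between kernel versions.

.. code-block:: python

    def mwait_support_ok(ecx, edx):
        """Return (ok, per-C-state sub-state counts) for CPUID leaf 5 values."""
        extensions = bool(ecx & 0x1)       # ECX[0]: MWAIT extensions enumerated
        interrupt_break = bool(ecx & 0x2)  # ECX[1]: interrupts break MWAIT
        # EDX[4n+3:4n] = number of MWAIT sub-states for C-state n (n = 0..7)
        substates = [(edx >> (4 * n)) & 0xF for n in range(8)]
        ok = extensions and interrupt_break and edx != 0
        return ok, substates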

Next, if the driver is not configured to ignore the ACPI tables (see
`below <intel-idle-parameters_>`_), the idle state information provided by the
platform firmware is extracted from them.

Then, ``CPUIdle`` device objects are allocated for all CPUs and the list of
available idle states is created as explained
`above <intel-idle-enumeration-of-states_>`_.

Finally, ``intel_idle`` is registered with the help of cpuidle_register_driver()
as the ``CPUIdle`` driver for all CPUs in the system and a CPU online callback
for configuring individual CPUs is registered via cpuhp_setup_state(), which
(among other things) causes the callback routine to be invoked for all of the
CPUs present in the system at that time (each CPU executes its own instance of
the callback routine).  That routine registers a ``CPUIdle`` device for the CPU
running it (which enables the ``CPUIdle`` subsystem to operate that CPU) and
optionally performs some CPU-specific initialization actions that may be
required for the given processor model.


.. _intel-idle-parameters:

Kernel Command Line Options and Module Parameters
=================================================

The *x86* architecture support code recognizes three kernel command line
options related to CPU idle time management: ``idle=poll``, ``idle=halt``,
and ``idle=nomwait``.  If any of them is present in the kernel command line, the
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.

Apart from that, there are five module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).
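
Both kinds of settings end up in the kernel command line, so a script can check
them as sketched below.  ``parse_idle_cmdline()`` is hypothetical and simplified
(for example, it ignores quoting).  It expects the ``intel_idle`` parameters in
the ``intel_idle.<parameter>=<value>`` form used for built-in code.

.. code-block:: python

    def parse_idle_cmdline(cmdline):
        """Inspect a kernel command line string for idle-related settings.

        Returns (mwait_forbidden, params), where params maps intel_idle
        parameter names to their raw string values.
        """
        mwait_forbidden = False
        params = {}
        for token in cmdline.split():
            key, _, value = token.partition("=")
            if key == "idle" and value in ("poll", "halt", "nomwait"):
                mwait_forbidden = True
            elif key.startswith("intel_idle."):
                params[key[len("intel_idle."):]] = value
        return mwait_forbidden, params

On a running system, the string can be read from ``/proc/cmdline``.  For
example, ``intel_idle.max_cstate=2 intel_idle.states_off=8`` yields
``{'max_cstate': '2', 'states_off': '8'}`` as the parameter map.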

The ``max_cstate`` parameter value is the maximum idle state index in the list
of idle states supplied to the ``CPUIdle`` core during the registration of the
driver.  It is also the maximum number of regular (non-polling) idle states that
can be used by ``intel_idle``, so the enumeration of idle states is terminated
after finding that number of usable idle states (the other idle states that
potentially might have been used if ``max_cstate`` had been greater are not
taken into consideration at all).  Setting ``max_cstate`` can prevent
``intel_idle`` from exposing to the ``CPUIdle`` core idle states that are
regarded as "too deep" for some reason, but it does so by making them
effectively invisible until the system is shut down and started again, which may
not always be desirable.  In practice, it is only really necessary to do that if
the idle states in question cannot be enabled during system startup, because in
the working state of the system the CPU power management quality of service (PM
QoS) feature can be used to prevent ``CPUIdle`` from touching those idle states
even if they have been enumerated (see :ref:`cpu-pm-qos` in
Documentation/admin-guide/pm/cpuidle.rst).
Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail.
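
The effect of ``max_cstate`` on the list passed to the ``CPUIdle`` core can be
modeled as shown below.  ``apply_max_cstate()`` is hypothetical.  It takes the
usable (non-polling) idle states in enumeration order.  The polling state
occupies index 0, so the largest index in the returned list never exceeds
``max_cstate``.

.. code-block:: python

    def apply_max_cstate(usable_states, max_cstate):
        """Stop enumeration after max_cstate usable (non-polling) states."""
        if max_cstate == 0:
            # Mirrors the text: max_cstate=0 makes initialization fail.
            raise RuntimeError("intel_idle initialization fails for max_cstate=0")
        return ["POLL"] + usable_states[:max_cstate]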

The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
if the kernel has been configured with ACPI support) can be set to make the
driver ignore the system's ACPI tables entirely or use them for all of the
recognized processor models, respectively (both are unset by default and
``use_acpi`` has no effect if ``no_acpi`` is set).

The value of the ``states_off`` module parameter (0 by default) represents a
list of idle states to be disabled by default in the form of a bitmask.

Namely, the positions of the bits that are set in the ``states_off`` value are
the indices of idle states to be disabled by default (as reflected by the names
of the corresponding idle state directories in ``sysfs``, :file:`state0`,
:file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the index of the given
idle state; see :ref:`idle-states-representation` in
Documentation/admin-guide/pm/cpuidle.rst).

For example, if ``states_off`` is equal to 3, the driver will disable idle
states 0 and 1 by default, and if it is equal to 8, idle state 3 will be
disabled by default and so on (bit positions beyond the maximum idle state index
are ignored).
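
The bitmask arithmetic for ``states_off`` is shown below, together with the
reverse computation of a parameter value from the indices of the idle states to
disable.  Both helpers are hypothetical.  ``state_count`` stands for the number
of idle states in the list, including the polling state.

.. code-block:: python

    def states_off_indices(states_off, state_count):
        """Return the idle state indices disabled by default by states_off.

        Bit i set means state i is disabled; bits at or beyond state_count
        (index state_count - 1 is the maximum) are ignored.
        """
        return [i for i in range(state_count) if states_off & (1 << i)]


    def states_off_mask(indices):
        """Compute the states_off value that disables the given indices."""
        mask = 0
        for i in indices:
            mask |= 1 << i
        return mask


    print(states_off_indices(3, 6))   # [0, 1]
    print(states_off_indices(8, 6))   # [3]
    print(states_off_indices(64, 6))  # [] (bit 6 is beyond index 5)
    print(states_off_mask([2, 3]))    # 12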

The idle states disabled this way can be enabled (on a per-CPU basis) from user
space via ``sysfs``.
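
Re-enabling an idle state goes through the ``disable`` attribute of that state.
The sketch assumes the
``/sys/devices/system/cpu/cpu<N>/cpuidle/state<i>/disable`` layout described in
Documentation/admin-guide/pm/cpuidle.rst.  The helper names are hypothetical,
and writing to ``sysfs`` requires root privileges.

.. code-block:: python

    import os

    SYSFS_CPU = "/sys/devices/system/cpu"


    def disable_attr_path(cpu, state, root=SYSFS_CPU):
        """Path of the disable attribute of the given idle state of a CPU."""
        return os.path.join(root, f"cpu{cpu}", "cpuidle", f"state{state}",
                            "disable")


    def enable_idle_state(cpu, state, root=SYSFS_CPU):
        """Enable one idle state on one CPU by writing 0 to its attribute."""
        with open(disable_attr_path(cpu, state, root), "w") as f:
            f.write("0")

Because the attribute is per CPU, enabling a state on all CPUs means repeating
the write for every ``cpu<N>`` directory.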

The ``ibrs_off`` module parameter is a boolean flag (false by default).  If it
is set, IBRS (Indirect Branch Restricted Speculation) is turned off whenever the
CPU enters an idle state.  This flag does not affect CPUs that use Enhanced
IBRS, which can remain on with little performance impact.

For some CPUs, IBRS is selected by default as the mitigation for the Spectre v2
and Retbleed security vulnerabilities.  Leaving the IBRS mode on while idling
may have a performance impact on the idle CPU's sibling CPU.  By default, the
IBRS mode is turned off when the CPU enters a deep idle state, but not in some
shallower ones.  Setting the ``ibrs_off`` module parameter forces the IBRS mode
off when the CPU is in any of the available idle states.  This may help the
performance of a sibling CPU at the expense of a slightly higher wakeup latency
for the idle CPU.


.. _intel-idle-core-and-package-idle-states:

Core and Package Levels of Idle States
======================================

Typically, in a processor supporting the ``MWAIT`` instruction there are (at
least) two levels of idle states (or C-states).  One level, referred to as
"core C-states", covers individual cores in the processor, whereas the other
level, referred to as "package C-states", covers the entire processor package
and it may also involve other components of the system (GPUs, memory
controllers, I/O hubs etc.).

Some of the ``MWAIT`` hint values allow the processor to use core C-states only
(most importantly, that is the case for the ``MWAIT`` hint value corresponding
to the ``C1`` idle state), but the majority of them give it a license to put
the target core (i.e. the core containing the logical CPU executing ``MWAIT``
with the given hint value) into a specific core C-state and then (if possible)
to enter a specific package C-state at the deeper level.  For example, the
``MWAIT`` hint value representing the ``C3`` idle state allows the processor to
put the target core into the low-power state referred to as "core ``C3``" (or
``CC3``), which happens if all of the logical CPUs (SMT siblings) in that core
have executed ``MWAIT`` with the ``C3`` hint value (or with a hint value
representing a deeper idle state), and in addition to that (in the majority of
cases) it gives the processor a license to put the entire package (possibly
including some non-CPU components such as a GPU or a memory controller) into the
low-power state referred to as "package ``C3``" (or ``PC3``), which happens if
all of the cores have gone into the ``CC3`` state and (possibly) some additional
conditions are satisfied (for instance, if the GPU is covered by ``PC3``, it may
be required to be in a certain GPU-specific low-power state for ``PC3`` to be
reachable).
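
The conditions for entering ``CC3`` and ``PC3`` can be expressed with a
simplified model in which every idle state request is a single depth number:
0 for a running CPU, 1 for ``C1``, 3 for ``C3`` and so on.  The helpers are
hypothetical.  ``extra_conditions_met`` stands for the additional platform
conditions mentioned above.  Following the text, the model treats the ``C1``
hint as core-level only.

.. code-block:: python

    def core_cstate(sibling_requests):
        """Deepest core C-state allowed by all SMT siblings of one core.

        Each request is the depth asked for by one logical CPU via MWAIT;
        the core can go no deeper than its shallowest sibling allows.
        """
        return min(sibling_requests)


    def package_cstate(cores, extra_conditions_met=True):
        """Deepest package C-state allowed by all cores (simplified model).

        Returns 0 (package stays in PC0) if any core is running, if the
        shallowest core is only in C1 (treated as core-level only), or if
        the extra platform conditions are not met.
        """
        depth = min(core_cstate(siblings) for siblings in cores)
        if depth <= 1 or not extra_conditions_met:
            return 0
        return depth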

As a rule, there is no simple way to make the processor use core C-states only
if the conditions for entering the corresponding package C-states are met, so
the logical CPU executing ``MWAIT`` with a hint value that is not core-level
only (unlike the one for ``C1``) must always assume that this may cause the
processor to enter a package C-state.  [That is why the exit latency and target
residency values corresponding to the majority of ``MWAIT`` hint values in the
"internal" tables of idle states in ``intel_idle`` reflect the properties of
package C-states.]  If using package C-states is not desirable at all, either
:ref:`PM QoS <cpu-pm-qos>` or the ``max_cstate`` module parameter of
``intel_idle`` described `above <intel-idle-parameters_>`_ must be used to
restrict the range of permissible idle states to the ones with core-level only
``MWAIT`` hint values (like ``C1``).
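
A user space process can apply the PM QoS approach roughly as sketched below.
The sketch assumes the ``/dev/cpu_dma_latency`` interface described in
Documentation/power/pm_qos_interface.rst.  Under that interface, the request is
a 32-bit signed latency limit in microseconds, and it stays in effect only while
the file remains open.  The helper names are hypothetical.  Which idle states
remain usable for a given limit depends on their exit latencies.

.. code-block:: python

    import os
    import struct


    def cpu_latency_request(max_latency_us):
        """Encode a CPU latency QoS request as a native 32-bit signed int."""
        return struct.pack("=i", max_latency_us)


    def hold_cpu_latency_limit(max_latency_us, path="/dev/cpu_dma_latency"):
        """Open the PM QoS device and register a latency limit.

        The request stays in effect only while the returned file descriptor
        is open, so the caller must keep it open (and close it to drop the
        request).  Opening the real device requires root privileges.
        """
        fd = os.open(path, os.O_WRONLY)
        os.write(fd, cpu_latency_request(max_latency_us))
        return fd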


References
==========

.. [1] *Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B*,
       https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-2b-manual.html

.. [2] *Advanced Configuration and Power Interface (ACPI) Specification*,
       https://uefi.org/specifications