TOMOYO Linux Cross Reference
Linux/Documentation/driver-api/pm/devices.rst

Diff markup

Differences between /Documentation/driver-api/pm/devices.rst (Version linux-6.12-rc7) and /Documentation/driver-api/pm/devices.rst (Version linux-4.14.336)


  1 .. SPDX-License-Identifier: GPL-2.0            !!   1 .. |struct dev_pm_ops| replace:: :c:type:`struct dev_pm_ops <dev_pm_ops>`
  2 .. include:: <isonum.txt>                      !!   2 .. |struct dev_pm_domain| replace:: :c:type:`struct dev_pm_domain <dev_pm_domain>`
  3                                                !!   3 .. |struct bus_type| replace:: :c:type:`struct bus_type <bus_type>`
  4 .. _driverapi_pm_devices:                      !!   4 .. |struct device_type| replace:: :c:type:`struct device_type <device_type>`
                                                   >>   5 .. |struct class| replace:: :c:type:`struct class <class>`
                                                   >>   6 .. |struct wakeup_source| replace:: :c:type:`struct wakeup_source <wakeup_source>`
                                                   >>   7 .. |struct device| replace:: :c:type:`struct device <device>`
  5                                                     8 
  6 ==============================                      9 ==============================
  7 Device Power Management Basics                     10 Device Power Management Basics
  8 ==============================                     11 ==============================
  9                                                    12 
 10 :Copyright: |copy| 2010-2011 Rafael J. Wysocki< !!  13 ::
 11 :Copyright: |copy| 2010 Alan Stern <stern@rowla << 
 12 :Copyright: |copy| 2016 Intel Corporation      << 
 13                                                << 
 14 :Author: Rafael J. Wysocki <rafael.j.wysocki@in << 
 15                                                    14 
                                                   >>  15  Copyright (c) 2010-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
                                                   >>  16  Copyright (c) 2010 Alan Stern <stern@rowland.harvard.edu>
                                                   >>  17  Copyright (c) 2016 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
 16                                                    18 
 17 Most of the code in Linux is device drivers, s     19 Most of the code in Linux is device drivers, so most of the Linux power
 18 management (PM) code is also driver-specific.      20 management (PM) code is also driver-specific.  Most drivers will do very
 19 little; others, especially for platforms with      21 little; others, especially for platforms with small batteries (like cell
 20 phones), will do a lot.                            22 phones), will do a lot.
 21                                                    23 
 22 This writeup gives an overview of how drivers      24 This writeup gives an overview of how drivers interact with system-wide
 23 power management goals, emphasizing the models     25 power management goals, emphasizing the models and interfaces that are
 24 shared by everything that hooks up to the driv     26 shared by everything that hooks up to the driver model core.  Read it as
 25 background for the domain-specific work you'd      27 background for the domain-specific work you'd do with any specific driver.
 26                                                    28 
 27                                                    29 
 28 Two Models for Device Power Management             30 Two Models for Device Power Management
 29 ======================================             31 ======================================
 30                                                    32 
 31 Drivers will use one or both of these models t     33 Drivers will use one or both of these models to put devices into low-power
 32 states:                                            34 states:
 33                                                    35 
 34     System Sleep model:                            36     System Sleep model:
 35                                                    37 
 36         Drivers can enter low-power states as      38         Drivers can enter low-power states as part of entering system-wide
 37         low-power states like "suspend" (also      39         low-power states like "suspend" (also known as "suspend-to-RAM"), or
 38         (mostly for systems with disks) "hiber     40         (mostly for systems with disks) "hibernation" (also known as
 39         "suspend-to-disk").                        41         "suspend-to-disk").
 40                                                    42 
 41         This is something that device, bus, an     43         This is something that device, bus, and class drivers collaborate on
 42         by implementing various role-specific      44         by implementing various role-specific suspend and resume methods to
 43         cleanly power down hardware and softwa     45         cleanly power down hardware and software subsystems, then reactivate
 44         them without loss of data.                 46         them without loss of data.
 45                                                    47 
 46         Some drivers can manage hardware wakeu     48         Some drivers can manage hardware wakeup events, which make the system
 47         leave the low-power state.  This featu     49         leave the low-power state.  This feature may be enabled or disabled
 48         using the relevant :file:`/sys/devices     50         using the relevant :file:`/sys/devices/.../power/wakeup` file (for
 49         Ethernet drivers the ioctl interface u     51         Ethernet drivers the ioctl interface used by ethtool may also be used
 50         for this purpose); enabling it may cos     52         for this purpose); enabling it may cost some power usage, but let the
 51         whole system enter low-power states mo     53         whole system enter low-power states more often.
 52                                                    54 
 53     Runtime Power Management model:                55     Runtime Power Management model:
 54                                                    56 
 55         Devices may also be put into low-power     57         Devices may also be put into low-power states while the system is
 56         running, independently of other power      58         running, independently of other power management activity in principle.
 57         However, devices are not generally ind     59         However, devices are not generally independent of each other (for
 58         example, a parent device cannot be sus     60         example, a parent device cannot be suspended unless all of its child
 59         devices have been suspended).  Moreove     61         devices have been suspended).  Moreover, depending on the bus type the
 60         device is on, it may be necessary to c     62         device is on, it may be necessary to carry out some bus-specific
 61         operations on the device for this purp     63         operations on the device for this purpose.  Devices put into low power
 62         states at run time may require special     64         states at run time may require special handling during system-wide power
 63         transitions (suspend or hibernation).      65         transitions (suspend or hibernation).
 64                                                    66 
 65         For these reasons not only the device      67         For these reasons not only the device driver itself, but also the
 66         appropriate subsystem (bus type, devic     68         appropriate subsystem (bus type, device type or device class) driver and
 67         the PM core are involved in runtime po     69         the PM core are involved in runtime power management.  As in the system
 68         sleep power management case, they need     70         sleep power management case, they need to collaborate by implementing
 69         various role-specific suspend and resu     71         various role-specific suspend and resume methods, so that the hardware
 70         is cleanly powered down and reactivate     72         is cleanly powered down and reactivated without data or service loss.
 71                                                    73 
 72 There's not a lot to be said about those low-p     74 There's not a lot to be said about those low-power states except that they are
 73 very system-specific, and often device-specifi     75 very system-specific, and often device-specific.  Also, that if enough devices
 74 have been put into low-power states (at runtim     76 have been put into low-power states (at runtime), the effect may be very similar
 75 to entering some system-wide low-power state (     77 to entering some system-wide low-power state (system sleep) ... and that
 76 synergies exist, so that several drivers using     78 synergies exist, so that several drivers using runtime PM might put the system
 77 into a state where even deeper power saving op     79 into a state where even deeper power saving options are available.
 78                                                    80 
 79 Most suspended devices will have quiesced all      81 Most suspended devices will have quiesced all I/O: no more DMA or IRQs (except
 80 for wakeup events), no more data read or writt     82 for wakeup events), no more data read or written, and requests from upstream
 81 drivers are no longer accepted.  A given bus o     83 drivers are no longer accepted.  A given bus or platform may have different
 82 requirements though.                               84 requirements though.
 83                                                    85 
 84 Examples of hardware wakeup events include an      86 Examples of hardware wakeup events include an alarm from a real time clock,
 85 network wake-on-LAN packets, keyboard or mouse     87 network wake-on-LAN packets, keyboard or mouse activity, and media insertion
 86 or removal (for PCMCIA, MMC/SD, USB, and so on     88 or removal (for PCMCIA, MMC/SD, USB, and so on).
 87                                                    89 
 88 Interfaces for Entering System Sleep States        90 Interfaces for Entering System Sleep States
 89 ===========================================        91 ===========================================
 90                                                    92 
 91 There are programming interfaces provided for      93 There are programming interfaces provided for subsystems (bus type, device type,
 92 device class) and device drivers to allow them     94 device class) and device drivers to allow them to participate in the power
 93 management of devices they are concerned with.     95 management of devices they are concerned with.  These interfaces cover both
 94 system sleep and runtime power management.         96 system sleep and runtime power management.
 95                                                    97 
 96                                                    98 
 97 Device Power Management Operations                 99 Device Power Management Operations
 98 ----------------------------------                100 ----------------------------------
 99                                                   101 
100 Device power management operations, at the sub    102 Device power management operations, at the subsystem level as well as at the
101 device driver level, are implemented by defini    103 device driver level, are implemented by defining and populating objects of type
102 struct dev_pm_ops defined in :file:`include/li !! 104 |struct dev_pm_ops| defined in :file:`include/linux/pm.h`.  The roles of the
103 methods included in it will be explained in wh    105 methods included in it will be explained in what follows.  For now, it should be
104 sufficient to remember that the last three met    106 sufficient to remember that the last three methods are specific to runtime power
105 management while the remaining ones are used d    107 management while the remaining ones are used during system-wide power
106 transitions.                                      108 transitions.
107                                                   109 
108 There also is a deprecated "old" or "legacy" i    110 There also is a deprecated "old" or "legacy" interface for power management
109 operations available at least for some subsyst    111 operations available at least for some subsystems.  This approach does not use
110 struct dev_pm_ops objects and it is suitable o !! 112 |struct dev_pm_ops| objects and it is suitable only for implementing system
111 sleep power management methods in a limited wa    113 sleep power management methods in a limited way.  Therefore it is not described
112 in this document, so please refer directly to     114 in this document, so please refer directly to the source code for more
113 information about it.                             115 information about it.
114                                                   116 
115                                                   117 
116 Subsystem-Level Methods                           118 Subsystem-Level Methods
117 -----------------------                           119 -----------------------
118                                                   120 
119 The core methods to suspend and resume devices    121 The core methods to suspend and resume devices reside in
120 struct dev_pm_ops pointed to by the :c:member: !! 122 |struct dev_pm_ops| pointed to by the :c:member:`ops` member of
121 struct dev_pm_domain, or by the :c:member:`pm` !! 123 |struct dev_pm_domain|, or by the :c:member:`pm` member of |struct bus_type|,
122 struct device_type and struct class.  They are !! 124 |struct device_type| and |struct class|.  They are mostly of interest to the
123 people writing infrastructure for platforms an    125 people writing infrastructure for platforms and buses, like PCI or USB, or
124 device type and device class drivers.  They al    126 device type and device class drivers.  They also are relevant to the writers of
125 device drivers whose subsystems (PM domains, d    127 device drivers whose subsystems (PM domains, device types, device classes and
126 bus types) don't provide all power management     128 bus types) don't provide all power management methods.
127                                                   129 
128 Bus drivers implement these methods as appropr    130 Bus drivers implement these methods as appropriate for the hardware and the
129 drivers using it; PCI works differently from U    131 drivers using it; PCI works differently from USB, and so on.  Not many people
130 write subsystem-level drivers; most driver cod    132 write subsystem-level drivers; most driver code is a "device driver" that builds
131 on top of bus-specific framework code.            133 on top of bus-specific framework code.
132                                                   134 
133 For more information on these driver calls, se    135 For more information on these driver calls, see the description later;
134 they are called in phases for every device, re    136 they are called in phases for every device, respecting the parent-child
135 sequencing in the driver model tree.              137 sequencing in the driver model tree.
136                                                   138 
137                                                   139 
138 :file:`/sys/devices/.../power/wakeup` files       140 :file:`/sys/devices/.../power/wakeup` files
139 -------------------------------------------       141 -------------------------------------------
140                                                   142 
141 All device objects in the driver model contain    143 All device objects in the driver model contain fields that control the handling
142 of system wakeup events (hardware signals that    144 of system wakeup events (hardware signals that can force the system out of a
143 sleep state).  These fields are initialized by    145 sleep state).  These fields are initialized by bus or device driver code using
144 :c:func:`device_set_wakeup_capable()` and :c:f    146 :c:func:`device_set_wakeup_capable()` and :c:func:`device_set_wakeup_enable()`,
145 defined in :file:`include/linux/pm_wakeup.h`.     147 defined in :file:`include/linux/pm_wakeup.h`.
146                                                   148 
147 The :c:member:`power.can_wakeup` flag just rec    149 The :c:member:`power.can_wakeup` flag just records whether the device (and its
148 driver) can physically support wakeup events.     150 driver) can physically support wakeup events.  The
149 :c:func:`device_set_wakeup_capable()` routine     151 :c:func:`device_set_wakeup_capable()` routine affects this flag.  The
150 :c:member:`power.wakeup` field is a pointer to    152 :c:member:`power.wakeup` field is a pointer to an object of type
151 struct wakeup_source used for controlling whet !! 153 |struct wakeup_source| used for controlling whether or not the device should use
152 its system wakeup mechanism and for notifying     154 its system wakeup mechanism and for notifying the PM core of system wakeup
153 events signaled by the device.  This object is    155 events signaled by the device.  This object is only present for wakeup-capable
154 devices (i.e. devices whose :c:member:`can_wak    156 devices (i.e. devices whose :c:member:`can_wakeup` flags are set) and is created
155 (or removed) by :c:func:`device_set_wakeup_cap    157 (or removed) by :c:func:`device_set_wakeup_capable()`.
156                                                   158 
157 Whether or not a device is capable of issuing     159 Whether or not a device is capable of issuing wakeup events is a hardware
158 matter, and the kernel is responsible for keep    160 matter, and the kernel is responsible for keeping track of it.  By contrast,
159 whether or not a wakeup-capable device should     161 whether or not a wakeup-capable device should issue wakeup events is a policy
160 decision, and it is managed by user space thro    162 decision, and it is managed by user space through a sysfs attribute: the
161 :file:`power/wakeup` file.  User space can wri    163 :file:`power/wakeup` file.  User space can write the "enabled" or "disabled"
162 strings to it to indicate whether or not, resp    164 strings to it to indicate whether or not, respectively, the device is supposed
163 to signal system wakeup.  This file is only pr    165 to signal system wakeup.  This file is only present if the
164 :c:member:`power.wakeup` object exists for the    166 :c:member:`power.wakeup` object exists for the given device and is created (or
165 removed) along with that object, by :c:func:`d    167 removed) along with that object, by :c:func:`device_set_wakeup_capable()`.
166 Reads from the file will return the correspond    168 Reads from the file will return the corresponding string.
167                                                   169 
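The user-space side of the :file:`power/wakeup` policy described above can be sketched in a few lines of C. This is only an illustration and not part of either version of the file being compared: the helper names ``wakeup_write()`` and ``wakeup_read()`` are hypothetical, and the caller supplies the full path to the attribute, because the ``...`` component of the sysfs path depends on the device.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "enabled" or "disabled" to a power/wakeup
 * attribute.  `path` is supplied by the caller.
 * Returns 0 on success, -1 on error.
 */
int wakeup_write(const char *path, int enable)
{
	const char *value = enable ? "enabled" : "disabled";
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) != EOF;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: read the attribute back.
 * Returns 1 for "enabled", 0 for "disabled", and -1 if the file cannot
 * be opened (for example because the device is not wakeup-capable and
 * the attribute does not exist) or holds any other content.
 */
int wakeup_read(const char *path)
{
	char buf[16] = "";
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* strip a trailing newline, if any */
	if (strcmp(buf, "enabled") == 0)
		return 1;
	if (strcmp(buf, "disabled") == 0)
		return 0;
	return -1;
}
```

On a real system the path would point into sysfs and writing usually requires root privileges; the same functions work on an ordinary file, which is convenient for trying them out.
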
168 The initial value in the :file:`power/wakeup`     170 The initial value in the :file:`power/wakeup` file is "disabled" for the
169 majority of devices; the major exceptions are     171 majority of devices; the major exceptions are power buttons, keyboards, and
170 Ethernet adapters whose WoL (wake-on-LAN) feat    172 Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with ethtool.
171 It should also default to "enabled" for device    173 It should also default to "enabled" for devices that don't generate wakeup
172 requests on their own but merely forward wakeu    174 requests on their own but merely forward wakeup requests from one bus to another
173 (like PCI Express ports).                         175 (like PCI Express ports).
174                                                   176 
175 The :c:func:`device_may_wakeup()` routine retu    177 The :c:func:`device_may_wakeup()` routine returns true only if the
176 :c:member:`power.wakeup` object exists and the    178 :c:member:`power.wakeup` object exists and the corresponding :file:`power/wakeup`
177 file contains the "enabled" string.  This info    179 file contains the "enabled" string.  This information is used by subsystems,
178 like the PCI bus type code, to see whether or     180 like the PCI bus type code, to see whether or not to enable the devices' wakeup
179 mechanisms.  If device wakeup mechanisms are e    181 mechanisms.  If device wakeup mechanisms are enabled or disabled directly by
180 drivers, they also should use :c:func:`device_    182 drivers, they also should use :c:func:`device_may_wakeup()` to decide what to do
181 during a system sleep transition.  Device driv    183 during a system sleep transition.  Device drivers, however, are not expected to
182 call :c:func:`device_set_wakeup_enable()` dire    184 call :c:func:`device_set_wakeup_enable()` directly in any case.
183                                                   185 
184 It ought to be noted that system wakeup is con    186 It ought to be noted that system wakeup is conceptually different from "remote
185 wakeup" used by runtime power management, alth    187 wakeup" used by runtime power management, although it may be supported by the
186 same physical mechanism.  Remote wakeup is a f    188 same physical mechanism.  Remote wakeup is a feature allowing devices in
187 low-power states to trigger specific interrupt    189 low-power states to trigger specific interrupts to signal conditions in which
188 they should be put into the full-power state.     190 they should be put into the full-power state.  Those interrupts may or may not
189 be used to signal system wakeup events, depend    191 be used to signal system wakeup events, depending on the hardware design.  On
190 some systems it is impossible to trigger them     192 some systems it is impossible to trigger them from system sleep states.  In any
191 case, remote wakeup should always be enabled f    193 case, remote wakeup should always be enabled for runtime power management for
192 all devices and drivers that support it.          194 all devices and drivers that support it.
193                                                   195 
194                                                   196 
195 :file:`/sys/devices/.../power/control` files      197 :file:`/sys/devices/.../power/control` files
196 --------------------------------------------      198 --------------------------------------------
197                                                   199 
198 Each device in the driver model has a flag to     200 Each device in the driver model has a flag to control whether it is subject to
199 runtime power management.  This flag, :c:membe    201 runtime power management.  This flag, :c:member:`runtime_auto`, is initialized
200 by the bus type (or generally subsystem) code     202 by the bus type (or generally subsystem) code using :c:func:`pm_runtime_allow()`
201 or :c:func:`pm_runtime_forbid()`; the default     203 or :c:func:`pm_runtime_forbid()`; the default is to allow runtime power
202 management.                                       204 management.
203                                                   205 
204 The setting can be adjusted by user space by w    206 The setting can be adjusted by user space by writing either "on" or "auto" to
205 the device's :file:`power/control` sysfs file.    207 the device's :file:`power/control` sysfs file.  Writing "auto" calls
206 :c:func:`pm_runtime_allow()`, setting the flag    208 :c:func:`pm_runtime_allow()`, setting the flag and allowing the device to be
207 runtime power-managed by its driver.  Writing     209 runtime power-managed by its driver.  Writing "on" calls
208 :c:func:`pm_runtime_forbid()`, clearing the fl    210 :c:func:`pm_runtime_forbid()`, clearing the flag, returning the device to full
209 power if it was in a low-power state, and prev    211 power if it was in a low-power state, and preventing the
210 device from being runtime power-managed.  User    212 device from being runtime power-managed.  User space can check the current value
211 of the :c:member:`runtime_auto` flag by readin    213 of the :c:member:`runtime_auto` flag by reading that file.
212                                                   214 
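The "on"/"auto" semantics described above can be modelled as a small state machine. The sketch below is a user-space toy and not the kernel implementation: ``struct ctl_model`` and the ``ctl_model_*()`` functions are hypothetical names, and the ``runtime_suspended`` field merely stands in for "the device is currently in a low-power state".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for the per-device state discussed in the text. */
struct ctl_model {
	bool runtime_auto;	/* set: runtime PM allowed */
	bool runtime_suspended;	/* device currently in a low-power state */
};

/*
 * Model a write to power/control.
 * "auto" sets the flag; "on" clears it and brings the device back to
 * full power.  Any other string is rejected with -1 and changes nothing.
 */
int ctl_model_store(struct ctl_model *dev, const char *buf)
{
	assert(dev != NULL && buf != NULL);
	if (strcmp(buf, "auto") == 0) {
		dev->runtime_auto = true;
		return 0;
	}
	if (strcmp(buf, "on") == 0) {
		dev->runtime_auto = false;
		dev->runtime_suspended = false;
		return 0;
	}
	return -1;
}

/* Model a read from power/control: report the current flag value. */
const char *ctl_model_show(const struct ctl_model *dev)
{
	assert(dev != NULL);
	return dev->runtime_auto ? "auto" : "on";
}
```

Note that the model says nothing about system-wide transitions, which is consistent with the flag having no effect on them.
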
213 The device's :c:member:`runtime_auto` flag has    215 The device's :c:member:`runtime_auto` flag has no effect on the handling of
214 system-wide power transitions.  In particular,    216 system-wide power transitions.  In particular, the device can (and in the
215 majority of cases should and will) be put into    217 majority of cases should and will) be put into a low-power state during a
216 system-wide transition to a sleep state even t    218 system-wide transition to a sleep state even though its :c:member:`runtime_auto`
217 flag is clear.                                    219 flag is clear.
218                                                   220 
219 For more information about the runtime power m    221 For more information about the runtime power management framework, refer to
220 Documentation/power/runtime_pm.rst.            !! 222 :file:`Documentation/power/runtime_pm.txt`.
221                                                   223 
222                                                   224 
223 Calling Drivers to Enter and Leave System Slee    225 Calling Drivers to Enter and Leave System Sleep States
224 ==============================================    226 ======================================================
225                                                   227 
226 When the system goes into a sleep state, each     228 When the system goes into a sleep state, each device's driver is asked to
227 suspend the device by putting it into a state     229 suspend the device by putting it into a state compatible with the target
228 system state.  That's usually some version of     230 system state.  That's usually some version of "off", but the details are
229 system-specific.  Also, wakeup-enabled devices    231 system-specific.  Also, wakeup-enabled devices will usually stay partly
230 functional in order to wake the system.           232 functional in order to wake the system.
231                                                   233 
232 When the system leaves that low-power state, t    234 When the system leaves that low-power state, the device's driver is asked to
233 resume it by returning it to full power.  The     235 resume it by returning it to full power.  The suspend and resume operations
234 always go together, and both are multi-phase o    236 always go together, and both are multi-phase operations.
235                                                   237 
236 For simple drivers, suspend might quiesce the     238 For simple drivers, suspend might quiesce the device using class code
237 and then turn its hardware as "off" as possibl    239 and then turn its hardware as "off" as possible during suspend_noirq.  The
238 matching resume calls would then completely re    240 matching resume calls would then completely reinitialize the hardware
239 before reactivating its class I/O queues.         241 before reactivating its class I/O queues.
240                                                   242 
241 More power-aware drivers might prepare the dev    243 More power-aware drivers might prepare the devices for triggering system wakeup
242 events.                                           244 events.
243                                                   245 
244                                                   246 
245 Call Sequence Guarantees                          247 Call Sequence Guarantees
246 ------------------------                          248 ------------------------
247                                                   249 
248 To ensure that bridges and similar links needi    250 To ensure that bridges and similar links needing to talk to a device are
249 available when the device is suspended or resu    251 available when the device is suspended or resumed, the device hierarchy is
250 walked in a bottom-up order to suspend devices    252 walked in a bottom-up order to suspend devices.  A top-down order is
251 used to resume those devices.                     253 used to resume those devices.
252                                                   254 
253 The ordering of the device hierarchy is define    255 The ordering of the device hierarchy is defined by the order in which devices
254 get registered:  a child can never be register    256 get registered:  a child can never be registered, probed or resumed before
255 its parent; and can't be removed or suspended     257 its parent; and can't be removed or suspended after that parent.
256                                                   258 
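The ordering guarantee can be illustrated with a toy registration list. The code is a user-space sketch under simplifying assumptions, not the PM core's data structure: the names ``struct model_dev``, ``struct model_list`` and the ``model_*()`` functions are hypothetical, and the only rule modelled is that a child is registered after its parent, so walking the list backwards suspends children first and walking it forwards resumes parents first.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 16

struct model_dev {
	const char *name;
	int parent;		/* index of the parent, or -1 for a root */
};

struct model_list {
	struct model_dev devs[MODEL_MAX_DEVS];
	size_t count;		/* devices in registration order */
};

/*
 * Register a device.  The parent must already be registered, which is
 * what makes registration order a valid parent-before-child order.
 * Returns the new device's index, or -1 on failure.
 */
int model_register(struct model_list *l, const char *name, int parent)
{
	assert(l != NULL && name != NULL);
	if (l->count >= MODEL_MAX_DEVS)
		return -1;
	if (parent != -1 && (parent < 0 || (size_t)parent >= l->count))
		return -1;
	l->devs[l->count].name = name;
	l->devs[l->count].parent = parent;
	return (int)l->count++;
}

/* Suspend order: reverse registration order (children before parents). */
size_t model_suspend_order(const struct model_list *l, const char **out)
{
	size_t i;

	assert(l != NULL && out != NULL);
	for (i = 0; i < l->count; i++)
		out[i] = l->devs[l->count - 1 - i].name;
	return l->count;
}

/* Resume order: registration order (parents before children). */
size_t model_resume_order(const struct model_list *l, const char **out)
{
	size_t i;

	assert(l != NULL && out != NULL);
	for (i = 0; i < l->count; i++)
		out[i] = l->devs[i].name;
	return l->count;
}
```

For a chain such as bridge, network adapter, PHY registered in that order, the suspend walk visits the PHY first and the bridge last, and the resume walk visits them in the opposite order.
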
257 The policy is that the device hierarchy should    259 The policy is that the device hierarchy should match hardware bus topology.
258 [Or at least the control bus, for devices whic    260 [Or at least the control bus, for devices which use multiple busses.]
259 In particular, this means that a device regist    261 In particular, this means that a device registration may fail if the parent of
260 the device is suspending (i.e. has been chosen    262 the device is suspending (i.e. has been chosen by the PM core as the next
261 device to suspend) or has already suspended, a    263 device to suspend) or has already suspended, as well as after all of the other
262 devices have been suspended.  Device drivers m    264 devices have been suspended.  Device drivers must be prepared to cope with such
263 situations.                                       265 situations.
264                                                   266 
265                                                   267 
266 System Power Management Phases                    268 System Power Management Phases
267 ------------------------------                    269 ------------------------------
268                                                   270 
269 Suspending or resuming the system is done in s    271 Suspending or resuming the system is done in several phases.  Different phases
270 are used for suspend-to-idle, shallow (standby    272 are used for suspend-to-idle, shallow (standby), and deep ("suspend-to-RAM")
271 sleep states and the hibernation state ("suspe    273 sleep states and the hibernation state ("suspend-to-disk").  Each phase involves
272 executing callbacks for every device before th    274 executing callbacks for every device before the next phase begins.  Not all
273 buses or classes support all these callbacks a    275 buses or classes support all these callbacks and not all drivers use all the
274 callbacks.  The various phases always run afte    276 callbacks.  The various phases always run after tasks have been frozen and
275 before they are unfrozen.  Furthermore, the `` !! 277 before they are unfrozen.  Furthermore, the ``*_noirq phases`` run at a time
276 when IRQ handlers have been disabled (except f    278 when IRQ handlers have been disabled (except for those marked with the
277 IRQF_NO_SUSPEND flag).                            279 IRQF_NO_SUSPEND flag).
278                                                   280 
279 All phases use PM domain, bus, type, class or     281 All phases use PM domain, bus, type, class or driver callbacks (that is, methods
280 defined in ``dev->pm_domain->ops``, ``dev->bus    282 defined in ``dev->pm_domain->ops``, ``dev->bus->pm``, ``dev->type->pm``,
281 ``dev->class->pm`` or ``dev->driver->pm``).  T    283 ``dev->class->pm`` or ``dev->driver->pm``).  These callbacks are regarded by the
282 PM core as mutually exclusive.  Moreover, PM d    284 PM core as mutually exclusive.  Moreover, PM domain callbacks always take
283 precedence over all of the other callbacks and    285 precedence over all of the other callbacks and, for example, type callbacks take
284 precedence over bus, class and driver callback    286 precedence over bus, class and driver callbacks.  To be precise, the following
285 rules are used to determine which callback to     287 rules are used to determine which callback to execute in the given phase:
286                                                   288 
287     1.  If ``dev->pm_domain`` is present, the     289     1.  If ``dev->pm_domain`` is present, the PM core will choose the callback
288         provided by ``dev->pm_domain->ops`` fo    290         provided by ``dev->pm_domain->ops`` for execution.
289                                                   291 
290     2.  Otherwise, if both ``dev->type`` and `    292     2.  Otherwise, if both ``dev->type`` and ``dev->type->pm`` are present, the
291         callback provided by ``dev->type->pm``    293         callback provided by ``dev->type->pm`` will be chosen for execution.
292                                                   294 
293     3.  Otherwise, if both ``dev->class`` and     295     3.  Otherwise, if both ``dev->class`` and ``dev->class->pm`` are present,
294         the callback provided by ``dev->class-    296         the callback provided by ``dev->class->pm`` will be chosen for
295         execution.                                297         execution.
296                                                   298 
297     4.  Otherwise, if both ``dev->bus`` and ``    299     4.  Otherwise, if both ``dev->bus`` and ``dev->bus->pm`` are present, the
298         callback provided by ``dev->bus->pm``     300         callback provided by ``dev->bus->pm`` will be chosen for execution.
299                                                   301 
300 This allows PM domains and device types to ove    302 This allows PM domains and device types to override callbacks provided by bus
301 types or device classes if necessary.             303 types or device classes if necessary.
302                                                   304 
303 The PM domain, type, class and bus callbacks m    305 The PM domain, type, class and bus callbacks may in turn invoke device- or
304 driver-specific methods stored in ``dev->drive    306 driver-specific methods stored in ``dev->driver->pm``, but they don't have to do
305 that.                                             307 that.
306                                                   308 
307 If the subsystem callback chosen for execution    309 If the subsystem callback chosen for execution is not present, the PM core will
308 execute the corresponding method from the ``de    310 execute the corresponding method from the ``dev->driver->pm`` set instead if
309 there is one.                                     311 there is one.
310                                                   312 
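The selection rules above can be modelled in a few lines of userspace C.  This
is only an illustrative sketch: every ``model_*`` type and function below is a
hypothetical stand-in rather than a kernel definition, and only a single
``suspend`` callback is modelled.

```c
#include <stddef.h>

/* Userspace model only; every model_* name is hypothetical, not a kernel API. */
typedef int (*model_cb_t)(void);

struct model_pm_ops { model_cb_t suspend; };
struct model_pm_domain { struct model_pm_ops ops; };
/* Stands in for a device type, class, bus type or driver: each has a ->pm. */
struct model_subsys { const struct model_pm_ops *pm; };

struct model_device {
	const struct model_pm_domain *pm_domain;
	const struct model_subsys *type;
	const struct model_subsys *class_;
	const struct model_subsys *bus;
	const struct model_subsys *driver;
};

/* Sample callbacks; the return value only identifies which one ran. */
static int model_type_suspend(void)   { return 2; }
static int model_bus_suspend(void)    { return 4; }
static int model_driver_suspend(void) { return 5; }

/* Pick the suspend callback following rules 1-4, then the driver fallback. */
static model_cb_t model_pick_suspend(const struct model_device *dev)
{
	model_cb_t cb = NULL;

	if (dev->pm_domain)
		cb = dev->pm_domain->ops.suspend;
	else if (dev->type && dev->type->pm)
		cb = dev->type->pm->suspend;
	else if (dev->class_ && dev->class_->pm)
		cb = dev->class_->pm->suspend;
	else if (dev->bus && dev->bus->pm)
		cb = dev->bus->pm->suspend;

	/* Chosen subsystem callback absent: use the driver's one, if any. */
	if (!cb && dev->driver && dev->driver->pm)
		cb = dev->driver->pm->suspend;

	return cb;
}
```

In this model, a device with both a type and a bus callback gets the type one,
while a device in a PM domain whose ops lack ``suspend`` falls back to the
driver's method without consulting the type or the bus.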
311                                                   313 
312 Entering System Suspend                           314 Entering System Suspend
313 -----------------------                           315 -----------------------
314                                                   316 
315 When the system goes into the freeze, standby     317 When the system goes into the freeze, standby or memory sleep state,
316 the phases are: ``prepare``, ``suspend``, ``su    318 the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
317                                                   319 
318     1.  The ``prepare`` phase is meant to prev    320     1.  The ``prepare`` phase is meant to prevent races by preventing new
319         devices from being registered; the PM     321         devices from being registered; the PM core would never know that all the
320         children of a device had been suspende    322         children of a device had been suspended if new children could be
321         registered at will.  [By contrast, fro    323         registered at will.  [By contrast, from the PM core's perspective,
322         devices may be unregistered at any tim    324         devices may be unregistered at any time.]  Unlike the other
323         suspend-related phases, during the ``p    325         suspend-related phases, during the ``prepare`` phase the device
324         hierarchy is traversed top-down.          326         hierarchy is traversed top-down.
325                                                   327 
326         After the ``->prepare`` callback metho    328         After the ``->prepare`` callback method returns, no new children may be
327         registered below the device.  The meth    329         registered below the device.  The method may also prepare the device or
328         driver in some way for the upcoming sy    330         driver in some way for the upcoming system power transition, but it
329         should not put the device into a low-p !! 331         should not put the device into a low-power state.
330         device supports runtime power manageme << 
331         method must not update its state in ca << 
332         from runtime suspend later on.         << 
333                                                   332 
334         For devices supporting runtime power m    333         For devices supporting runtime power management, the return value of the
335         prepare callback can be used to indica    334         prepare callback can be used to indicate to the PM core that it may
336         safely leave the device in runtime sus    335         safely leave the device in runtime suspend (if runtime-suspended
337         already), provided that all of the dev    336         already), provided that all of the device's descendants are also left in
338         runtime suspend.  Namely, if the prepa    337         runtime suspend.  Namely, if the prepare callback returns a positive
339         number and that happens for all of the    338         number and that happens for all of the descendants of the device too,
340         and all of them (including the device     339         and all of them (including the device itself) are runtime-suspended, the
341         PM core will skip the ``suspend``, ``s    340         PM core will skip the ``suspend``, ``suspend_late`` and
342         ``suspend_noirq`` phases as well as al    341         ``suspend_noirq`` phases as well as all of the corresponding phases of
343         the subsequent device resume for all o    342         the subsequent device resume for all of these devices.  In that case,
344         the ``->complete`` callback will be th !! 343         the ``->complete`` callback will be invoked directly after the
345         ``->prepare`` callback and is entirely    344         ``->prepare`` callback and is entirely responsible for putting the
346         device into a consistent state as appr    345         device into a consistent state as appropriate.
347                                                   346 
348         Note that this direct-complete procedu    347         Note that this direct-complete procedure applies even if the device is
349         disabled for runtime PM; only the runt    348         disabled for runtime PM; only the runtime-PM status matters.  It follows
350         that if a device has system-sleep call    349         that if a device has system-sleep callbacks but does not support runtime
351         PM, then its prepare callback must nev    350         PM, then its prepare callback must never return a positive value.  This
352         is because all such devices are initia    351         is because all such devices are initially set to runtime-suspended with
353         runtime PM disabled.                      352         runtime PM disabled.
354                                                   353 
355         This feature also can be controlled by << 
356         ``DPM_FLAG_NO_DIRECT_COMPLETE`` and `` << 
357         power management flags.  [Typically, t << 
358         is probed against the device in questi << 
359         :c:func:`dev_pm_set_driver_flags` help << 
360         these flags is set, the PM core will n << 
361         procedure described above to the given << 
362         of its ancestors.  The second flag, wh << 
363         code (bus types, device types, PM doma << 
364         the return value of the ``->prepare``  << 
365         into account and it may only return a  << 
366         ``->prepare`` callback if the driver's << 
367         value.                                 << 
368                                                << 
369     2.  The ``->suspend`` methods should quies    354     2.  The ``->suspend`` methods should quiesce the device to stop it from
370         performing I/O.  They also may save th    355         performing I/O.  They also may save the device registers and put it into
371         the appropriate low-power state, depen    356         the appropriate low-power state, depending on the bus type the device is
372         on, and they may enable wakeup events.    357         on, and they may enable wakeup events.
373                                                   358 
374         However, for devices supporting runtim << 
375         ``->suspend`` methods provided by subs << 
376         in particular) must follow an addition << 
377         to the devices before their drivers' ` << 
378         Namely, they may resume the devices fr << 
379         calling :c:func:`pm_runtime_resume` fo << 
380         they must not update the state of the  << 
381         time (in case the drivers need to resu << 
382         suspend in their ``->suspend`` methods << 
383         subsystems or drivers from putting dev << 
384         these times by calling :c:func:`pm_run << 
385         the ``->prepare`` callback (and callin << 
386         issuing the ``->complete`` callback).  << 
387                                                << 
388     3.  For a number of devices it is convenie    359     3.  For a number of devices it is convenient to split suspend into the
389         "quiesce device" and "save device stat    360         "quiesce device" and "save device state" phases, in which cases
390         ``suspend_late`` is meant to do the la    361         ``suspend_late`` is meant to do the latter.  It is always executed after
391         runtime power management has been disa    362         runtime power management has been disabled for the device in question.
392                                                   363 
393     4.  The ``suspend_noirq`` phase occurs aft    364     4.  The ``suspend_noirq`` phase occurs after IRQ handlers have been disabled,
394         which means that the driver's interrup    365         which means that the driver's interrupt handler will not be called while
395         the callback method is running.  The `    366         the callback method is running.  The ``->suspend_noirq`` methods should
396         save the values of the device's regist    367         save the values of the device's registers that weren't saved previously
397         and finally put the device into the ap    368         and finally put the device into the appropriate low-power state.
398                                                   369 
399         The majority of subsystems and device     370         The majority of subsystems and device drivers need not implement this
400         callback.  However, bus types allowing    371         callback.  However, bus types allowing devices to share interrupt
401         vectors, like PCI, generally need it;     372         vectors, like PCI, generally need it; otherwise a driver might encounter
402         an error during the suspend phase by f    373         an error during the suspend phase by fielding a shared interrupt
403         generated by some other device after i    374         generated by some other device after its own device had been set to low
404         power.                                    375         power.
405                                                   376 
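The direct-complete condition described for the ``prepare`` phase above can be
sketched as a recursive check over a device and its descendants.  This is a
userspace model under stated assumptions: the ``model_*`` names are
hypothetical, only the conditions spelled out in item 1 are covered, and any
driver flags that may influence the procedure are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the model_* names are hypothetical, not kernel APIs. */
struct model_node {
	int prepare_ret;                    /* value returned by ->prepare */
	bool runtime_suspended;             /* runtime-PM status of the device */
	const struct model_node **children; /* NULL-terminated array, or NULL */
};

/*
 * True if the suspend and resume phases may be skipped for this device:
 * ->prepare returned a positive number and the device is runtime-suspended,
 * and the same holds for every descendant.
 */
static bool model_direct_complete(const struct model_node *node)
{
	if (node->prepare_ret <= 0 || !node->runtime_suspended)
		return false;

	for (const struct model_node **c = node->children; c && *c; c++)
		if (!model_direct_complete(*c))
			return false;

	return true;
}
```

A single active descendant, or a single ``->prepare`` callback returning zero,
is enough in this model to make the whole subtree go through the regular
phases.
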
406 At the end of these phases, drivers should hav    377 At the end of these phases, drivers should have stopped all I/O transactions
407 (DMA, IRQs), saved enough state that they can     378 (DMA, IRQs), saved enough state that they can re-initialize or restore previous
408 state (as needed by the hardware), and placed     379 state (as needed by the hardware), and placed the device into a low-power state.
409 On many platforms they will gate off one or mo    380 On many platforms they will gate off one or more clock sources; sometimes they
410 will also switch off power supplies or reduce     381 will also switch off power supplies or reduce voltages.  [Drivers supporting
411 runtime PM may already have performed some or     382 runtime PM may already have performed some or all of these steps.]
412                                                   383 
413 If :c:func:`device_may_wakeup()` returns ``tru !! 384 If :c:func:`device_may_wakeup(dev)` returns ``true``, the device should be
414 prepared for generating hardware wakeup signal    385 prepared for generating hardware wakeup signals to trigger a system wakeup event
415 when the system is in the sleep state.  For ex    386 when the system is in the sleep state.  For example, :c:func:`enable_irq_wake()`
416 might identify GPIO signals hooked up to a swi    387 might identify GPIO signals hooked up to a switch or other external hardware,
417 and :c:func:`pci_enable_wake()` does something    388 and :c:func:`pci_enable_wake()` does something similar for the PCI PME signal.
418                                                   389 
419 If any of these callbacks returns an error, th    390 If any of these callbacks returns an error, the system won't enter the desired
420 low-power state.  Instead, the PM core will un    391 low-power state.  Instead, the PM core will unwind its actions by resuming all
421 the devices that were suspended.                  392 the devices that were suspended.
422                                                   393 
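The unwinding on error can be illustrated with a small userspace model.  It is
a sketch only: the ``model_*`` names are hypothetical, a single phase is
modelled instead of four, and resuming in reverse order is an assumption of the
sketch rather than something stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the model_* names are hypothetical, not kernel APIs. */
struct model_dev {
	int suspend_ret;  /* what this device's suspend callback would return */
	bool suspended;
};

/* Try to suspend devs[0..n-1]; on failure resume the ones already suspended. */
static int model_suspend_all(struct model_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = devs[i].suspend_ret;  /* stands in for calling ->suspend */

		if (ret) {
			/* Unwind; reverse order is an assumption of this sketch. */
			while (i-- > 0)
				devs[i].suspended = false;
			return ret;
		}
		devs[i].suspended = true;
	}

	return 0;
}
```

If the last of three devices reports an error, the first two are resumed again
and the error code is passed back to the caller, so the modelled system never
reaches the sleep state.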
423                                                   394 
424 Leaving System Suspend                            395 Leaving System Suspend
425 ----------------------                            396 ----------------------
426                                                   397 
427 When resuming from freeze, standby or memory s    398 When resuming from freeze, standby or memory sleep, the phases are:
428 ``resume_noirq``, ``resume_early``, ``resume``    399 ``resume_noirq``, ``resume_early``, ``resume``, ``complete``.
429                                                   400 
430     1.  The ``->resume_noirq`` callback method    401     1.  The ``->resume_noirq`` callback methods should perform any actions
431         needed before the driver's interrupt h    402         needed before the driver's interrupt handlers are invoked.  This
432         generally means undoing the actions of    403         generally means undoing the actions of the ``suspend_noirq`` phase.  If
433         the bus type permits devices to share     404         the bus type permits devices to share interrupt vectors, like PCI, the
434         method should bring the device and its    405         method should bring the device and its driver into a state in which the
435         driver can recognize if the device is     406         driver can recognize if the device is the source of incoming interrupts,
436         if any, and handle them correctly.        407         if any, and handle them correctly.
437                                                   408 
438         For example, the PCI bus type's ``->pm    409         For example, the PCI bus type's ``->pm.resume_noirq()`` puts the device
439         into the full-power state (D0 in the P    410         into the full-power state (D0 in the PCI terminology) and restores the
440         standard configuration registers of th    411         standard configuration registers of the device.  Then it calls the
441         device driver's ``->pm.resume_noirq()`    412         device driver's ``->pm.resume_noirq()`` method to perform device-specific
442         actions.                                  413         actions.
443                                                   414 
444     2.  The ``->resume_early`` methods should     415     2.  The ``->resume_early`` methods should prepare devices for the execution
445         of the resume methods.  This generally    416         of the resume methods.  This generally involves undoing the actions of
446         the preceding ``suspend_late`` phase.     417         the preceding ``suspend_late`` phase.
447                                                   418 
448     3.  The ``->resume`` methods should bring     419     3.  The ``->resume`` methods should bring the device back to its operating
449         state, so that it can perform normal I    420         state, so that it can perform normal I/O.  This generally involves
450         undoing the actions of the ``suspend``    421         undoing the actions of the ``suspend`` phase.
451                                                   422 
452     4.  The ``complete`` phase should undo the    423     4.  The ``complete`` phase should undo the actions of the ``prepare`` phase.
453         For this reason, unlike the other resu    424         For this reason, unlike the other resume-related phases, during the
454         ``complete`` phase the device hierarch    425         ``complete`` phase the device hierarchy is traversed bottom-up.
455                                                   426 
456         Note, however, that new children may b    427         Note, however, that new children may be registered below the device as
457         soon as the ``->resume`` callbacks occ    428         soon as the ``->resume`` callbacks occur; it's not necessary to wait
458         until the ``complete`` phase runs.     !! 429         until the ``complete`` phase with that.
459                                                   430 
460         Moreover, if the preceding ``->prepare    431         Moreover, if the preceding ``->prepare`` callback returned a positive
461         number, the device may have been left     432         number, the device may have been left in runtime suspend throughout the
462         whole system suspend and resume (its ` !! 433         whole system suspend and resume (the ``suspend``, ``suspend_late``,
463         ``->suspend_noirq``, ``->resume_noirq` !! 434         ``suspend_noirq`` phases of system suspend and the ``resume_noirq``,
464         ``->resume_early``, and ``->resume`` c !! 435         ``resume_early``, ``resume`` phases of system resume may have been
465         skipped).  In that case, the ``->compl !! 436         skipped for it).  In that case, the ``->complete`` callback is entirely
466         responsible for putting the device int    437         responsible for putting the device into a consistent state after system
467         suspend if necessary.  [For example, i    438         suspend if necessary.  [For example, it may need to queue up a runtime
468         resume request for the device for this    439         resume request for the device for this purpose.]  To check if that is
469         the case, the ``->complete`` callback     440         the case, the ``->complete`` callback can consult the device's
470         ``power.direct_complete`` flag.  If th !! 441         ``power.direct_complete`` flag.  Namely, if that flag is set when the
471         ``->complete`` callback is being run t !! 442         ``->complete`` callback is being run, it has been called directly after
472         was used, and special actions may be r !! 443         the preceding ``->prepare`` and special actions may be required
473         correctly afterward.                   !! 444         to make the device work correctly afterward.
474                                                   445 
475 At the end of these phases, drivers should be     446 At the end of these phases, drivers should be as functional as they were before
476 suspending: I/O can be performed using DMA and    447 suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
477 gated on.                                         448 gated on.
478                                                   449 
479 However, the details here may again be platfor    450 However, the details here may again be platform-specific.  For example,
480 some systems support multiple "run" states, an    451 some systems support multiple "run" states, and the mode in effect at
481 the end of resume might not be the one which p    452 the end of resume might not be the one which preceded suspension.
482 That means availability of certain clocks or p    453 That means availability of certain clocks or power supplies changed,
483 which could easily affect how a driver works.     454 which could easily affect how a driver works.
484                                                   455 
485 Drivers need to be able to handle hardware whi    456 Drivers need to be able to handle hardware which has been reset since all of the
486 suspend methods were called, for example by co    457 suspend methods were called, for example by complete reinitialization.
487 This may be the hardest part, and the one most    458 This may be the hardest part, and the one most protected by NDA'd documents
488 and chip errata.  It's simplest if the hardwar    459 and chip errata.  It's simplest if the hardware state hasn't changed since
489 the suspend was carried out, but that can only    460 the suspend was carried out, but that can only be guaranteed if the target
490 system sleep entered was suspend-to-idle.  For    461 system sleep entered was suspend-to-idle.  For the other system sleep states
491 that may not be the case (and usually isn't fo    462 that may not be the case (and usually isn't for ACPI-defined system sleep
492 states, like S3).                                 463 states, like S3).
493                                                   464 
494 Drivers must also be prepared to notice that t    465 Drivers must also be prepared to notice that the device has been removed
495 while the system was powered down, whenever th    466 while the system was powered down, whenever that's physically possible.
496 PCMCIA, MMC, USB, Firewire, SCSI, and even IDE    467 PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses
497 where common Linux platforms will see such rem    468 where common Linux platforms will see such removal.  Details of how drivers
498 will notice and handle such removals are curre    469 will notice and handle such removals are currently bus-specific, and often
499 involve a separate thread.                        470 involve a separate thread.
500                                                   471 
501 These callbacks may return an error value, but    472 These callbacks may return an error value, but the PM core will ignore such
502 errors since there's nothing it can do about t    473 errors since there's nothing it can do about them other than printing them in
503 the system log.                                   474 the system log.
504                                                   475 
505                                                   476 
506 Entering Hibernation                              477 Entering Hibernation
507 --------------------                              478 --------------------
508                                                   479 
509 Hibernating the system is more complicated tha    480 Hibernating the system is more complicated than putting it into sleep states,
510 because it involves creating and saving a syst    481 because it involves creating and saving a system image.  Therefore there are
511 more phases for hibernation, with a different     482 more phases for hibernation, with a different set of callbacks.  These phases
512 always run after tasks have been frozen and en    483 always run after tasks have been frozen and enough memory has been freed.
513                                                   484 
514 The general procedure for hibernation is to qu    485 The general procedure for hibernation is to quiesce all devices ("freeze"),
515 create an image of the system memory while eve    486 create an image of the system memory while everything is stable, reactivate all
516 devices ("thaw"), write the image to permanent    487 devices ("thaw"), write the image to permanent storage, and finally shut down
517 the system ("power off").  The phases used to     488 the system ("power off").  The phases used to accomplish this are: ``prepare``,
518 ``freeze``, ``freeze_late``, ``freeze_noirq``,    489 ``freeze``, ``freeze_late``, ``freeze_noirq``, ``thaw_noirq``, ``thaw_early``,
519 ``thaw``, ``complete``, ``prepare``, ``powerof    490 ``thaw``, ``complete``, ``prepare``, ``poweroff``, ``poweroff_late``,
520 ``poweroff_noirq``.                               491 ``poweroff_noirq``.
521                                                   492 
522     1.  The ``prepare`` phase is discussed in     493     1.  The ``prepare`` phase is discussed in the "Entering System Suspend"
523         section above.                            494         section above.
524                                                   495 
525     2.  The ``->freeze`` methods should quiesc    496     2.  The ``->freeze`` methods should quiesce the device so that it doesn't
526         generate IRQs or DMA, and they may nee    497         generate IRQs or DMA, and they may need to save the values of device
527         registers.  However the device does no    498         registers.  However the device does not have to be put in a low-power
528         state, and to save time it's best not     499         state, and to save time it's best not to do so.  Also, the device should
529         not be prepared to generate wakeup eve    500         not be prepared to generate wakeup events.
530                                                   501 
531     3.  The ``freeze_late`` phase is analogous    502     3.  The ``freeze_late`` phase is analogous to the ``suspend_late`` phase
532         described earlier, except that the dev    503         described earlier, except that the device should not be put into a
533         low-power state and should not be allo    504         low-power state and should not be allowed to generate wakeup events.
534                                                   505 
535     4.  The ``freeze_noirq`` phase is analogou    506     4.  The ``freeze_noirq`` phase is analogous to the ``suspend_noirq`` phase
536         discussed earlier, except again that t    507         discussed earlier, except again that the device should not be put into
537         a low-power state and should not be al    508         a low-power state and should not be allowed to generate wakeup events.
538                                                   509 
539 At this point the system image is created.  Al    510 At this point the system image is created.  All devices should be inactive and
540 the contents of memory should remain undisturb    511 the contents of memory should remain undisturbed while this happens, so that the
541 image forms an atomic snapshot of the system s    512 image forms an atomic snapshot of the system state.
542                                                   513 
543     5.  The ``thaw_noirq`` phase is analogous     514     5.  The ``thaw_noirq`` phase is analogous to the ``resume_noirq`` phase
544         discussed earlier.  The main differenc    515         discussed earlier.  The main difference is that its methods can assume
545         the device is in the same state as at     516         the device is in the same state as at the end of the ``freeze_noirq``
546         phase.                                    517         phase.
547                                                   518 
548     6.  The ``thaw_early`` phase is analogous     519     6.  The ``thaw_early`` phase is analogous to the ``resume_early`` phase
549         described above.  Its methods should u    520         described above.  Its methods should undo the actions of the preceding
550         ``freeze_late``, if necessary.            521         ``freeze_late``, if necessary.
551                                                   522 
552     7.  The ``thaw`` phase is analogous to the    523     7.  The ``thaw`` phase is analogous to the ``resume`` phase discussed
553         earlier.  Its methods should bring the    524         earlier.  Its methods should bring the device back to an operating
554         state, so that it can be used for savi    525         state, so that it can be used for saving the image if necessary.
555                                                   526 
556     8.  The ``complete`` phase is discussed in    527     8.  The ``complete`` phase is discussed in the "Leaving System Suspend"
557         section above.                            528         section above.
558                                                   529 
559 At this point the system image is saved, and t    530 At this point the system image is saved, and the devices then need to be
560 prepared for the upcoming system shutdown.  Th    531 prepared for the upcoming system shutdown.  This is much like suspending them
561 before putting the system into the suspend-to-    532 before putting the system into the suspend-to-idle, shallow or deep sleep state,
562 and the phases are similar.                       533 and the phases are similar.
563                                                   534 
564     9.  The ``prepare`` phase is discussed abo    535     9.  The ``prepare`` phase is discussed above.
565                                                   536 
566     10. The ``poweroff`` phase is analogous to    537     10. The ``poweroff`` phase is analogous to the ``suspend`` phase.
567                                                   538 
568     11. The ``poweroff_late`` phase is analogo    539     11. The ``poweroff_late`` phase is analogous to the ``suspend_late`` phase.
569                                                   540 
570     12. The ``poweroff_noirq`` phase is analog    541     12. The ``poweroff_noirq`` phase is analogous to the ``suspend_noirq`` phase.
571                                                   542 
572 The ``->poweroff``, ``->poweroff_late`` and ``    543 The ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks
573 should do essentially the same things as the `    544 should do essentially the same things as the ``->suspend``, ``->suspend_late``
574 and ``->suspend_noirq`` callbacks, respectivel !! 545 and ``->suspend_noirq`` callbacks, respectively.  The only notable difference is
575 that they need not store the device register v    546 that they need not store the device register values, because the registers
576 should already have been stored during the ``f    547 should already have been stored during the ``freeze``, ``freeze_late`` or
577 ``freeze_noirq`` phases.  Also, on many machin !! 548 ``freeze_noirq`` phases.
578 the entire system, so it is not necessary for  << 
579 a low-power state.                             << 
580                                                   549 
581                                                   550 
582 Leaving Hibernation                               551 Leaving Hibernation
583 -------------------                               552 -------------------
584                                                   553 
585 Resuming from hibernation is, again, more comp    554 Resuming from hibernation is, again, more complicated than resuming from a sleep
586 state in which the contents of main memory are    555 state in which the contents of main memory are preserved, because it requires
587 a system image to be loaded into memory and th    556 a system image to be loaded into memory and the pre-hibernation memory contents
588 to be restored before control can be passed ba    557 to be restored before control can be passed back to the image kernel.
589                                                   558 
590 Although in principle the image might be loade    559 Although in principle the image might be loaded into memory and the
591 pre-hibernation memory contents restored by th    560 pre-hibernation memory contents restored by the boot loader, in practice this
592 can't be done because boot loaders aren't smar    561 can't be done because boot loaders aren't smart enough and there is no
593 established protocol for passing the necessary    562 established protocol for passing the necessary information.  So instead, the
594 boot loader loads a fresh instance of the kern    563 boot loader loads a fresh instance of the kernel, called "the restore kernel",
595 into memory and passes control to it in the us    564 into memory and passes control to it in the usual way.  Then the restore kernel
596 reads the system image, restores the pre-hiber    565 reads the system image, restores the pre-hibernation memory contents, and passes
597 control to the image kernel.  Thus two differe    566 control to the image kernel.  Thus two different kernel instances are involved
598 in resuming from hibernation.  In fact, the re    567 in resuming from hibernation.  In fact, the restore kernel may be completely
599 different from the image kernel: a different c    568 different from the image kernel: a different configuration and even a different
600 version.  This has important consequences for     569 version.  This has important consequences for device drivers and their
601 subsystems.                                       570 subsystems.
602                                                   571 
603 To be able to load the system image into memor    572 To be able to load the system image into memory, the restore kernel needs to
604 include at least a subset of device drivers al    573 include at least a subset of device drivers allowing it to access the storage
605 medium containing the image, although it doesn    574 medium containing the image, although it doesn't need to include all of the
606 drivers present in the image kernel.  After th    575 drivers present in the image kernel.  After the image has been loaded, the
607 devices managed by the boot kernel need to be     576 devices managed by the boot kernel need to be prepared for passing control back
608 to the image kernel.  This is very similar to     577 to the image kernel.  This is very similar to the initial steps involved in
609 creating a system image, and it is accomplishe    578 creating a system image, and it is accomplished in the same way, using
610 ``prepare``, ``freeze``, and ``freeze_noirq``     579 ``prepare``, ``freeze``, and ``freeze_noirq`` phases.  However, the devices
611 affected by these phases are only those having    580 affected by these phases are only those having drivers in the restore kernel;
612 other devices will still be in whatever state     581 other devices will still be in whatever state the boot loader left them.
613                                                   582 
614 Should the restoration of the pre-hibernation     583 Should the restoration of the pre-hibernation memory contents fail, the restore
615 kernel would go through the "thawing" procedur    584 kernel would go through the "thawing" procedure described above, using the
616 ``thaw_noirq``, ``thaw_early``, ``thaw``, and     585 ``thaw_noirq``, ``thaw_early``, ``thaw``, and ``complete`` phases, and then
617 continue running normally.  This happens only     586 continue running normally.  This happens only rarely.  Most often the
618 pre-hibernation memory contents are restored s    587 pre-hibernation memory contents are restored successfully and control is passed
619 to the image kernel, which then becomes respon    588 to the image kernel, which then becomes responsible for bringing the system back
620 to the working state.                             589 to the working state.
621                                                   590 
622 To achieve this, the image kernel must restore    591 To achieve this, the image kernel must restore the devices' pre-hibernation
623 functionality.  The operation is much like wak    592 functionality.  The operation is much like waking up from a sleep state (with
624 the memory contents preserved), although it in    593 the memory contents preserved), although it involves different phases:
625 ``restore_noirq``, ``restore_early``, ``restor    594 ``restore_noirq``, ``restore_early``, ``restore``, ``complete``.
626                                                   595 
627     1.  The ``restore_noirq`` phase is analogo    596     1.  The ``restore_noirq`` phase is analogous to the ``resume_noirq`` phase.
628                                                   597 
629     2.  The ``restore_early`` phase is analogo    598     2.  The ``restore_early`` phase is analogous to the ``resume_early`` phase.
630                                                   599 
631     3.  The ``restore`` phase is analogous to     600     3.  The ``restore`` phase is analogous to the ``resume`` phase.
632                                                   601 
633     4.  The ``complete`` phase is discussed ab    602     4.  The ``complete`` phase is discussed above.
634                                                   603 
635 The main difference from ``resume[_early|_noir    604 The main difference from ``resume[_early|_noirq]`` is that
636 ``restore[_early|_noirq]`` must assume the dev    605 ``restore[_early|_noirq]`` must assume the device has been accessed and
637 reconfigured by the boot loader or the restore    606 reconfigured by the boot loader or the restore kernel.  Consequently, the state
638 of the device may be different from the state     607 of the device may be different from the state remembered from the ``freeze``,
639 ``freeze_late`` and ``freeze_noirq`` phases.      608 ``freeze_late`` and ``freeze_noirq`` phases.  The device may even need to be
640 reset and completely re-initialized.  In many     609 reset and completely re-initialized.  In many cases this difference doesn't
641 matter, so the ``->resume[_early|_noirq]`` and    610 matter, so the ``->resume[_early|_noirq]`` and ``->restore[_early|_noirq]``
642 method pointers can be set to the same routine    611 method pointers can be set to the same routines.  Nevertheless, different
643 callback pointers are used in case there is a     612 callback pointers are used in case there is a situation where it actually does
644 matter.                                           613 matter.
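
The sharing of resume and restore routines described above can be sketched in
plain user-space C.  This is only a simplified model, not kernel code: the
``sketch_*`` types and functions are hypothetical stand-ins for
``struct dev_pm_ops`` and a driver's callbacks.  The first table reuses one
routine for both pointers.  The second supplies a separate restore routine
that assumes the device state is unknown and resets it first.

```c
#include <stdbool.h>

/* Hypothetical model of per-device state; not a kernel structure. */
struct sketch_dev_state {
	bool hw_reset_done;	/* set when the device was fully re-initialized */
	bool running;		/* set when the device is operational again */
};

typedef int (*sketch_pm_cb)(struct sketch_dev_state *);

/* Hypothetical, reduced stand-in for struct dev_pm_ops. */
struct sketch_pm_ops {
	sketch_pm_cb resume;
	sketch_pm_cb restore;
};

/* Resume path: the state saved before the sleep is assumed to be valid. */
static int sketch_resume(struct sketch_dev_state *s)
{
	s->running = true;
	return 0;
}

/*
 * Restore path: the boot loader or the restore kernel may have touched the
 * device, so reset it before following the ordinary resume path.
 */
static int sketch_restore(struct sketch_dev_state *s)
{
	s->hw_reset_done = true;
	return sketch_resume(s);
}

/* Common case: the difference does not matter, one routine serves both. */
static const struct sketch_pm_ops sketch_shared_ops = {
	.resume = sketch_resume,
	.restore = sketch_resume,
};

/* Case where it does matter: a dedicated restore routine. */
static const struct sketch_pm_ops sketch_split_ops = {
	.resume = sketch_resume,
	.restore = sketch_restore,
};
```

Calling ``sketch_split_ops.restore()`` on a zeroed ``struct sketch_dev_state``
sets both flags, while ``sketch_shared_ops.restore()`` sets only ``running``.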
645                                                   614 
646                                                   615 
647 Power Management Notifiers                        616 Power Management Notifiers
648 ==========================                        617 ==========================
649                                                   618 
650 There are some operations that cannot be carri    619 There are some operations that cannot be carried out by the power management
651 callbacks discussed above, because the callbac    620 callbacks discussed above, because the callbacks occur too late or too early.
652 To handle these cases, subsystems and device d    621 To handle these cases, subsystems and device drivers may register power
653 management notifiers that are called before ta    622 management notifiers that are called before tasks are frozen and after they have
654 been thawed.  Generally speaking, the PM notif    623 been thawed.  Generally speaking, the PM notifiers are suitable for performing
655 actions that either require user space to be a    624 actions that either require user space to be available, or at least won't
656 interfere with user space.                        625 interfere with user space.
657                                                   626 
658 For details refer to Documentation/driver-api/ !! 627 For details refer to :doc:`notifiers`.
659                                                   628 
660                                                   629 
661 Device Low-Power (suspend) States                 630 Device Low-Power (suspend) States
662 =================================                 631 =================================
663                                                   632 
664 Device low-power states aren't standard.  One     633 Device low-power states aren't standard.  One device might only handle
665 "on" and "off", while another might support a     634 "on" and "off", while another might support a dozen different versions of
666 "on" (how many engines are active?), plus a st    635 "on" (how many engines are active?), plus a state that gets back to "on"
667 faster than from a full "off".                    636 faster than from a full "off".
668                                                   637 
669 Some buses define rules about what different s    638 Some buses define rules about what different suspend states mean.  PCI
670 gives one example: after the suspend sequence     639 gives one example: after the suspend sequence completes, a non-legacy
671 PCI device may not perform DMA or issue IRQs,     640 PCI device may not perform DMA or issue IRQs, and any wakeup events it
672 issues would be issued through the PME# bus si    641 issues would be issued through the PME# bus signal.  Plus, there are
673 several PCI-standard device states, some of wh    642 several PCI-standard device states, some of which are optional.
674                                                   643 
675 In contrast, integrated system-on-chip process    644 In contrast, integrated system-on-chip processors often use IRQs as the
676 wakeup event sources (so drivers would call :c    645 wakeup event sources (so drivers would call :c:func:`enable_irq_wake`) and
677 might be able to treat DMA completion as a wak    646 might be able to treat DMA completion as a wakeup event (sometimes DMA can stay
678 active too, it'd only be the CPU and some peri    647 active too, it'd only be the CPU and some peripherals that sleep).
679                                                   648 
680 Some details here may be platform-specific.  S    649 Some details here may be platform-specific.  Systems may have devices that
681 can be fully active in certain sleep states, s    650 can be fully active in certain sleep states, such as an LCD display that's
682 refreshed using DMA while most of the system i    651 refreshed using DMA while most of the system is sleeping lightly ... and
683 its frame buffer might even be updated by a DS    652 its frame buffer might even be updated by a DSP or other non-Linux CPU while
684 the Linux control processor stays idle.           653 the Linux control processor stays idle.
685                                                   654 
686 Moreover, the specific actions taken may depen    655 Moreover, the specific actions taken may depend on the target system state.
687 One target system state might allow a given de    656 One target system state might allow a given device to be very operational;
688 another might require a hard shut down with re    657 another might require a hard shut down with re-initialization on resume.
689 And two different target systems might use the    658 And two different target systems might use the same device in different
690 ways; the aforementioned LCD might be active i    659 ways; the aforementioned LCD might be active in one product's "standby",
691 but a different product using the same SOC mig    660 but a different product using the same SOC might work differently.
692                                                   661 
693                                                   662 
694 Device Power Management Domains                   663 Device Power Management Domains
695 ===============================                   664 ===============================
696                                                   665 
697 Sometimes devices share reference clocks or ot    666 Sometimes devices share reference clocks or other power resources.  In those
698 cases it generally is not possible to put devi    667 cases it generally is not possible to put devices into low-power states
699 individually.  Instead, a set of devices shari    668 individually.  Instead, a set of devices sharing a power resource can be put
700 into a low-power state together at the same ti    669 into a low-power state together at the same time by turning off the shared
701 power resource.  Of course, they also need to     670 power resource.  Of course, they also need to be put into the full-power state
702 together, by turning the shared power resource    671 together, by turning the shared power resource on.  A set of devices with this
703 property is often referred to as a power domai    672 property is often referred to as a power domain. A power domain may also be
704 nested inside another power domain. The nested    673 nested inside another power domain. The nested domain is referred to as the
705 sub-domain of the parent domain.                  674 sub-domain of the parent domain.
706                                                   675 
707 Support for power domains is provided through     676 Support for power domains is provided through the :c:member:`pm_domain` field of
708 struct device.  This field is a pointer to an  !! 677 |struct device|.  This field is a pointer to an object of type
709 struct dev_pm_domain, defined in :file:`includ !! 678 |struct dev_pm_domain|, defined in :file:`include/linux/pm.h`, providing a set
710 of power management callbacks analogous to the    679 of power management callbacks analogous to the subsystem-level and device driver
711 callbacks that are executed for the given devi    680 callbacks that are executed for the given device during all power transitions,
712 instead of the respective subsystem-level call    681 instead of the respective subsystem-level callbacks.  Specifically, if a
713 device's :c:member:`pm_domain` pointer is not     682 device's :c:member:`pm_domain` pointer is not NULL, the ``->suspend()`` callback
714 from the object pointed to by it will be execu    683 from the object pointed to by it will be executed instead of its subsystem's
715 (e.g. bus type's) ``->suspend()`` callback and    684 (e.g. bus type's) ``->suspend()`` callback and analogously for all of the
716 remaining callbacks.  In other words, power ma    685 remaining callbacks.  In other words, power management domain callbacks, if
717 defined for the given device, always take prec    686 defined for the given device, always take precedence over the callbacks provided
718 by the device's subsystem (e.g. bus type).        687 by the device's subsystem (e.g. bus type).
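
The precedence rule above can be illustrated with a small user-space model.
It is a sketch under stated assumptions, not the PM core's implementation:
the ``model_*`` types are hypothetical, reduced stand-ins for
``struct device``, ``struct dev_pm_domain`` and ``struct bus_type``, and only
the ``->suspend()`` callback is modelled.

```c
#include <stddef.h>

typedef int (*model_pm_cb)(void);

/* Hypothetical, reduced stand-in for struct dev_pm_ops. */
struct model_pm_ops {
	model_pm_cb suspend;
};

/* Hypothetical stand-in for struct dev_pm_domain. */
struct model_pm_domain {
	struct model_pm_ops ops;
};

/* Hypothetical stand-in for a subsystem such as a bus type. */
struct model_bus {
	const struct model_pm_ops *pm;
};

/* Hypothetical stand-in for struct device. */
struct model_device {
	const struct model_pm_domain *pm_domain;
	const struct model_bus *bus;
};

/* Demo callbacks; the return values only identify which one was chosen. */
static int demo_domain_suspend(void) { return 1; }
static int demo_bus_suspend(void) { return 2; }

/*
 * Select the ->suspend() callback for a device: a non-NULL pm_domain
 * pointer takes precedence over the subsystem (bus type) callbacks.
 */
static model_pm_cb model_pick_suspend(const struct model_device *dev)
{
	if (dev->pm_domain)
		return dev->pm_domain->ops.suspend;
	if (dev->bus && dev->bus->pm)
		return dev->bus->pm->suspend;
	return NULL;
}
```

With both a domain and a bus attached, ``model_pick_suspend()`` returns the
domain's callback.  Once ``pm_domain`` is set to NULL it falls back to the
bus callback.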
719                                                   688 
720 The support for device power management domain    689 The support for device power management domains is only relevant to platforms
721 needing to use the same device driver power ma    690 needing to use the same device driver power management callbacks in many
722 different power domain configurations and want    691 different power domain configurations and wanting to avoid incorporating the
723 support for power domains into subsystem-level    692 support for power domains into subsystem-level callbacks, for example by
724 modifying the platform bus type.  Other platfo    693 modifying the platform bus type.  Other platforms need not implement it or take
725 it into account in any way.                       694 it into account in any way.
726                                                   695 
727 Devices may be defined as IRQ-safe which indic    696 Devices may be defined as IRQ-safe which indicates to the PM core that their
728 runtime PM callbacks may be invoked with disab    697 runtime PM callbacks may be invoked with disabled interrupts (see
729 Documentation/power/runtime_pm.rst for more in !! 698 :file:`Documentation/power/runtime_pm.txt` for more information).  If an
730 IRQ-safe device belongs to a PM domain, the ru    699 IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be
731 disallowed, unless the domain itself is define    700 disallowed, unless the domain itself is defined as IRQ-safe. However, it
732 makes sense to define a PM domain as IRQ-safe     701 makes sense to define a PM domain as IRQ-safe only if all the devices in it
733 are IRQ-safe. Moreover, if an IRQ-safe domain     702 are IRQ-safe. Moreover, if an IRQ-safe domain has a parent domain, the runtime
734 PM of the parent is only allowed if the parent    703 PM of the parent is only allowed if the parent itself is IRQ-safe too with the
735 additional restriction that all child domains     704 additional restriction that all child domains of an IRQ-safe parent must also
736 be IRQ-safe.                                      705 be IRQ-safe.
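
The IRQ-safe rules above can be restated as a predicate over a simplified
model.  This is only one reading of the paragraph, written in user-space C.
The ``irq_model_*`` names are hypothetical and do not correspond to any
kernel interface.  The model checks a single level of the hierarchy and does
not enforce the restriction that all children of an IRQ-safe parent must
themselves be IRQ-safe.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PM domain; not a kernel structure. */
struct irq_model_domain {
	bool irq_safe;					/* domain defined as IRQ-safe? */
	const bool *dev_irq_safe;			/* one flag per member device */
	size_t n_devs;
	const struct irq_model_domain *const *children;	/* direct sub-domains */
	size_t n_children;
};

/*
 * Return true if runtime PM of the domain is allowed under the rules in the
 * text: a domain that is not IRQ-safe must not contain an IRQ-safe device
 * or have an IRQ-safe sub-domain.
 */
static bool irq_model_runtime_pm_allowed(const struct irq_model_domain *d)
{
	size_t i;

	if (d->irq_safe)
		return true;

	for (i = 0; i < d->n_devs; i++)
		if (d->dev_irq_safe[i])
			return false;

	for (i = 0; i < d->n_children; i++)
		if (d->children[i]->irq_safe)
			return false;

	return true;
}
```

In this model, a domain that is not IRQ-safe and holds one IRQ-safe device is
reported as not allowed.  Marking the domain itself IRQ-safe makes the
predicate return true.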
737                                                   706 
738                                                   707 
739 Runtime Power Management                          708 Runtime Power Management
740 ========================                          709 ========================
741                                                   710 
742 Many devices are able to dynamically power dow    711 Many devices are able to dynamically power down while the system is still
743 running. This feature is useful for devices th    712 running. This feature is useful for devices that are not being used, and
744 can offer significant power savings on a runni    713 can offer significant power savings on a running system.  These devices
745 often support a range of runtime power states,    714 often support a range of runtime power states, which might use names such
746 as "off", "sleep", "idle", "active", and so on    715 as "off", "sleep", "idle", "active", and so on.  Those states will in some
747 cases (like PCI) be partially constrained by t    716 cases (like PCI) be partially constrained by the bus the device uses, and will
748 usually include hardware states that are also     717 usually include hardware states that are also used in system sleep states.
749                                                   718 
750 A system-wide power transition can be started     719 A system-wide power transition can be started while some devices are in low
751 power states due to runtime power management.     720 power states due to runtime power management.  The system sleep PM callbacks
752 should recognize such situations and react to     721 should recognize such situations and react to them appropriately, but the
753 necessary actions are subsystem-specific.         722 necessary actions are subsystem-specific.
754                                                   723 
755 In some cases the decision may be made at the     724 In some cases the decision may be made at the subsystem level while in other
756 cases the device driver may be left to decide.    725 cases the device driver may be left to decide.  In some cases it may be
757 desirable to leave a suspended device in that     726 desirable to leave a suspended device in that state during a system-wide power
758 transition, but in other cases the device must    727 transition, but in other cases the device must be put back into the full-power
759 state temporarily, for example so that its sys    728 state temporarily, for example so that its system wakeup capability can be
760 disabled.  This all depends on the hardware an    729 disabled.  This all depends on the hardware and the design of the subsystem and
761 device driver in question.                        730 device driver in question.
762                                                   731 
763 If it is necessary to resume a device from run << 
764 transition into a sleep state, that can be don << 
765 :c:func:`pm_runtime_resume` from the ``->suspe << 
766 or ``->poweroff`` callback for transitions rel << 
767 device's driver or its subsystem (for example, << 
768 However, subsystems must not otherwise change  << 
769 from their ``->prepare`` and ``->suspend`` cal << 
770 invoking device drivers' ``->suspend`` callbac << 
771                                                << 
772 .. _smart_suspend_flag:                        << 
773                                                << 
774 The ``DPM_FLAG_SMART_SUSPEND`` Driver Flag     << 
775 ------------------------------------------     << 
776                                                << 
777 Some bus types and PM domains have a policy to << 
778 suspend upfront in their ``->suspend`` callbac << 
779 necessary if the device's driver can cope with << 
780 The driver can indicate this by setting ``DPM_ << 
781 :c:member:`power.driver_flags` at probe time,  << 
782 :c:func:`dev_pm_set_driver_flags` helper routi << 
783                                                << 
784 Setting that flag causes the PM core and middl << 
785 (bus types, PM domains etc.) to skip the ``->s << 
786 ``->suspend_noirq`` callbacks provided by the  << 
787 runtime suspend throughout those phases of the << 
788 similarly for the "freeze" and "poweroff" part << 
789 [Otherwise the same driver                     << 
790 callback might be executed twice in a row for  << 
791 be valid in general.]  If the middle-layer sys << 
792 for the device then they are responsible for s << 
793 if not then the PM core skips them.  The subsy << 
794 determine whether they need to skip the driver << 
795 value from the :c:func:`dev_pm_skip_suspend` h << 
796                                                << 
797 In addition, with ``DPM_FLAG_SMART_SUSPEND`` s << 
798 and ``->thaw_early`` callbacks are skipped in  << 
799 in runtime suspend throughout the preceding "f << 
800 middle-layer callbacks are present for the dev << 
801 doing this, otherwise the PM core takes care o << 
802                                                << 
803                                                << 
804 The ``DPM_FLAG_MAY_SKIP_RESUME`` Driver Flag   << 
805 --------------------------------------------   << 
806                                                << 
807 During system-wide resume from a sleep state i    732 During system-wide resume from a sleep state it's easiest to put devices into
808 the full-power state, as explained in Document !! 733 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
809 [Refer to that document for more information r !! 734 Refer to that document for more information regarding this particular issue as
810 well as for information on the device runtime     735 well as for information on the device runtime power management framework in
811 general.]  However, it often is desirable to l !! 736 general.
812 system transitions to the working state, espec << 
813 runtime suspend before the preceding system-wi << 
814 transition.                                    << 
815                                                << 
816 To that end, device drivers can use the ``DPM_ << 
817 indicate to the PM core and middle-layer code  << 
818 "early" resume callbacks to be skipped if the  << 
819 after system-wide PM transitions to the workin << 
820 the case generally depends on the state of the << 
821 suspend-resume cycle and on the type of the sy << 
822 In particular, the "thaw" and "restore" transi << 
823 not affected by ``DPM_FLAG_MAY_SKIP_RESUME`` a << 
824 issued during the "restore" transition regardl << 
825 and whether or not any driver callbacks        << 
826 are skipped during the "thaw" transition depen << 
827 ``DPM_FLAG_SMART_SUSPEND`` flag is set (see `a << 
828 In addition, a device is not allowed to remain << 
829 children will be returned to full power.]      << 
830                                                << 
831 The ``DPM_FLAG_MAY_SKIP_RESUME`` flag is taken << 
832 the :c:member:`power.may_skip_resume` status b << 
833 "suspend" phase of suspend-type transitions.   << 
834 has a reason to prevent the driver's "noirq" a << 
835 being skipped during the subsequent system res << 
836 clear :c:member:`power.may_skip_resume` in its << 
837 or ``->suspend_noirq`` callback.  [Note that t << 
838 ``DPM_FLAG_SMART_SUSPEND`` need to clear :c:me << 
839 their ``->suspend`` callback in case the other << 
840                                                << 
841 Setting the :c:member:`power.may_skip_resume`  << 
842 ``DPM_FLAG_MAY_SKIP_RESUME`` flag is necessary << 
843 for the driver's "noirq" and "early" resume ca << 
844 not they should be skipped can be determined b << 
845 :c:func:`dev_pm_skip_resume` helper function.  << 
846                                                << 
847 If that function returns ``true``, the driver' << 
848 callbacks should be skipped and the device's r << 
849 "suspended" by the PM core.  Otherwise, if the << 
850 during the preceding system-wide suspend trans << 
851 ``DPM_FLAG_SMART_SUSPEND`` is set, its runtime << 
852 "active" by the PM core.  [Hence, the drivers  << 
853 ``DPM_FLAG_SMART_SUSPEND`` should not expect t << 
854 devices to be changed from "suspended" to "act << 
855 system-wide resume-type transitions.]          << 
856                                                << 
857 If the ``DPM_FLAG_MAY_SKIP_RESUME`` flag is no << 
858 ``DPM_FLAG_SMART_SUSPEND`` is set and the driv << 
859 callbacks are skipped, its system-wide "noirq" << 
860 present, are invoked as usual and the device's << 
861 "active" by the PM core before enabling runtim << 
862 driver must be prepared to cope with the invoc << 
863 callbacks back-to-back with its ``->runtime_su << 
864 intervening ``->runtime_resume`` and system-wi << 
865 final state of the device must reflect the "ac << 
866 case.  [Note that this is not a problem at all << 
867 ``->suspend_late`` callback pointer points to  << 
868 ``->runtime_suspend`` one and its ``->resume_e << 
869 the same function as the ``->runtime_resume``  << 
870 system-wide suspend-resume callbacks of the dr << 
871                                                << 
872 Likewise, if ``DPM_FLAG_MAY_SKIP_RESUME`` is s << 
873 system-wide "noirq" and "early" resume callbac << 
874 and "noirq" suspend callbacks may have been ex << 
875 of whether or not ``DPM_FLAG_SMART_SUSPEND`` i << 
876 needs to be able to cope with the invocation o << 
877 callback back-to-back with its "late" and "noi << 
878 that is not a concern if the driver sets both  << 
879 ``DPM_FLAG_MAY_SKIP_RESUME`` and uses the same << 
880 functions for runtime PM and system-wide suspe << 
                                                      
