Linux/Documentation/networking/devlink/devlink-health.rst

.. SPDX-License-Identifier: GPL-2.0

==============
Devlink Health
==============

Background
==========

The ``devlink`` health mechanism is intended for real-time alerting, in
order to know when something bad has happened to a PCI device. It aims to:

  * Provide alert debug information.
  * Enable self healing.
  * Provide a way to gather all needed debugging information when a problem
    needs vendor support.

Overview
========

The main idea is to unify and centralize driver health reports in the
generic ``devlink`` instance and to allow the user to set different
attributes of the health reporting and recovery procedures.

The ``devlink`` health reporter:
A device driver creates a "health reporter" for each error/health type.
An error/health type can be known/generic (e.g. PCI error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter, a driver can issue error/health reports
asynchronously. All handling of health reports is done by ``devlink``.
A device driver can provide specific callbacks for each "health reporter", e.g.:

  * Recovery procedures
  * Diagnostics procedures
  * Object dump procedures
  * Out Of Box initial parameters

Different parts of the driver can register different types of health reporters
with different handlers.

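As a rough illustration of the relationship described above, the following
Python sketch models a driver that registers one reporter per error type, each
with its own optional callbacks. It is a toy model, not the kernel API: the
``HealthReporter`` and ``Devlink`` classes, their method names, and the
reporter names ``fw`` and ``tx`` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class HealthReporter:
    """Toy model of one reporter: a name plus optional driver callbacks."""
    name: str
    recover: Optional[Callable[[], bool]] = None
    diagnose: Optional[Callable[[], dict]] = None
    dump: Optional[Callable[[], dict]] = None


@dataclass
class Devlink:
    """Toy model of the central instance that owns all reporters."""
    reporters: Dict[str, HealthReporter] = field(default_factory=dict)

    def reporter_create(self, reporter: HealthReporter) -> HealthReporter:
        # One reporter per error/health type: names must be unique.
        if reporter.name in self.reporters:
            raise ValueError(f"reporter {reporter.name!r} already exists")
        self.reporters[reporter.name] = reporter
        return reporter

    def supported_ops(self, name: str) -> List[str]:
        # Only the callbacks the driver actually provided are available.
        reporter = self.reporters[name]
        return [op for op in ("recover", "diagnose", "dump")
                if getattr(reporter, op) is not None]


# Hypothetical driver: different parts register different reporters.
devlink = Devlink()
devlink.reporter_create(
    HealthReporter("fw", recover=lambda: True, dump=lambda: {"fw": "state"}))
devlink.reporter_create(
    HealthReporter("tx", diagnose=lambda: {"queues": 4}))
```

In this sketch the ``fw`` reporter offers recovery and dump, while the ``tx``
reporter offers only diagnostics, mirroring the idea that each reporter carries
its own set of handlers.
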
Actions
=======

Once an error is reported, devlink health performs the following actions:

  * A log is sent to the kernel trace events buffer
  * Health status and statistics are updated for the reporter instance
  * An object dump is taken and saved at the reporter instance (as long as
    auto-dump is set and no other dump is already stored)
  * An auto recovery attempt is made, depending on:

    - Auto-recovery configuration
    - Grace period vs. time passed since the last recovery

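The sequence of actions above can be sketched as a small Python model. This is
a simplified illustration under stated assumptions, not the kernel
implementation: the ``Reporter`` class, its field names, the use of seconds for
the grace period, and the exact grace period rule (skip auto recovery while
less than ``grace_period`` has passed since the last recovery) are all
assumptions made for this example, and the kernel may order or refine these
steps differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Reporter:
    recover: Callable[[], bool]          # driver recovery callback
    dump: Callable[[], dict]             # driver object dump callback
    auto_recover: bool = True
    auto_dump: bool = True
    grace_period: float = 0.0            # seconds (hypothetical unit)
    error_count: int = 0
    recover_count: int = 0
    healthy: bool = True
    stored_dump: Optional[dict] = None
    last_recovery: Optional[float] = None
    trace: List[str] = field(default_factory=list)

    def report(self, msg: str, now: float) -> bool:
        """Handle one error report; return True if healthy afterwards."""
        self.trace.append(msg)           # 1. log the event
        self.error_count += 1            # 2. update status and statistics
        self.healthy = False
        # 3. take a dump only if auto-dump is set and none is stored yet
        if self.auto_dump and self.stored_dump is None:
            self.stored_dump = self.dump()
        # 4. attempt auto recovery, subject to config and grace period
        in_grace = (self.last_recovery is not None
                    and now - self.last_recovery < self.grace_period)
        if self.auto_recover and not in_grace:
            if self.recover():
                self.healthy = True
                self.recover_count += 1
                self.last_recovery = now
        return self.healthy


reporter = Reporter(recover=lambda: True, dump=lambda: {"reg": 1},
                    grace_period=10.0)
first = reporter.report("tx timeout", now=100.0)   # recovers
second = reporter.report("tx timeout", now=105.0)  # inside grace period
third = reporter.report("tx timeout", now=111.0)   # grace period elapsed
```

In this model the second report arrives within the grace period, so no
recovery is attempted and the reporter stays in the error state until the
third report.
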
Devlink formatted message
=========================

To handle devlink health diagnose and health dump requests, devlink creates a
formatted message structure ``devlink_fmsg`` and sends it to the driver's callback,
which fills in the data using the devlink fmsg API.

Devlink fmsg is a mechanism to pass descriptors between drivers and devlink, in
a JSON-like format. The API allows the driver to add nested attributes such as
object, object pair and value array, in addition to attributes such as name and
value.

A driver should use this API to fill the fmsg context in a format which
devlink later translates to the netlink message. When devlink needs to send
the data using SKBs to the netlink layer, it fragments the data between
different SKBs. To make this fragmentation possible, it uses virtual nest
attributes instead of actual nesting, which cannot be divided between
different SKBs.

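To illustrate why virtual nesting helps with fragmentation, the following
Python sketch stores a message as a flat list of items with explicit start and
end markers, splits that list into fixed-size chunks, and rebuilds the nested
structure on the receiving side. It is a conceptual model only: the ``Fmsg``
class, the item kinds (``obj_start``, ``arr_start``, and so on), and the chunk
size are hypothetical and do not correspond to the real fmsg API or netlink
attribute names.

```python
class Fmsg:
    """Toy formatted message: a flat list of (kind, payload) items."""

    def __init__(self):
        self.items = []

    def put(self, kind, payload=None):
        self.items.append((kind, payload))
        return self

    def pair(self, name, value):
        # A name immediately followed by its value.
        return self.put("name", name).put("value", value)


def fragments(items, size):
    """Split the flat item list into chunks (stand-ins for SKBs)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def decode(chunks):
    """Rebuild the nested structure from the concatenated chunks."""
    stream = iter([item for chunk in chunks for item in chunk])

    def parse(kind, payload):
        if kind == "value":
            return payload
        if kind == "obj_start":
            obj = {}
            for k, p in stream:
                if k == "obj_end":
                    return obj
                if k != "name":
                    raise ValueError(f"expected a name, got {k!r}")
                obj[p] = parse(*next(stream))
        if kind == "arr_start":
            arr = []
            for k, p in stream:
                if k == "arr_end":
                    return arr
                arr.append(parse(k, p))
        raise ValueError(f"unexpected or truncated item {kind!r}")

    return parse(*next(stream))


msg = Fmsg()
msg.put("obj_start")
msg.pair("state", "error")
msg.put("name", "queues").put("arr_start")
msg.put("value", 0).put("value", 1)
msg.put("arr_end")
msg.put("obj_end")

chunks = fragments(msg.items, 3)   # chunk boundaries fall inside the nesting
decoded = decode(chunks)
```

Because the nesting is expressed by markers in a flat sequence rather than by
one enclosing container, a chunk boundary can fall anywhere, including in the
middle of the array, and the receiver can still reassemble the structure.
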
User Interface
==============

The user can access/change each reporter's parameters and driver specific callbacks
via ``devlink``, e.g. per error type (per health reporter):

  * Configure the reporter's generic parameters (like: disable/enable auto recovery)
  * Invoke the recovery procedure
  * Run diagnostics
  * Object dump

.. list-table:: List of devlink health interfaces
   :widths: 10 90

   * - Name
     - Description
   * - ``DEVLINK_CMD_HEALTH_REPORTER_GET``
     - Retrieves status and configuration info per DEV and reporter.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_SET``
     - Allows reporter-related configuration setting.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER``
     - Triggers reporter's recovery procedure.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_TEST``
     - Triggers a fake health event on the reporter. The effects of the test
       event in terms of recovery flow should follow closely that of a real
       event.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE``
     - Retrieves current device state related to the reporter.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET``
     - Retrieves the last stored dump. Devlink health
       saves a single dump. If a dump is not already stored by devlink
       for this reporter, devlink generates a new dump.
       Dump output is defined by the reporter.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR``
     - Clears the last saved dump file for the specified reporter.

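The single-dump behaviour described for ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET``
and ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR`` can be sketched in Python as
follows. This is a minimal model rather than the kernel code: the ``DumpStore``
class and its method names are hypothetical, and the sketch assumes that a
dump generated on demand is then kept as the stored dump.

```python
from typing import Callable, Optional


class DumpStore:
    """Toy model of the single stored dump kept per reporter."""

    def __init__(self, make_dump: Callable[[], dict]):
        self._make_dump = make_dump          # stands in for the dump callback
        self._stored: Optional[dict] = None

    def dump_get(self) -> dict:
        # Return the stored dump; generate one only if none is stored.
        if self._stored is None:
            self._stored = self._make_dump()
        return self._stored

    def dump_clear(self) -> None:
        # Drop the stored dump so the next get generates a fresh one.
        self._stored = None


calls = []


def make_dump() -> dict:
    # Hypothetical dump callback that records how often it was invoked.
    calls.append(1)
    return {"generation": len(calls)}


store = DumpStore(make_dump)
a = store.dump_get()     # no dump stored yet: generates generation 1
b = store.dump_get()     # returns the stored dump, no new generation
store.dump_clear()
c = store.dump_get()     # after clear: generates generation 2
```

Repeated gets return the same stored dump, and a new one is produced only
after the stored dump has been cleared.
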
The following diagram provides a general overview of ``devlink-health``::

                                                   netlink
                                          +--------------------------+
                                          |                          |
                                          |            +             |
                                          |            |             |
                                          +--------------------------+
                                                       |request for ops
                                                       |(diagnose,
      driver                               devlink     |recover,
                                                       |dump)
    +--------+                            +--------------------------+
    |        |                            |    reporter|             |
    |        |                            |  +---------v----------+  |
    |        |   ops execution            |  |                    |  |
    |     <----------------------------------+                    |  |
    |        |                            |  |                    |  |
    |        |                            |  + ^------------------+  |
    |        |                            |    | request for ops     |
    |        |                            |    | (recover, dump)     |
    |        |                            |    |                     |
    |        |                            |  +-+------------------+  |
    |        |     health report          |  | health handler     |  |
    |        +------------------------------->                    |  |
    |        |                            |  +--------------------+  |
    |        |     health reporter create |                          |
    |        +---------------------------->                          |
    +--------+                            +--------------------------+
