TOMOYO Linux Cross Reference
Linux/Documentation/accel/qaic/aic100.rst

Differences between /Documentation/accel/qaic/aic100.rst (Version linux-6.12-rc7) and /Documentation/accel/qaic/aic100.rst (Version linux-6.9.12)


  1 .. SPDX-License-Identifier: GPL-2.0-only            1 .. SPDX-License-Identifier: GPL-2.0-only
  2                                                     2 
  3 ===============================                     3 ===============================
  4  Qualcomm Cloud AI 100 (AIC100)                     4  Qualcomm Cloud AI 100 (AIC100)
  5 ===============================                     5 ===============================
  6                                                     6 
  7 Overview                                            7 Overview
  8 ========                                            8 ========
  9                                                     9 
 10 The Qualcomm Cloud AI 100/AIC100 family of pro     10 The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
 11 Snapdragon Ride) are PCIe adapter cards which      11 Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
 12 the purpose of efficiently running Artificial      12 the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
 13 inference workloads. They are AI accelerators.     13 inference workloads. They are AI accelerators.
 14                                                    14 
 15 The PCIe interface of AIC100 is capable of PCI     15 The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
 16 (x8). An individual SoC on a card can have up      16 (x8). An individual SoC on a card can have up to 16 NSPs for running workloads.
 17 Each SoC has an A53 management CPU. On card, t     17 Each SoC has an A53 management CPU. On card, there can be up to 32 GB of DDR.
 18                                                    18 
 19 Multiple AIC100 cards can be hosted in a singl     19 Multiple AIC100 cards can be hosted in a single system to scale overall
 20 performance. AIC100 cards are multi-user capab     20 performance. AIC100 cards are multi-user capable and able to execute workloads
 21 from multiple users in a concurrent manner.        21 from multiple users in a concurrent manner.
 22                                                    22 
 23 Hardware Description                               23 Hardware Description
 24 ====================                               24 ====================
 25                                                    25 
 26 An AIC100 card consists of an AIC100 SoC, on-c     26 An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
 27 peripherals (PMICs, etc).                          27 peripherals (PMICs, etc).
 28                                                    28 
 29 An AIC100 card can either be a PCIe HHHL form      29 An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
 30 or a Dual M.2 card. Both use PCIe to connect t     30 or a Dual M.2 card. Both use PCIe to connect to the host system.
 31                                                    31 
 32 As a PCIe endpoint/adapter, AIC100 uses the st     32 As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
 33 DeviceID(DID) combination to uniquely identify     33 DeviceID(DID) combination to uniquely identify itself to the host. AIC100
 34 uses the standard Qualcomm VID (0x17cb). All A     34 uses the standard Qualcomm VID (0x17cb). All AIC100 SKUs use the same
 35 AIC100 DID (0xa100).                               35 AIC100 DID (0xa100).
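
As a rough illustration of the identification rule above, the sketch below checks a VendorID/DeviceID pair against the two values quoted in the text. The macro and function names are hypothetical and are not taken from the qaic driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* IDs quoted in the text above. The macro names are illustrative only. */
#define QCOM_PCI_VENDOR_ID	0x17cb
#define AIC100_PCI_DEVICE_ID	0xa100

/* Hypothetical helper: true if the VID/DID pair identifies an AIC100. */
static bool is_aic100(uint16_t vendor, uint16_t device)
{
	return vendor == QCOM_PCI_VENDOR_ID && device == AIC100_PCI_DEVICE_ID;
}
```

Because all SKUs share one DID, a check like this cannot tell SKUs apart.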
 36                                                    36 
 37 AIC100 does not implement FLR (function level      37 AIC100 does not implement FLR (function level reset).
 38                                                    38 
 39 AIC100 implements MSI but does not implement M     39 AIC100 implements MSI but does not implement MSI-X. AIC100 prefers 17 MSIs to
 40 operate (1 for MHI, 16 for the DMA Bridge). Fa     40 operate (1 for MHI, 16 for the DMA Bridge). Falling back to 1 MSI is possible in
 41 scenarios where reserving 32 MSIs isn't feasib     41 scenarios where reserving 32 MSIs isn't feasible.
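
The jump from 17 preferred MSIs to 32 reserved MSIs follows from plain MSI granting vectors in power-of-two blocks (1, 2, 4, 8, 16 or 32). The sketch below shows only that rounding; the names are hypothetical and this is not the driver's allocation code.

```c
#include <assert.h>

/* Preferred vector split from the text above. Names are illustrative. */
#define AIC100_MSI_MHI		1
#define AIC100_MSI_DMA_BRIDGE	16

/*
 * Round a wanted MSI count up to the next power of two.
 * Plain MSI supports at most 32 vectors per function.
 */
static unsigned int msi_vectors_to_reserve(unsigned int wanted)
{
	unsigned int n = 1;

	assert(wanted >= 1 && wanted <= 32);
	while (n < wanted)
		n <<= 1;
	return n;
}
```

With the preferred split, `msi_vectors_to_reserve(AIC100_MSI_MHI + AIC100_MSI_DMA_BRIDGE)` yields 32, while the single-MSI fallback needs only 1.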
 42                                                    42 
 43 As a PCIe device, AIC100 utilizes BARs to prov     43 As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
 44 hardware. AIC100 provides 3, 64-bit BARs.          44 hardware. AIC100 provides 3, 64-bit BARs.
 45                                                    45 
 46 * The first BAR is 4K in size, and exposes the     46 * The first BAR is 4K in size, and exposes the MHI interface to the host.
 47                                                    47 
 48 * The second BAR is 2M in size, and exposes th     48 * The second BAR is 2M in size, and exposes the DMA Bridge interface to the
 49   host.                                            49   host.
 50                                                    50 
 51 * The third BAR is variable in size based on a     51 * The third BAR is variable in size based on an individual AIC100's
 52   configuration, but defaults to 64K. This BAR     52   configuration, but defaults to 64K. This BAR currently has no purpose.
 53                                                    53 
 54 From the host perspective, AIC100 has several      54 From the host perspective, AIC100 has several key hardware components -
 55                                                    55 
 56 * MHI (Modem Host Interface)                       56 * MHI (Modem Host Interface)
 57 * QSM (QAIC Service Manager)                       57 * QSM (QAIC Service Manager)
 58 * NSPs (Neural Signal Processor)                   58 * NSPs (Neural Signal Processor)
 59 * DMA Bridge                                       59 * DMA Bridge
 60 * DDR                                              60 * DDR
 61                                                    61 
 62 MHI                                                62 MHI
 63 ---                                                63 ---
 64                                                    64 
 65 AIC100 has one MHI interface over PCIe. MHI it     65 AIC100 has one MHI interface over PCIe. MHI itself is documented at
 66 Documentation/mhi/index.rst. MHI is the mechan     66 Documentation/mhi/index.rst. MHI is the mechanism the host uses to communicate
 67 with the QSM. Except for workload data via the     67 with the QSM. Except for workload data via the DMA Bridge, all interaction with
 68 the device occurs via MHI.                         68 the device occurs via MHI.
 69                                                    69 
 70 QSM                                                70 QSM
 71 ---                                                71 ---
 72                                                    72 
 73 QAIC Service Manager. This is an ARM A53 CPU t     73 QAIC Service Manager. This is an ARM A53 CPU that runs the primary
 74 firmware of the card and performs on-card mana     74 firmware of the card and performs on-card management tasks. It also
 75 communicates with the host via MHI. Each AIC10     75 communicates with the host via MHI. Each AIC100 has one of
 76 these.                                             76 these.
 77                                                    77 
 78 NSP                                                78 NSP
 79 ---                                                79 ---
 80                                                    80 
 81 Neural Signal Processor. Each AIC100 has up to     81 Neural Signal Processor. Each AIC100 has up to 16 of these. These are
 82 the processors that run the workloads on AIC10     82 the processors that run the workloads on AIC100. Each NSP is a Qualcomm Hexagon
 83 (Q6) DSP with HVX and HMX. Each NSP can only r     83 (Q6) DSP with HVX and HMX. Each NSP can only run one workload at a time, but
 84 multiple NSPs may be assigned to a single work     84 multiple NSPs may be assigned to a single workload. Since each NSP can only run
 85 one workload, AIC100 is limited to 16 concurre     85 one workload, AIC100 is limited to 16 concurrent workloads. Workload
 86 "scheduling" is under the purview of the host.     86 "scheduling" is under the purview of the host. AIC100 does not automatically
 87 timeslice.                                         87 timeslice.
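
The one-workload-per-NSP constraint can be pictured as a 16-bit occupancy mask. The sketch below is hypothetical bookkeeping meant only to illustrate that constraint. It does not describe how the QSM or the host actually selects NSPs, and every name in it is an assumption.

```c
#include <stdint.h>

#define AIC100_MAX_NSPS 16

/*
 * Hypothetical bookkeeping: bit i of *busy is set while NSP i runs a workload.
 * Claims `count` idle NSPs and returns their mask, or 0 if too few are idle.
 */
static uint16_t nsp_claim(uint16_t *busy, unsigned int count)
{
	uint16_t mask = 0;
	unsigned int i, found = 0;

	if (count == 0 || count > AIC100_MAX_NSPS)
		return 0;
	for (i = 0; i < AIC100_MAX_NSPS && found < count; i++) {
		if (!(*busy & (1u << i))) {
			mask |= (uint16_t)(1u << i);
			found++;
		}
	}
	if (found < count)
		return 0;
	*busy |= mask;
	return mask;
}

/* Return the NSPs in `mask` to the idle pool. */
static void nsp_release(uint16_t *busy, uint16_t mask)
{
	*busy &= (uint16_t)~mask;
}
```

Once all 16 bits are set, further claims fail until a workload is deactivated, which mirrors the limit of 16 concurrent workloads.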
 88                                                    88 
 89 DMA Bridge                                         89 DMA Bridge
 90 ----------                                         90 ----------
 91                                                    91 
 92 The DMA Bridge is a custom DMA engine that man     92 The DMA Bridge is a custom DMA engine that manages the flow of data
 93 in and out of workloads. AIC100 has one of the     93 in and out of workloads. AIC100 has one of these. The DMA Bridge has 16
 94 channels, each consisting of a set of request/     94 channels, each consisting of a set of request/response FIFOs. Each active
 95 workload is assigned a single DMA Bridge chann     95 workload is assigned a single DMA Bridge channel. The DMA Bridge exposes
 96 hardware registers to manage the FIFOs (head/t     96 hardware registers to manage the FIFOs (head/tail pointers), but requires host
 97 memory to store the FIFOs.                         97 memory to store the FIFOs.
 98                                                    98 
 99 DDR                                                99 DDR
100 ---                                               100 ---
101                                                   101 
102 AIC100 has on-card DDR. In total, an AIC100 ca    102 AIC100 has on-card DDR. In total, an AIC100 can have up to 32 GB of DDR.
103 This DDR is used to store workloads, data for     103 This DDR is used to store workloads, data for the workloads, and is used by the
104 QSM for managing the device. NSPs are granted     104 QSM for managing the device. NSPs are granted access to sections of the DDR by
105 the QSM. The host does not have direct access     105 the QSM. The host does not have direct access to the DDR, and must make
106 requests to the QSM to transfer data to the DD    106 requests to the QSM to transfer data to the DDR.
107                                                   107 
108 High-level Use Flow                               108 High-level Use Flow
109 ===================                               109 ===================
110                                                   110 
111 AIC100 is a multi-user, programmable accelerat    111 AIC100 is a multi-user, programmable accelerator typically used for running
112 neural networks in inferencing mode to efficie    112 neural networks in inferencing mode to efficiently perform AI operations.
113 AIC100 is not intended for training neural net    113 AIC100 is not intended for training neural networks. AIC100 can be utilized
114 for generic compute workloads.                    114 for generic compute workloads.
115                                                   115 
116 Assuming a user wants to utilize AIC100, they     116 Assuming a user wants to utilize AIC100, they would follow these steps:
117                                                   117 
118 1. Compile the workload into an ELF targeting     118 1. Compile the workload into an ELF targeting the NSP(s)
119 2. Make requests to the QSM to load the worklo    119 2. Make requests to the QSM to load the workload and related artifacts into the
120    device DDR                                     120    device DDR
121 3. Make a request to the QSM to activate the w    121 3. Make a request to the QSM to activate the workload onto a set of idle NSPs
122 4. Make requests to the DMA Bridge to send inp    122 4. Make requests to the DMA Bridge to send input data to the workload to be
123    processed, and other requests to receive pr    123    processed, and other requests to receive processed output data from the
124    workload.                                      124    workload.
125 5. Once the workload is no longer required, ma    125 5. Once the workload is no longer required, make a request to the QSM to
126    deactivate the workload, thus putting the N    126    deactivate the workload, thus putting the NSPs back into an idle state.
127 6. Once the workload and related artifacts are    127 6. Once the workload and related artifacts are no longer needed for future
128    sessions, make requests to the QSM to unloa    128    sessions, make requests to the QSM to unload the data from DDR. This frees
129    the DDR to be used by other users.             129    the DDR to be used by other users.
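
Steps 2 through 6 above can be read as a small state machine for a single workload. The sketch below assumes three states (absent, loaded in DDR, active on NSPs). The state and operation names are hypothetical and are not part of any QSM protocol.

```c
#include <stdbool.h>

/* Hypothetical lifecycle states for one workload. */
enum wl_state { WL_ABSENT, WL_LOADED, WL_ACTIVE };

/* Hypothetical operations, matching steps 2, 3, 5 and 6 above. */
enum wl_op { WL_OP_LOAD, WL_OP_ACTIVATE, WL_OP_DEACTIVATE, WL_OP_UNLOAD };

/* Apply `op` to *s. Returns false and leaves *s unchanged if out of order. */
static bool wl_apply(enum wl_state *s, enum wl_op op)
{
	switch (op) {
	case WL_OP_LOAD:
		if (*s != WL_ABSENT)
			return false;
		*s = WL_LOADED;
		return true;
	case WL_OP_ACTIVATE:
		if (*s != WL_LOADED)
			return false;
		*s = WL_ACTIVE;
		return true;
	case WL_OP_DEACTIVATE:
		if (*s != WL_ACTIVE)
			return false;
		*s = WL_LOADED;
		return true;
	case WL_OP_UNLOAD:
		if (*s != WL_LOADED)
			return false;
		*s = WL_ABSENT;
		return true;
	}
	return false;
}
```

DMA Bridge traffic (step 4) would only make sense while the workload is in the active state. A deactivated workload stays loaded and can be activated again without another load.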
130                                                   130 
131                                                   131 
132 Boot Flow                                         132 Boot Flow
133 =========                                         133 =========
134                                                   134 
135 AIC100 uses a flashless boot flow, derived fro    135 AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.
136                                                   136 
137 When AIC100 is first powered on, it begins exe    137 When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
138 from ROM. PBL enumerates the PCIe link, and in    138 from ROM. PBL enumerates the PCIe link, and initializes the BHI (Boot Host
139 Interface) component of MHI.                      139 Interface) component of MHI.
140                                                   140 
141 Using BHI, the host points PBL to the location    141 Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
142 image. The PBL pulls the image from the host,     142 image. The PBL pulls the image from the host, validates it, and begins
143 execution of SBL.                                 143 execution of SBL.
144                                                   144 
145 SBL initializes MHI, and uses MHI to notify th    145 SBL initializes MHI, and uses MHI to notify the host that the device has entered
146 the SBL stage. SBL performs a number of operat    146 the SBL stage. SBL performs a number of operations:
147                                                   147 
148 * SBL initializes the majority of hardware (an    148 * SBL initializes the majority of hardware (anything PBL left uninitialized),
149   including DDR.                                  149   including DDR.
150 * SBL offloads the bootlog to the host.           150 * SBL offloads the bootlog to the host.
151 * SBL synchronizes timestamps with the host fo    151 * SBL synchronizes timestamps with the host for future logging.
152 * SBL uses the Sahara protocol to obtain the r    152 * SBL uses the Sahara protocol to obtain the runtime firmware images from the
153   host.                                           153   host.
154                                                   154 
155 Once SBL has obtained and validated the runtim    155 Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
156 of reset, and jumps into the QSM.                 156 of reset, and jumps into the QSM.
157                                                   157 
158 The QSM uses MHI to notify the host that the d    158 The QSM uses MHI to notify the host that the device has entered the QSM stage
159 (AMSS in MHI terms). At this point, the AIC100    159 (AMSS in MHI terms). At this point, the AIC100 device is fully functional, and
160 ready to process workloads.                       160 ready to process workloads.
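
The cold-boot sequence described above (PBL, then SBL, then QSM/AMSS) can be summarized as follows. This sketch covers only the forward boot order given in the text. The enum and function names are hypothetical and do not correspond to MHI core definitions.

```c
#include <stdbool.h>

/* Boot stages named in the text above. Identifiers are illustrative. */
enum aic100_stage { AIC100_STAGE_PBL, AIC100_STAGE_SBL, AIC100_STAGE_AMSS };

/* True if `to` is the stage that follows `from` during a cold boot. */
static bool aic100_boot_step_ok(enum aic100_stage from, enum aic100_stage to)
{
	return (from == AIC100_STAGE_PBL && to == AIC100_STAGE_SBL) ||
	       (from == AIC100_STAGE_SBL && to == AIC100_STAGE_AMSS);
}

/* The device is ready for workloads only once the QSM (AMSS) is running. */
static bool aic100_ready_for_workloads(enum aic100_stage stage)
{
	return stage == AIC100_STAGE_AMSS;
}
```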
161                                                   161 
162 Userspace components                              162 Userspace components
163 ====================                              163 ====================
164                                                   164 
165 Compiler                                          165 Compiler
166 --------                                          166 --------
167                                                   167 
168 An open compiler for AIC100 based on upstream     168 An open compiler for AIC100 based on upstream LLVM can be found at:
169 https://github.com/quic/software-kit-for-qualc    169 https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
170                                                   170 
171 Usermode Driver (UMD)                             171 Usermode Driver (UMD)
172 ---------------------                             172 ---------------------
173                                                   173 
174 An open UMD that interfaces with the qaic kern    174 An open UMD that interfaces with the qaic kernel driver can be found at:
175 https://github.com/quic/software-kit-for-qualc    175 https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100
176                                                   176 
177 Sahara loader                                     177 Sahara loader
178 -------------                                     178 -------------
179                                                   179 
180 An open implementation of the Sahara protocol     180 An open implementation of the Sahara protocol called kickstart can be found at:
181 https://github.com/andersson/qdl                  181 https://github.com/andersson/qdl
182                                                   182 
183 MHI Channels                                      183 MHI Channels
184 ============                                      184 ============
185                                                   185 
186 AIC100 defines a number of MHI channels for di    186 AIC100 defines a number of MHI channels for different purposes. This is a list
187 of the defined channels, and their uses.          187 of the defined channels, and their uses.
188                                                   188 
189 +----------------+---------+----------+-------    189 +----------------+---------+----------+----------------------------------------+
190 | Channel name   | IDs     | EEs      | Purpos    190 | Channel name   | IDs     | EEs      | Purpose                                |
191 +================+=========+==========+=======    191 +================+=========+==========+========================================+
192 | QAIC_LOOPBACK  | 0 & 1   | AMSS     | Any da    192 | QAIC_LOOPBACK  | 0 & 1   | AMSS     | Any data sent to the device on this    |
193 |                |         |          | channe    193 |                |         |          | channel is sent back to the host.      |
194 +----------------+---------+----------+-------    194 +----------------+---------+----------+----------------------------------------+
195 | QAIC_SAHARA    | 2 & 3   | SBL      | Used b    195 | QAIC_SAHARA    | 2 & 3   | SBL      | Used by SBL to obtain the runtime      |
196 |                |         |          | firmwa    196 |                |         |          | firmware from the host.                |
197 +----------------+---------+----------+-------    197 +----------------+---------+----------+----------------------------------------+
198 | QAIC_DIAG      | 4 & 5   | AMSS     | Used t    198 | QAIC_DIAG      | 4 & 5   | AMSS     | Used to communicate with QSM via the   |
199 |                |         |          | DIAG p    199 |                |         |          | DIAG protocol.                         |
200 +----------------+---------+----------+-------    200 +----------------+---------+----------+----------------------------------------+
201 | QAIC_SSR       | 6 & 7   | AMSS     | Used t    201 | QAIC_SSR       | 6 & 7   | AMSS     | Used to notify the host of subsystem   |
202 |                |         |          | restar    202 |                |         |          | restart events, and to offload SSR     |
203 |                |         |          | crashd    203 |                |         |          | crashdumps.                            |
204 +----------------+---------+----------+-------    204 +----------------+---------+----------+----------------------------------------+
205 | QAIC_QDSS      | 8 & 9   | AMSS     | Used f    205 | QAIC_QDSS      | 8 & 9   | AMSS     | Used for the Qualcomm Debug Subsystem. |
206 +----------------+---------+----------+-------    206 +----------------+---------+----------+----------------------------------------+
207 | QAIC_CONTROL   | 10 & 11 | AMSS     | Used f    207 | QAIC_CONTROL   | 10 & 11 | AMSS     | Used for the Neural Network Control    |
208 |                |         |          | (NNC)     208 |                |         |          | (NNC) protocol. This is the primary    |
209 |                |         |          | channe    209 |                |         |          | channel between host and QSM for       |
210 |                |         |          | managi    210 |                |         |          | managing workloads.                    |
211 +----------------+---------+----------+-------    211 +----------------+---------+----------+----------------------------------------+
212 | QAIC_LOGGING   | 12 & 13 | SBL      | Used b    212 | QAIC_LOGGING   | 12 & 13 | SBL      | Used by the SBL to send the bootlog to |
213 |                |         |          | the ho    213 |                |         |          | the host.                              |
214 +----------------+---------+----------+-------    214 +----------------+---------+----------+----------------------------------------+
215 | QAIC_STATUS    | 14 & 15 | AMSS     | Used t    215 | QAIC_STATUS    | 14 & 15 | AMSS     | Used to notify the host of Reliability,|
216 |                |         |          | Availa    216 |                |         |          | Availability, Serviceability (RAS)     |
217 |                |         |          | events    217 |                |         |          | events.                                |
218 +----------------+---------+----------+-------    218 +----------------+---------+----------+----------------------------------------+
219 | QAIC_TELEMETRY | 16 & 17 | AMSS     | Used t    219 | QAIC_TELEMETRY | 16 & 17 | AMSS     | Used to get/set power/thermal/etc      |
220 |                |         |          | attrib    220 |                |         |          | attributes.                            |
221 +----------------+---------+----------+-------    221 +----------------+---------+----------+----------------------------------------+
222 | QAIC_DEBUG     | 18 & 19 | AMSS     | Not us    222 | QAIC_DEBUG     | 18 & 19 | AMSS     | Not used.                              |
223 +----------------+---------+----------+-------    223 +----------------+---------+----------+----------------------------------------+
224 | QAIC_TIMESYNC  | 20 & 21 | SBL      | Used t    224 | QAIC_TIMESYNC  | 20 & 21 | SBL      | Used to synchronize timestamps in the  |
225 |                |         |          | device    225 |                |         |          | device side logs with the host time    |
226 |                |         |          | source    226 |                |         |          | source.                                |
227 +----------------+---------+----------+-------    227 +----------------+---------+----------+----------------------------------------+
228 | QAIC_TIMESYNC  | 22 & 23 | AMSS     | Used t    228 | QAIC_TIMESYNC  | 22 & 23 | AMSS     | Used to periodically synchronize       |
229 | _PERIODIC      |         |          | timest    229 | _PERIODIC      |         |          | timestamps in the device side logs with|
230 |                |         |          | the ho    230 |                |         |          | the host time source.                  |
231 +----------------+---------+----------+-------    231 +----------------+---------+----------+----------------------------------------+
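
For quick reference, the table above can be transcribed into a lookup structure. The sketch below is a plain transcription of the channel names, ID pairs and execution environments (EEs). The struct layout and helper are hypothetical and do not reproduce the driver's MHI channel configuration.

```c
#include <stddef.h>

/* Execution environments from the "EEs" column. */
enum qaic_ee { QAIC_EE_SBL, QAIC_EE_AMSS };

/* One table row: the channel uses IDs first_id and first_id + 1. */
struct qaic_chan {
	const char *name;
	unsigned int first_id;
	enum qaic_ee ee;
};

static const struct qaic_chan qaic_chans[] = {
	{ "QAIC_LOOPBACK",           0, QAIC_EE_AMSS },
	{ "QAIC_SAHARA",             2, QAIC_EE_SBL  },
	{ "QAIC_DIAG",               4, QAIC_EE_AMSS },
	{ "QAIC_SSR",                6, QAIC_EE_AMSS },
	{ "QAIC_QDSS",               8, QAIC_EE_AMSS },
	{ "QAIC_CONTROL",           10, QAIC_EE_AMSS },
	{ "QAIC_LOGGING",           12, QAIC_EE_SBL  },
	{ "QAIC_STATUS",            14, QAIC_EE_AMSS },
	{ "QAIC_TELEMETRY",         16, QAIC_EE_AMSS },
	{ "QAIC_DEBUG",             18, QAIC_EE_AMSS },
	{ "QAIC_TIMESYNC",          20, QAIC_EE_SBL  },
	{ "QAIC_TIMESYNC_PERIODIC", 22, QAIC_EE_AMSS },
};

/* Look up the table row that owns MHI channel `id`, or NULL if undefined. */
static const struct qaic_chan *qaic_chan_by_id(unsigned int id)
{
	size_t i;

	for (i = 0; i < sizeof(qaic_chans) / sizeof(qaic_chans[0]); i++) {
		if (id == qaic_chans[i].first_id ||
		    id == qaic_chans[i].first_id + 1)
			return &qaic_chans[i];
	}
	return NULL;
}
```

Only three channels (Sahara, logging and the one-shot timesync) are available in SBL. Everything else requires the QSM to be running.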
232                                                   232 
233 DMA Bridge                                        233 DMA Bridge
234 ==========                                        234 ==========
235                                                   235 
236 Overview                                          236 Overview
237 --------                                          237 --------
238                                                   238 
239 The DMA Bridge is one of the main interfaces t    239 The DMA Bridge is one of the main interfaces to the host from the device
240 (the other being MHI). As part of activating a    240 (the other being MHI). As part of activating a workload to run on NSPs, the QSM
241 assigns that workload a DMA Bridge channel. A     241 assigns that workload a DMA Bridge channel. A workload's DMA Bridge channel
242 (DBC for short) is solely for the use of that     242 (DBC for short) is solely for the use of that workload and is not shared with
243 other workloads.                                  243 other workloads.
244                                                   244 
245 Each DBC is a pair of FIFOs that manage data i    245 Each DBC is a pair of FIFOs that manage data in and out of the workload. One
246 FIFO is the request FIFO. The other FIFO is th    246 FIFO is the request FIFO. The other FIFO is the response FIFO.
247                                                   247 
248 Each DBC contains 4 registers in hardware:        248 Each DBC contains 4 registers in hardware:
249                                                   249 
250 * Request FIFO head pointer (offset 0x0). Read    250 * Request FIFO head pointer (offset 0x0). Read only by the host. Indicates the
251   latest item in the FIFO the device has consu    251   latest item in the FIFO the device has consumed.
252 * Request FIFO tail pointer (offset 0x4). Read    252 * Request FIFO tail pointer (offset 0x4). Read/write by the host. Host
253   increments this register to add new items to    253   increments this register to add new items to the FIFO.
254 * Response FIFO head pointer (offset 0x8). Rea    254 * Response FIFO head pointer (offset 0x8). Read/write by the host. Indicates
255   the latest item in the FIFO the host has con    255   the latest item in the FIFO the host has consumed.
256 * Response FIFO tail pointer (offset 0xc). Rea    256 * Response FIFO tail pointer (offset 0xc). Read only by the host. Device
257   increments this register to add new items to    257   increments this register to add new items to the FIFO.
258                                                   258 
259 The values in each register are indexes in the    259 The values in each register are indexes in the FIFO. To get the location of the
260 FIFO element pointed to by the register: FIFO     260 FIFO element pointed to by the register: FIFO base address + register * element
261 size.                                             261 size.
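
The address formula above is simple enough to state directly in code. In this sketch the function name is hypothetical, and the base address and element size are parameters because they depend on the FIFO in question.

```c
#include <stdint.h>

/*
 * Location of the FIFO element that a head or tail register points to:
 * FIFO base address + register value * element size.
 */
static uint64_t fifo_elem_addr(uint64_t fifo_base, uint32_t index,
			       uint32_t elem_size)
{
	return fifo_base + (uint64_t)index * elem_size;
}
```

The widening cast keeps the multiplication in 64 bits so that a large index cannot overflow before the base is added.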
262                                                   262 
263 DBC registers are exposed to the host via the     263 DBC registers are exposed to the host via the second BAR. Each DBC consumes
264 4KB of space in the BAR.                          264 4KB of space in the BAR.
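
Combining the per-DBC register offsets with the 4KB footprint gives a register's position inside the second BAR. The sketch below assumes that DBC n's registers start at n * 4KB from the beginning of the BAR. That ordering is an assumption made for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DBC_COUNT		16
#define DBC_REG_STRIDE		0x1000	/* 4KB of BAR space per DBC */

/* Register offsets within one DBC, as listed in the text above. */
#define DBC_REQ_HEAD_OFF	0x0
#define DBC_REQ_TAIL_OFF	0x4
#define DBC_RSP_HEAD_OFF	0x8
#define DBC_RSP_TAIL_OFF	0xc

/* Offset of a DBC register from the start of the second BAR (assumed layout). */
static uint32_t dbc_reg_offset(unsigned int dbc_id, uint32_t reg_off)
{
	assert(dbc_id < DBC_COUNT);
	return dbc_id * DBC_REG_STRIDE + reg_off;
}
```

Under this assumption the 16 DBCs occupy 64KB, well within the 2M size of the second BAR.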
265                                                   265 
266 The actual FIFOs are backed by host memory. Wh    266 The actual FIFOs are backed by host memory. When sending a request to the QSM
267 to activate a workload, the host must donate m    267 to activate a workload, the host must donate memory to be used for the FIFOs.
268 Due to internal mapping limitations of the dev    268 Due to internal mapping limitations of the device, a single contiguous chunk of
269 memory must be provided per DBC, which hosts b    269 memory must be provided per DBC, which hosts both FIFOs. The request FIFO will
270 consume the beginning of the memory chunk, and    270 consume the beginning of the memory chunk, and the response FIFO will consume
271 the end of the memory chunk.                      271 the end of the memory chunk.
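
The placement rule above (request FIFO at the start of the chunk, response FIFO at the end) can be sketched as below. The FIFO sizes in bytes are parameters because this section does not fix them, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte offsets of the two FIFOs within the donated chunk. */
struct dbc_layout {
	size_t req_off;
	size_t rsp_off;
};

/*
 * Place the request FIFO at the beginning of the chunk and the response FIFO
 * at the end. Returns false if the two FIFOs do not fit without overlapping.
 */
static bool dbc_layout_compute(size_t chunk_bytes, size_t req_bytes,
			       size_t rsp_bytes, struct dbc_layout *out)
{
	if (req_bytes > chunk_bytes || rsp_bytes > chunk_bytes - req_bytes)
		return false;
	out->req_off = 0;
	out->rsp_off = chunk_bytes - rsp_bytes;
	return true;
}
```

Any bytes between the end of the request FIFO and the start of the response FIFO are simply unused in this sketch.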
272                                                   272 
273 Request FIFO                                      273 Request FIFO
274 ------------                                      274 ------------
275                                                   275 
276 A request FIFO element has the following struc    276 A request FIFO element has the following structure:
277                                                   277 
278 .. code-block:: c                                 278 .. code-block:: c
279                                                   279 
280   struct request_elem {                           280   struct request_elem {
281         u16 req_id;                               281         u16 req_id;
282         u8  seq_id;                               282         u8  seq_id;
283         u8  pcie_dma_cmd;                         283         u8  pcie_dma_cmd;
284         u32 reserved;                             284         u32 reserved;
285         u64 pcie_dma_source_addr;                 285         u64 pcie_dma_source_addr;
286         u64 pcie_dma_dest_addr;                   286         u64 pcie_dma_dest_addr;
287         u32 pcie_dma_len;                         287         u32 pcie_dma_len;
288         u32 reserved;                             288         u32 reserved;
289         u64 doorbell_addr;                        289         u64 doorbell_addr;
290         u8  doorbell_attr;                        290         u8  doorbell_attr;
291         u8  reserved;                             291         u8  reserved;
292         u16 reserved;                             292         u16 reserved;
293         u32 doorbell_data;                        293         u32 doorbell_data;
294         u32 sem_cmd0;                             294         u32 sem_cmd0;
295         u32 sem_cmd1;                             295         u32 sem_cmd1;
296         u32 sem_cmd2;                             296         u32 sem_cmd2;
297         u32 sem_cmd3;                             297         u32 sem_cmd3;
298   };                                              298   };
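
The structure above repeats the member name ``reserved``, so it does not compile as written. The sketch below is a compilable restatement that uses fixed-width standard types and numbers the reserved members (the numbered names are an assumption made purely so the code builds). With natural alignment the fields pack with no padding into 64 bytes, which the static assertions check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as above. reserved0..reserved3 are illustrative names. */
struct request_elem {
	uint16_t req_id;
	uint8_t  seq_id;
	uint8_t  pcie_dma_cmd;
	uint32_t reserved0;
	uint64_t pcie_dma_source_addr;
	uint64_t pcie_dma_dest_addr;
	uint32_t pcie_dma_len;
	uint32_t reserved1;
	uint64_t doorbell_addr;
	uint8_t  doorbell_attr;
	uint8_t  reserved2;
	uint16_t reserved3;
	uint32_t doorbell_data;
	uint32_t sem_cmd0;
	uint32_t sem_cmd1;
	uint32_t sem_cmd2;
	uint32_t sem_cmd3;
};

/* Layout checks: 64 bytes total, no implicit padding before key fields. */
static_assert(sizeof(struct request_elem) == 64, "request_elem is 64 bytes");
static_assert(offsetof(struct request_elem, doorbell_addr) == 32,
	      "doorbell_addr starts at byte 32");
static_assert(offsetof(struct request_elem, sem_cmd0) == 48,
	      "semaphore commands start at byte 48");
```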

Request field descriptions:

req_id
        request ID. A request FIFO element and a response FIFO element with
        the same request ID refer to the same command.

seq_id
        sequence ID within a request. Ignored by the DMA Bridge.

pcie_dma_cmd
        describes the DMA element of this request.

        * Bit(7) is the force MSI flag, which overrides the DMA Bridge MSI logic
          and generates an MSI when this request is complete, provided that
          QSM has configured the DMA Bridge to look at this bit.
        * Bits(6:5) are reserved.
        * Bit(4) is the completion code flag, and indicates that the DMA Bridge
          shall generate a response FIFO element when this request is
          complete.
        * Bit(3) indicates if this request is a linked list transfer(0) or a bulk
          transfer(1).
        * Bit(2) is reserved.
        * Bits(1:0) indicate the type of transfer. No transfer(0), to device(1),
          from device(2). Value 3 is illegal.
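
The pcie_dma_cmd bit layout above can be sketched as a small encoder. This is
only an illustration: the macro, enum, and function names are hypothetical and
not taken from the driver, and writing zero to the reserved bits is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the pcie_dma_cmd description. */
#define DMA_CMD_FORCE_MSI       (1u << 7)
#define DMA_CMD_COMPLETION_CODE (1u << 4)
#define DMA_CMD_BULK            (1u << 3) /* clear means linked list transfer */
#define DMA_CMD_XFER_MASK       0x3u

enum dma_xfer_type {
	DMA_XFER_NONE        = 0,
	DMA_XFER_TO_DEVICE   = 1,
	DMA_XFER_FROM_DEVICE = 2,
	/* value 3 is illegal */
};

/* Build a pcie_dma_cmd byte. Reserved bits (6:5) and (2) are left as zero
 * (an assumption; the text only says they are reserved). */
static uint8_t encode_pcie_dma_cmd(enum dma_xfer_type type, bool bulk,
				   bool completion_code, bool force_msi)
{
	uint8_t cmd = (uint8_t)((unsigned int)type & DMA_CMD_XFER_MASK);

	if (bulk)
		cmd |= DMA_CMD_BULK;
	if (completion_code)
		cmd |= DMA_CMD_COMPLETION_CODE;
	if (force_msi)
		cmd |= DMA_CMD_FORCE_MSI;
	return cmd;
}
```

For example, a bulk transfer to the device that requests a response FIFO
element encodes as 0x19.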

pcie_dma_source_addr
        source address for a bulk transfer, or the address of the linked list.

pcie_dma_dest_addr
        destination address for a bulk transfer.

pcie_dma_len
        length of the bulk transfer. Note that the size of this field
        limits transfers to 4G in size.

doorbell_addr
        address of the doorbell to ring when this request is complete.

doorbell_attr
        doorbell attributes.

        * Bit(7) indicates if a write to a doorbell is to occur.
        * Bits(6:2) are reserved.
        * Bits(1:0) contain the encoding of the doorbell length. 0 is 32-bit,
          1 is 16-bit, 2 is 8-bit, 3 is reserved. The doorbell address
          must be naturally aligned to the specified length.
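
A minimal sketch of the doorbell_attr encoding and the natural alignment rule,
assuming hypothetical helper names that do not appear in the driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values follow Bits(1:0) of doorbell_attr. */
enum doorbell_len {
	DB_LEN_32 = 0,
	DB_LEN_16 = 1,
	DB_LEN_8  = 2,
	/* value 3 is reserved */
};

/* Doorbell width in bytes: 0 -> 4, 1 -> 2, 2 -> 1. */
static unsigned int doorbell_len_bytes(enum doorbell_len len)
{
	return 4u >> (unsigned int)len;
}

/* The doorbell address must be naturally aligned to the doorbell length. */
static bool doorbell_addr_aligned(uint64_t addr, enum doorbell_len len)
{
	return (addr & (uint64_t)(doorbell_len_bytes(len) - 1u)) == 0;
}

/* Bit(7) requests the doorbell write; reserved Bits(6:2) are left as zero
 * (an assumption; the text only says they are reserved). */
static uint8_t encode_doorbell_attr(bool write_doorbell, enum doorbell_len len)
{
	return (uint8_t)((write_doorbell ? 0x80u : 0u) |
			 ((unsigned int)len & 0x3u));
}
```

A caller would typically check doorbell_addr_aligned() before filling in
doorbell_addr and doorbell_attr for a request.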

doorbell_data
        data to write to the doorbell. Only the bits corresponding to
        the doorbell length are valid.

sem_cmdN
        semaphore command.

        * Bit(31) indicates this semaphore command is enabled.
        * Bit(30) is the to-device DMA fence. Block this request until all
          to-device DMA transfers are complete.
        * Bit(29) is the from-device DMA fence. Block this request until all
          from-device DMA transfers are complete.
        * Bits(28:27) are reserved.
        * Bits(26:24) are the semaphore command. 0 is NOP. 1 is init with the
          specified value. 2 is increment. 3 is decrement. 4 is wait
          until the semaphore is equal to the specified value. 5 is wait
          until the semaphore is greater than or equal to the specified value.
          6 is "P", wait until semaphore is greater than 0, then
          decrement by 1. 7 is reserved.
        * Bit(23) is reserved.
        * Bit(22) is the semaphore sync. 0 is post sync, which means that the
          semaphore operation is done after the DMA transfer. 1 is
          presync, which gates the DMA transfer. Only one presync is
          allowed per request.
        * Bit(21) is reserved.
        * Bits(20:16) are the index of the semaphore to operate on.
        * Bits(15:12) are reserved.
        * Bits(11:0) are the semaphore value to use in operations.
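
The sem_cmdN layout above can be illustrated with a small encoder and a check
for the one-presync-per-request rule. All names here are hypothetical, reserved
bits are written as zero by assumption, and counting only enabled commands
toward the presync limit is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values follow Bits(26:24) of sem_cmdN. */
enum sem_op {
	SEM_NOP     = 0,
	SEM_INIT    = 1,
	SEM_INC     = 2,
	SEM_DEC     = 3,
	SEM_WAIT_EQ = 4,
	SEM_WAIT_GE = 5,
	SEM_P       = 6,
	/* value 7 is reserved */
};

#define SEM_CMD_ENABLE         (1u << 31)
#define SEM_CMD_FENCE_TO_DEV   (1u << 30)
#define SEM_CMD_FENCE_FROM_DEV (1u << 29)
#define SEM_CMD_PRESYNC        (1u << 22) /* clear means post sync */

/* Build one enabled semaphore command word. */
static uint32_t encode_sem_cmd(enum sem_op op, bool presync,
			       unsigned int index, unsigned int value,
			       bool fence_to_dev, bool fence_from_dev)
{
	uint32_t cmd = SEM_CMD_ENABLE;

	cmd |= ((uint32_t)op & 0x7u) << 24;     /* Bits(26:24): command */
	cmd |= ((uint32_t)index & 0x1fu) << 16; /* Bits(20:16): semaphore index */
	cmd |= (uint32_t)value & 0xfffu;        /* Bits(11:0): semaphore value */
	if (presync)
		cmd |= SEM_CMD_PRESYNC;
	if (fence_to_dev)
		cmd |= SEM_CMD_FENCE_TO_DEV;
	if (fence_from_dev)
		cmd |= SEM_CMD_FENCE_FROM_DEV;
	return cmd;
}

/* Check sem_cmd0..sem_cmd3: at most one enabled presync per request. */
static bool sem_cmds_valid(const uint32_t cmds[4])
{
	unsigned int presyncs = 0;

	for (int i = 0; i < 4; i++) {
		if ((cmds[i] & SEM_CMD_ENABLE) && (cmds[i] & SEM_CMD_PRESYNC))
			presyncs++;
	}
	return presyncs <= 1;
}
```

For example, a presync "wait until semaphore 3 is greater than or equal to 1"
encodes as 0x85430001.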

Overall, a request is processed in 4 steps:

1. If specified, the presync semaphore condition must be true
2. If enabled, the DMA transfer occurs
3. If specified, the postsync semaphore conditions must be true
4. If enabled, the doorbell is written

By using the semaphores in conjunction with the workload running on the NSPs,
the data pipeline can be synchronized such that the host can queue multiple
requests of data for the workload to process, but the DMA Bridge will only copy
the data into the memory of the workload when the workload is ready to process
the next input.

Response FIFO
-------------

Once a request is fully processed, a response FIFO element is generated if
specified in pcie_dma_cmd. The structure of a response FIFO element:

.. code-block:: c

  struct response_elem {
        u16 req_id;
        u16 completion_code;
  };

req_id
        matches the req_id of the request that generated this element.

completion_code
        status of this request. 0 is success. Non-zero is an error.

The DMA Bridge will generate an MSI to the host as a reaction to activity in the
response FIFO of a DBC. The DMA Bridge hardware has an IRQ storm mitigation
algorithm, where it will only generate an MSI when the response FIFO transitions
from empty to non-empty (unless force MSI is enabled and triggered). In
response to this MSI, the host is expected to drain the response FIFO, and must
take care to handle any race conditions between draining the FIFO and the
device inserting elements into the FIFO.
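
One way to handle the drain race can be sketched with an in-memory model of
the FIFO. This is not the driver's implementation: the resp_fifo structure, the
head and tail indices, and the function names are assumptions for illustration,
and a real host would also need memory barriers and would publish its head
pointer to the device. The key point is that the tail is re-read until the FIFO
is observed empty, because elements that arrive during the drain will not raise
another MSI.

```c
#include <stdint.h>

struct response_elem {
	uint16_t req_id;
	uint16_t completion_code;
};

/* Hypothetical in-memory model of one DBC response FIFO. */
struct resp_fifo {
	struct response_elem *ring;
	uint32_t size;          /* number of elements in the ring */
	uint32_t head;          /* consumer index, owned by the host */
	volatile uint32_t tail; /* producer index, advanced by the device */
};

typedef void (*resp_handler)(const struct response_elem *elem, void *ctx);

/* Example handler: counts responses with a non-zero completion code. */
static void count_errors(const struct response_elem *elem, void *ctx)
{
	if (elem->completion_code != 0)
		(*(unsigned int *)ctx)++;
}

/* Drain until the FIFO is observed empty; returns the number handled. */
static unsigned int drain_responses(struct resp_fifo *fifo, resp_handler fn,
				    void *ctx)
{
	unsigned int handled = 0;

	for (;;) {
		/* Re-read the tail on every pass to catch late arrivals. */
		uint32_t tail = fifo->tail;

		if (fifo->head == tail)
			break;
		while (fifo->head != tail) {
			fn(&fifo->ring[fifo->head], ctx);
			fifo->head = (fifo->head + 1) % fifo->size;
			handled++;
		}
	}
	return handled;
}
```

An MSI handler in this model would call drain_responses() once per interrupt
and rely on the outer loop to pick up elements inserted while it was running.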

Neural Network Control (NNC) Protocol
=====================================

The NNC protocol is how the host makes requests to the QSM to manage workloads.
It uses the QAIC_CONTROL MHI channel.

Each NNC request is packaged into a message. Each message is a series of
transactions. A passthrough type transaction can contain elements known as
commands.

QSM requires NNC messages to be little-endian encoded and the fields to be
naturally aligned. Since there are 64-bit elements in some NNC messages, 64-bit
alignment must be maintained.
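
The encoding requirements above can be sketched with a few byte-level helpers.
The helper names are hypothetical, and padding each offset up to a multiple of
8 bytes is one assumed way to maintain 64-bit alignment, not something the
protocol description spells out.

```c
#include <stddef.h>
#include <stdint.h>

/* Round an offset up to the next multiple of 8 bytes (64-bit alignment). */
static size_t nnc_align8(size_t offset)
{
	return (offset + 7u) & ~(size_t)7u;
}

/* Store a 32-bit value in little-endian byte order, on any host. */
static void put_le32(uint8_t *dst, uint32_t value)
{
	for (int i = 0; i < 4; i++)
		dst[i] = (uint8_t)(value >> (8 * i));
}

/* Store a 64-bit value in little-endian byte order, on any host. */
static void put_le64(uint8_t *dst, uint64_t value)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(value >> (8 * i));
}
```

Writing fields byte by byte keeps the encoding independent of the host's own
endianness, at the cost of a few extra instructions per field.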

A message contains a header and then a series of transactions. A message may be
at most 4K in size from QSM to the host. From the host to the QSM, a message
can be at most 64K (maximum size of a single MHI packet), but there is a
continuation feature where message N+1 can be marked as a continuation of
message N. This is used for exceedingly large DMA xfer transactions.

Transaction descriptions
------------------------

passthrough
        Allows userspace to send an opaque payload directly to the QSM.
        This is used for NNC commands. Userspace is responsible for managing
        the QSM message requirements in the payload.

dma_xfer
        DMA transfer. Describes an object that the QSM should DMA into the
        device via address and size tuples.

activate
        Activate a workload onto NSPs. The host must provide memory to be
        used by the DBC.

deactivate
        Deactivate an active workload and return the NSPs to idle.

status
        Query the QSM about its NNC implementation. Returns the NNC version,
        and if CRC is used.

terminate
        Release a user's resources.

dma_xfer_cont
        Continuation of a previous DMA transfer. If a DMA transfer
        cannot be specified in a single message (highly fragmented), this
        transaction can be used to specify more ranges.

validate_partition
        Query the QSM to determine if a partition identifier is valid.

Each message is tagged with a user id and a partition id. The user id allows
QSM to track resources, and release them when the user goes away (e.g. the
process crashes). A partition id identifies the resource partition, managed by
QSM, that this message applies to.

Messages may have CRCs. Messages should have CRCs applied until the QSM
reports via the status transaction that CRCs are not needed. The QSM on the
SA9000P requires CRCs for black channel safing.

Subsystem Restart (SSR)
=======================

SSR is the concept of limiting the impact of an error. An AIC100 device may
have multiple users, each with their own workload running. If the workload of
one user crashes, the fallout of that should be limited to that workload and not
impact other workloads. SSR accomplishes this.

If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
channel. This notification identifies the workload by its assigned DBC. A
multi-stage recovery process is then used to clean up both sides, and get the
DBC/NSPs into a working state.

When SSR occurs, any state in the workload is lost. Any inputs that were in
process, or queued but not yet serviced, are lost. The loaded artifacts will
remain in on-card DDR, but the host will need to re-activate the workload if
it desires to recover the workload.

Reliability, Availability, Serviceability (RAS)
===============================================

AIC100 is expected to be deployed in server systems where RAS ideology is
applied. Simply put, RAS is the concept of detecting, classifying, and
reporting errors. While PCIe has AER (Advanced Error Reporting) which factors
into RAS, AER does not allow for a device to report details about internal
errors. Therefore, AIC100 implements a custom RAS mechanism. When a RAS event
occurs, QSM will report the event with appropriate details via the QAIC_STATUS
MHI channel. A sysadmin may determine that a particular device needs
additional service based on RAS reports.

Telemetry
=========

QSM has the ability to report various physical attributes of the device, and in
some cases, to allow the host to control them. Examples include thermal limits,
thermal readings, and power readings. These items are communicated via the
QAIC_TELEMETRY MHI channel.