TOMOYO Linux Cross Reference
Linux/Documentation/arch/powerpc/cxl.rst

  1 ====================================
  2 Coherent Accelerator Interface (CXL)
  3 ====================================
  4 
  5 Introduction
  6 ============
  7 
  8     The coherent accelerator interface is designed to allow the
  9     coherent connection of accelerators (FPGAs and other devices) to a
 10     POWER system. These devices need to adhere to the Coherent
 11     Accelerator Interface Architecture (CAIA).
 12 
 13     IBM refers to this as the Coherent Accelerator Processor Interface
 14     or CAPI. In the kernel it's referred to by the name CXL to avoid
 15     confusion with the ISDN CAPI subsystem.
 16 
 17     Coherent in this context means that the accelerator and CPUs can
 18     both access system memory directly and with the same effective
 19     addresses.
 20 
 21 
 22 Hardware overview
 23 =================
 24 
 25     ::
 26 
 27          POWER8/9             FPGA
 28        +----------+        +---------+
 29        |          |        |         |
 30        |   CPU    |        |   AFU   |
 31        |          |        |         |
 32        |          |        |         |
 33        |          |        |         |
 34        +----------+        +---------+
 35        |   PHB    |        |         |
 36        |   +------+        |   PSL   |
 37        |   | CAPP |<------>|         |
 38        +---+------+  PCIE  +---------+
 39 
 40     The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
 41     unit which is part of the PCIe Host Bridge (PHB). This is managed
 42     by Linux by calls into OPAL. Linux doesn't directly program the
 43     CAPP.
 44 
 45     The FPGA (or coherently attached device) consists of two parts:
 46     the POWER Service Layer (PSL) and the Accelerator Function Unit
 47     (AFU). The AFU is used to implement specific functionality behind
 48     the PSL. The PSL, among other things, provides memory address
 49     translation services to allow each AFU direct access to userspace
 50     memory.
 51 
 52     The AFU is the core part of the accelerator (e.g. the compression
 53     or crypto function). The kernel has no knowledge of the function
 54     of the AFU. Only userspace interacts directly with the AFU.
 55 
 56     The PSL provides the translation and interrupt services that the
 57     AFU needs. This is what the kernel interacts with. For example, if
 58     the AFU needs to read a particular effective address, it sends
 59     that address to the PSL, which translates it, fetches the
 60     data from memory and returns it to the AFU. If the PSL has a
 61     translation miss, it interrupts the kernel and the kernel services
 62     the fault. The context in which this fault is serviced depends on
 63     who owns that acceleration function.
 64 
 65     - POWER8 and PSL Version 8 comply with CAIA Version 1.0.
 66     - POWER9 and PSL Version 9 comply with CAIA Version 2.0.
 67 
 68     PSL Version 9 provides new features such as:
 69 
 70     * Interaction with the nest MMU on the P9 chip.
 71     * Native DMA support.
 72     * Sending ASB_Notify messages for host thread wakeup.
 73     * Atomic operations.
 74     * etc.
 75 
 76     Cards with a PSL9 won't work on a POWER8 system and cards with a
 77     PSL8 won't work on a POWER9 system.
 78 
 79 AFU Modes
 80 =========
 81 
 82     The AFU supports two programming modes: dedicated and AFU
 83     directed. An AFU may support one or both modes.
 84 
 85     When using dedicated mode only one MMU context is supported. In
 86     this mode, only one userspace process can use the accelerator at
 87     a time.
 88 
 89     When using AFU directed mode, up to 16K simultaneous contexts can
 90     be supported. This means up to 16K simultaneous userspace
 91     applications may use the accelerator (although specific AFUs may
 92     support fewer). In this mode, the AFU sends a 16-bit context ID
 93     with each of its requests. This tells the PSL which context is
 94     associated with each operation. If the PSL can't translate an
 95     operation, the ID can also be accessed by the kernel so it can
 96     determine the userspace context associated with an operation.
 97 
 98 
 99 MMIO space
100 ==========
101 
102     A portion of the accelerator MMIO space can be directly mapped
103     from the AFU to userspace. Either the whole space can be mapped or
104     just a per context portion. The hardware is self describing, hence
105     the kernel can determine the offset and size of the per context
106     portion.
107 
108 
109 Interrupts
110 ==========
111 
112     AFUs may generate interrupts that are destined for userspace. These
113     are received by the kernel as hardware interrupts and passed on to
114     userspace via the read syscall documented below.
115 
116     Data storage faults and error interrupts are handled by the kernel
117     driver.
118 
119 
120 Work Element Descriptor (WED)
121 =============================
122 
123     The WED is a 64-bit parameter passed to the AFU when a context is
124     started. Its format is up to the AFU, so the kernel has no
125     knowledge of what it represents. Typically it will be the
126     effective address of a work queue or status block where the AFU
127     and userspace can share control and status information.
128 
129 
130 
131 
132 User API
133 ========
134 
135 1. AFU character devices
136 ^^^^^^^^^^^^^^^^^^^^^^^^
137 
138     For AFUs operating in AFU directed mode, two character device
139     files will be created. /dev/cxl/afu0.0m will correspond to a
140     master context and /dev/cxl/afu0.0s will correspond to a slave
141     context. Master contexts have access to the full MMIO space an
142     AFU provides. Slave contexts have access to only the per process
143     MMIO space an AFU provides.
144 
145     For AFUs operating in dedicated process mode, the driver will
146     only create a single character device per AFU called
147     /dev/cxl/afu0.0d. This will have access to the entire MMIO space
148     that the AFU provides (like master contexts in AFU directed).
149 
150     The types described below are defined in include/uapi/misc/cxl.h.
151 
152     The following file operations are supported on both slave and
153     master devices.
154 
155     A userspace library libcxl is available here:
156 
157         https://github.com/ibm-capi/libcxl
158 
159     This provides a C interface to this kernel API.
160 
161 open
162 ----
163 
164     Opens the device and allocates a file descriptor to be used with
165     the rest of the API.
166 
167     A dedicated mode AFU only has one context and only allows the
168     device to be opened once.
169 
170     An AFU directed mode AFU can have many contexts; the device can be
171     opened once for each context that is available.
172 
173     When all available contexts are allocated the open call will fail
174     and return -ENOSPC.
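As a rough sketch of how a caller might handle the out-of-contexts case: the helper below wraps open(2) and returns the negated errno on failure, so that ENOSPC can be told apart from other errors. The names `afu_open` and `afu_open_out_of_contexts` are hypothetical, and the sketch assumes the -ENOSPC described above reaches userspace in the usual way, as open() returning -1 with errno set to ENOSPC.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: open an AFU character device such as
 * /dev/cxl/afu0.0d. Returns a file descriptor (>= 0) on success or a
 * negated errno value on failure.
 */
int afu_open(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -errno;
    return fd;
}

/* Returns 1 if the result of afu_open() means "no free contexts left". */
int afu_open_out_of_contexts(int rc)
{
    return rc == -ENOSPC;
}
```

A caller that gets the out-of-contexts result could retry later or report that the accelerator is fully subscribed; any other negative value is an ordinary open failure.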
175 
176     Note:
177           IRQs need to be allocated for each context, which may limit
178           the number of contexts that can be created, and therefore
179           how many times the device can be opened. The POWER8 CAPP
180           supports 2040 IRQs and 3 are used by the kernel, so 2037 are
181           left. If 1 IRQ is needed per context, then only 2037
182           contexts can be allocated. If 4 IRQs are needed per context,
183           then only 2037/4 = 509 contexts can be allocated.
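The arithmetic in the note above can be written as a small function. This is only an illustration of the IRQ-imposed bound using the POWER8 figures quoted in the note; the macro and function names are made up for this sketch, and the limit on contexts imposed by the AFU itself still applies.

```c
/* Figures quoted in the note above for the POWER8 CAPP. */
#define CAPP_TOTAL_IRQS   2040
#define CAPP_KERNEL_IRQS  3

/*
 * Upper bound on the number of contexts when every context needs
 * irqs_per_context interrupts. Integer division rounds down because a
 * context cannot be created with only part of its IRQ allocation.
 * Returns 0 for a non-positive irqs_per_context.
 */
unsigned int max_contexts_for_irqs(int irqs_per_context)
{
    if (irqs_per_context <= 0)
        return 0;
    return (CAPP_TOTAL_IRQS - CAPP_KERNEL_IRQS) / irqs_per_context;
}
```

With 1 IRQ per context this gives 2037, and with 4 IRQs per context it gives 509, matching the note.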
184 
185 
186 ioctl
187 -----
188 
189     CXL_IOCTL_START_WORK:
190         Starts the AFU context and associates it with the current
191         process. Once this ioctl is successfully executed, all memory
192         mapped into this process is accessible to this AFU context
193         using the same effective addresses. No additional calls are
194         required to map/unmap memory. The AFU memory context will be
195         updated as userspace allocates and frees memory. This ioctl
196         returns once the AFU context is started.
197 
198         Takes a pointer to a struct cxl_ioctl_start_work
199 
200             ::
201 
202                 struct cxl_ioctl_start_work {
203                         __u64 flags;
204                         __u64 work_element_descriptor;
205                         __u64 amr;
206                         __s16 num_interrupts;
207                         __s16 reserved1;
208                         __s32 reserved2;
209                         __u64 reserved3;
210                         __u64 reserved4;
211                         __u64 reserved5;
212                         __u64 reserved6;
213                 };
214 
215             flags:
216                 Indicates which optional fields in the structure are
217                 valid.
218 
219             work_element_descriptor:
220                 The Work Element Descriptor (WED) is a 64-bit argument
221                 defined by the AFU. Typically this is an effective
222                 address pointing to an AFU specific structure
223                 describing what work to perform.
224 
225             amr:
226                 Authority Mask Register (AMR), same as the powerpc
227                 AMR. This field is only used by the kernel when the
228                 corresponding CXL_START_WORK_AMR value is specified in
229                 flags. If not specified the kernel will use a default
230                 value of 0.
231 
232             num_interrupts:
233                 Number of userspace interrupts to request. This field
234                 is only used by the kernel when the corresponding
235                 CXL_START_WORK_NUM_IRQS value is specified in flags.
236                 If not specified the minimum number required by the
237                 AFU will be allocated. The min and max number can be
238                 obtained from sysfs.
239 
240             reserved fields:
241                 For ABI padding and future extensions
242 
243     CXL_IOCTL_GET_PROCESS_ELEMENT:
244         Get the current context id, also known as the process element.
245         The value is returned from the kernel as a __u32.
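The way the flags field gates the optional fields of CXL_IOCTL_START_WORK can be sketched as follows. To keep the example self-contained it uses a local mirror of the structure shown above with <stdint.h> types; real code would include the uapi header and use struct cxl_ioctl_start_work directly. The `SKETCH_` flag values are assumptions and should be checked against include/uapi/misc/cxl.h, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the structure shown above, using <stdint.h> types. */
struct start_work_sketch {
    uint64_t flags;
    uint64_t work_element_descriptor;
    uint64_t amr;
    int16_t  num_interrupts;
    int16_t  reserved1;
    int32_t  reserved2;
    uint64_t reserved3;
    uint64_t reserved4;
    uint64_t reserved5;
    uint64_t reserved6;
};

/* Assumed flag values; take the real ones from include/uapi/misc/cxl.h. */
#define SKETCH_START_WORK_AMR       0x1ULL
#define SKETCH_START_WORK_NUM_IRQS  0x2ULL

/* Zero everything (including reserved fields) and set the WED. */
void start_work_init(struct start_work_sketch *w, const void *wed)
{
    memset(w, 0, sizeof(*w));
    w->work_element_descriptor = (uint64_t)(uintptr_t)wed;
}

/* Optional field: only honoured when the matching flag is set. */
void start_work_set_amr(struct start_work_sketch *w, uint64_t amr)
{
    w->amr = amr;
    w->flags |= SKETCH_START_WORK_AMR;
}

/* Optional field: only honoured when the matching flag is set. */
void start_work_set_num_irqs(struct start_work_sketch *w, int16_t n)
{
    w->num_interrupts = n;
    w->flags |= SKETCH_START_WORK_NUM_IRQS;
}
```

With the real header, the filled structure would be handed to the kernel with ioctl(fd, CXL_IOCTL_START_WORK, &work) on a descriptor obtained from open. Leaving a flag clear lets the kernel apply the defaults described above.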
246 
247 
248 mmap
249 ----
250 
251     An AFU may have an MMIO space to facilitate communication with the
252     AFU. If it does, the MMIO space can be accessed via mmap. The size
253     and contents of this area are specific to the particular AFU. The
254     size can be discovered via sysfs.
255 
256     In AFU directed mode, master contexts are allowed to map all of
257     the MMIO space and slave contexts are allowed to only map the per
258     process MMIO space associated with the context. In dedicated
259     process mode the entire MMIO space can always be mapped.
260 
261     This mmap call must be done after the START_WORK ioctl.
262 
263     Care should be taken when accessing MMIO space. POWER8 supports
264     only 32-bit and 64-bit accesses. Also, the AFU will be designed
265     with a specific endianness, so all MMIO accesses should take
266     endianness into account (the endian(3) helpers such as le64toh()
267     and be64toh() are recommended). The same endianness concerns apply
268     to any shared memory queues the WED may describe.
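A minimal sketch of endian-aware 64-bit register access, assuming an AFU whose registers are little-endian. The function names are hypothetical, and `mmio` stands for the pointer returned by mmap on the AFU file descriptor. Memory barriers and ordering requirements are deliberately left out of this sketch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for le64toh()/htole64() in <endian.h> on glibc */
#endif
#include <endian.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 64-bit register from a little-endian AFU. 'offset' is in bytes
 * and must be 8-byte aligned. A single 64-bit load is used because only
 * 32-bit and 64-bit accesses are supported.
 */
uint64_t afu_read64_le(const volatile void *mmio, size_t offset)
{
    const volatile uint64_t *reg =
        (const volatile uint64_t *)((const volatile uint8_t *)mmio + offset);

    return le64toh(*reg);
}

/* Write a 64-bit register on a little-endian AFU with a single store. */
void afu_write64_le(volatile void *mmio, size_t offset, uint64_t value)
{
    volatile uint64_t *reg =
        (volatile uint64_t *)((volatile uint8_t *)mmio + offset);

    *reg = htole64(value);
}
```

For a big-endian AFU the same shape applies with be64toh() and htobe64(). The conversions are no-ops when host and AFU endianness already match, so the helpers can be used unconditionally.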
269 
270 
271 read
272 ----
273 
274     Reads events from the AFU. Blocks if no events are pending
275     (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
276     unrecoverable error or if the card is removed.
277 
278     read() will always return an integral number of events.
279 
280     The buffer passed to read() must be at least 4K bytes.
281 
282     The result of the read will be a buffer of one or more events.
283     Each event is of type struct cxl_event and varies in size::
284 
285             struct cxl_event {
286                     struct cxl_event_header header;
287                     union {
288                             struct cxl_event_afu_interrupt irq;
289                             struct cxl_event_data_storage fault;
290                             struct cxl_event_afu_error afu_error;
291                     };
292             };
293 
294     The struct cxl_event_header is defined as
295 
296         ::
297 
298             struct cxl_event_header {
299                     __u16 type;
300                     __u16 size;
301                     __u16 process_element;
302                     __u16 reserved1;
303             };
304 
305         type:
306             This defines the type of event. The type determines how
307             the rest of the event is structured. These types are
308             described below and defined by enum cxl_event_type.
309 
310         size:
311             This is the size of the event in bytes including the
312             struct cxl_event_header. The start of the next event can
313             be found at this offset from the start of the current
314             event.
315 
316         process_element:
317             Context ID of the event.
318 
319         reserved field:
320             For future extensions and padding.
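Walking a buffer returned by read() using the size field might look like the sketch below. It uses a local mirror of struct cxl_event_header (real code would use the uapi definition), and the function name `count_events` is hypothetical. The bounds checks are defensive additions for the sketch, not behaviour the kernel is documented to require.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header as shown above. */
struct event_header_sketch {
    uint16_t type;
    uint16_t size;
    uint16_t process_element;
    uint16_t reserved1;
};

/*
 * Count the events in a buffer of 'len' bytes returned by read().
 * Each header is copied out with memcpy() so that the walk does not
 * rely on the alignment of the buffer. Returns -1 if a size field is
 * smaller than the header or runs past the end of the buffer.
 */
int count_events(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t off = 0;
    int n = 0;

    while (off < len) {
        struct event_header_sketch hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, p + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        off += hdr.size;    /* next event starts 'size' bytes on */
        n++;
    }
    return n;
}
```

Here `len` is the byte count returned by read(), not the size of the buffer passed in. A real consumer would dispatch on `hdr.type` inside the loop instead of only counting.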
321 
322     If the event type is CXL_EVENT_AFU_INTERRUPT then the event
323     structure is defined as
324 
325         ::
326 
327             struct cxl_event_afu_interrupt {
328                     __u16 flags;
329                     __u16 irq; /* Raised AFU interrupt number */
330                     __u32 reserved1;
331             };
332 
333         flags:
334             These flags indicate which optional fields are present
335             in this struct. Currently all fields are mandatory.
336 
337         irq:
338             The IRQ number sent by the AFU.
339 
340         reserved field:
341             For future extensions and padding.
342 
343     If the event type is CXL_EVENT_DATA_STORAGE then the event
344     structure is defined as
345 
346         ::
347 
348             struct cxl_event_data_storage {
349                     __u16 flags;
350                     __u16 reserved1;
351                     __u32 reserved2;
352                     __u64 addr;
353                     __u64 dsisr;
354                     __u64 reserved3;
355             };
356 
357         flags:
358             These flags indicate which optional fields are present in
359             this struct. Currently all fields are mandatory.
360 
361         addr:
362             The address that the AFU unsuccessfully attempted to
363             access. Valid accesses will be handled transparently by the
364             kernel but invalid accesses will generate this event.
365 
366         dsisr:
367             This field gives information on the type of fault. It is a
368             copy of the DSISR from the PSL hardware when the address
369             fault occurred. The form of the DSISR is as defined in the
370             CAIA.
371 
372         reserved fields:
373             For future extensions
374 
375     If the event type is CXL_EVENT_AFU_ERROR then the event structure
376     is defined as
377 
378         ::
379 
380             struct cxl_event_afu_error {
381                     __u16 flags;
382                     __u16 reserved1;
383                     __u32 reserved2;
384                     __u64 error;
385             };
386 
387         flags:
388             These flags indicate which optional fields are present in
389             this struct. Currently all fields are mandatory.
390 
391         error:
392             Error status from the AFU. Defined by the AFU.
393 
394         reserved fields:
395             For future extensions and padding
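The three event layouts above can be told apart by the type field in the header. The following sketch formats a one-line description of a single event. The structures are local mirrors of the ones shown above, the numeric values given for the event types are assumptions that should be checked against enum cxl_event_type in include/uapi/misc/cxl.h, and all `_sketch` names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed values of enum cxl_event_type; check include/uapi/misc/cxl.h. */
enum event_type_sketch {
    SKETCH_EVENT_AFU_INTERRUPT = 1,
    SKETCH_EVENT_DATA_STORAGE  = 2,
    SKETCH_EVENT_AFU_ERROR     = 3,
};

/* Local mirrors of the structures shown above. */
struct header_sketch { uint16_t type, size, process_element, reserved1; };
struct afu_interrupt_sketch { uint16_t flags, irq; uint32_t reserved1; };
struct data_storage_sketch {
    uint16_t flags, reserved1;
    uint32_t reserved2;
    uint64_t addr, dsisr, reserved3;
};
struct afu_error_sketch {
    uint16_t flags, reserved1;
    uint32_t reserved2;
    uint64_t error;
};
struct event_sketch {
    struct header_sketch header;
    union {
        struct afu_interrupt_sketch irq;
        struct data_storage_sketch fault;
        struct afu_error_sketch afu_error;
    };
};

/*
 * Format a one-line description of the event at 'raw' into 'out'.
 * The event is copied into a zeroed local struct, limited to the size
 * given in its header, so a short event leaves the remaining fields 0.
 * Returns snprintf()'s result, or -1 if the header size is too small.
 */
int describe_event(const void *raw, char *out, size_t n)
{
    struct event_sketch ev;
    size_t size;
    unsigned int ctx;

    memset(&ev, 0, sizeof(ev));
    memcpy(&ev.header, raw, sizeof(ev.header));
    size = ev.header.size;
    if (size < sizeof(ev.header))
        return -1;
    if (size > sizeof(ev))
        size = sizeof(ev);
    memcpy(&ev, raw, size);
    ctx = ev.header.process_element;

    switch (ev.header.type) {
    case SKETCH_EVENT_AFU_INTERRUPT:
        return snprintf(out, n, "ctx %u: AFU interrupt %u", ctx,
                        (unsigned int)ev.irq.irq);
    case SKETCH_EVENT_DATA_STORAGE:
        return snprintf(out, n, "ctx %u: fault at 0x%" PRIx64
                        " dsisr 0x%" PRIx64, ctx,
                        ev.fault.addr, ev.fault.dsisr);
    case SKETCH_EVENT_AFU_ERROR:
        return snprintf(out, n, "ctx %u: AFU error 0x%" PRIx64, ctx,
                        ev.afu_error.error);
    default:
        return snprintf(out, n, "ctx %u: unknown event type %u", ctx,
                        (unsigned int)ev.header.type);
    }
}
```

The default branch matters in practice: an application should tolerate event types it does not recognise and skip them using the header size.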
396 
397 
398 2. Card character device (PowerVM guest only)
399 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
400 
401     In a PowerVM guest, an extra character device is created for the
402     card. The device is only used to write (flash) a new image on the
403     FPGA accelerator. Once the image is written and verified, the
404     device tree is updated and the card is reset to reload the updated
405     image.
406 
407 open
408 ----
409 
410     Opens the device and allocates a file descriptor to be used with
411     the rest of the API. The device can only be opened once.
412 
413 ioctl
414 -----
415 
416 CXL_IOCTL_DOWNLOAD_IMAGE / CXL_IOCTL_VALIDATE_IMAGE:
417     Starts and controls flashing a new FPGA image. Partial
418     reconfiguration is not supported (yet), so the image must contain
419     a copy of the PSL and AFU(s). Since an image can be quite large,
420     the caller may have to iterate, splitting the image into smaller
421     chunks.
422 
423     Takes a pointer to a struct cxl_adapter_image::
424 
425         struct cxl_adapter_image {
426             __u64 flags;
427             __u64 data;
428             __u64 len_data;
429             __u64 len_image;
430             __u64 reserved1;
431             __u64 reserved2;
432             __u64 reserved3;
433             __u64 reserved4;
434         };
435 
436     flags:
437         These flags indicate which optional fields are present in
438         this struct. Currently all fields are mandatory.
439 
440     data:
441         Pointer to a buffer with part of the image to write to the
442         card.
443 
444     len_data:
445         Size of the buffer pointed to by data.
446 
447     len_image:
448         Full size of the image.
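Splitting an image into chunks can be sketched as below. This only shows how the data, len_data and len_image fields might be filled for each chunk, using a local mirror of struct cxl_adapter_image. The helper name and the chunking scheme are assumptions made for illustration. Any chunk size or alignment rules imposed by the driver, and which of the two ioctls to issue for a given chunk, are not covered here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_adapter_image as shown above. */
struct adapter_image_sketch {
    uint64_t flags;
    uint64_t data;
    uint64_t len_data;
    uint64_t len_image;
    uint64_t reserved1;
    uint64_t reserved2;
    uint64_t reserved3;
    uint64_t reserved4;
};

/*
 * Describe the chunk of 'image' that starts at byte 'offset'.
 * Returns the number of bytes covered (0 once offset reaches the end),
 * so a caller can loop with offset += return value.
 */
size_t image_chunk(struct adapter_image_sketch *ai, const void *image,
                   size_t image_len, size_t offset, size_t chunk_len)
{
    size_t n;

    memset(ai, 0, sizeof(*ai));    /* flags and reserved fields stay 0 */
    if (offset >= image_len || chunk_len == 0)
        return 0;
    n = image_len - offset;
    if (n > chunk_len)
        n = chunk_len;
    ai->data = (uint64_t)(uintptr_t)((const unsigned char *)image + offset);
    ai->len_data = n;
    ai->len_image = image_len;
    return n;
}
```

A caller would loop until the function returns 0, passing each filled structure to the appropriate ioctl and stopping on the first error.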
449 
450 
451 Sysfs Class
452 ===========
453 
454     A cxl sysfs class is added under /sys/class/cxl to facilitate
455     enumeration and tuning of the accelerators. Its layout is
456     described in Documentation/ABI/testing/sysfs-class-cxl
457 
458 
459 Udev rules
460 ==========
461 
462     The following udev rules could be used to create a symlink to the
463     most logical chardev to use in any programming mode (afuX.Yd for
464     dedicated, afuX.Ys for AFU directed), since the API is virtually
465     identical for each::
466 
467         SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
468         SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
469                           KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"
