
TOMOYO Linux Cross Reference
Linux/Documentation/accel/introduction.rst



Differences between /Documentation/accel/introduction.rst (Version linux-6.11.5) and /Documentation/accel/introduction.rst (Version linux-6.5.13)


  1 .. SPDX-License-Identifier: GPL-2.0                 1 .. SPDX-License-Identifier: GPL-2.0
  2                                                     2 
  3 ============                                        3 ============
  4 Introduction                                        4 Introduction
  5 ============                                        5 ============
  6                                                     6 
  7 The Linux compute accelerators subsystem is de      7 The Linux compute accelerators subsystem is designed to expose compute
  8 accelerators in a common way to user-space and      8 accelerators in a common way to user-space and provide a common set of
  9 functionality.                                      9 functionality.
 10                                                    10 
 11 These devices can be either stand-alone ASICs      11 These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
 12 Although these devices are typically designed      12 Although these devices are typically designed to accelerate
 13 Machine-Learning (ML) and/or Deep-Learning (DL     13 Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
 14 is not limited to handling these types of acce     14 is not limited to handling these types of accelerators.
 15                                                    15 
 16 Typically, a compute accelerator will belong t     16 Typically, a compute accelerator will belong to one of the following
 17 categories:                                        17 categories:
 18                                                    18 
 19 - Edge AI - doing inference at an edge device.     19 - Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
 20   or an IP inside a SoC (e.g. laptop web camer     20   or an IP inside a SoC (e.g. laptop web camera). These devices
 21   are typically configured using registers and     21   are typically configured using registers and can work with or without DMA.
 22                                                    22 
 23 - Inference data-center - single/multi user de     23 - Inference data-center - single/multi user devices in a large server. This
 24   type of device can be stand-alone or an IP i     24   type of device can be stand-alone or an IP inside a SoC or a GPU. It will
 25   have on-board DRAM (to hold the DL topology)     25   have on-board DRAM (to hold the DL topology), DMA engines and
 26   command submission queues (either kernel or      26   command submission queues (either kernel or user-space queues).
 27   It might also have an MMU to manage multiple     27   It might also have an MMU to manage multiple users and might also enable
 28   virtualization (SR-IOV) to support multiple      28   virtualization (SR-IOV) to support multiple VMs on the same device. In
 29   addition, these devices will usually have so     29   addition, these devices will usually have some tools, such as profiler and
 30   debugger.                                        30   debugger.
 31                                                    31 
 32 - Training data-center - Similar to Inference      32 - Training data-center - Similar to Inference data-center cards, but typically
 33   have more computational power and memory b/w     33   have more computational power and memory b/w (e.g. HBM) and will likely have
 34   a method of scaling-up/out, i.e. connecting      34   a method of scaling-up/out, i.e. connecting to other training cards inside
 35   the server or in other servers, respectively     35   the server or in other servers, respectively.
 36                                                    36 
 37 All these devices typically have different run     37 All these devices typically have different runtime user-space software stacks,
 38 that are tailored-made to their h/w. In additi     38 that are tailored-made to their h/w. In addition, they will also probably
 39 include a compiler to generate programs to the     39 include a compiler to generate programs to their custom-made computational
 40 engines. Typically, the common layer in user-s     40 engines. Typically, the common layer in user-space will be the DL frameworks,
 41 such as PyTorch and TensorFlow.                    41 such as PyTorch and TensorFlow.
 42                                                    42 
 43 Sharing code with DRM                              43 Sharing code with DRM
 44 =====================                              44 =====================
 45                                                    45 
 46 Because this type of devices can be an IP insi     46 Because this type of devices can be an IP inside GPUs or have similar
 47 characteristics as those of GPUs, the accel su     47 characteristics as those of GPUs, the accel subsystem will use the
 48 DRM subsystem's code and functionality. i.e. t     48 DRM subsystem's code and functionality. i.e. the accel core code will
 49 be part of the DRM subsystem and an accel devi     49 be part of the DRM subsystem and an accel device will be a new type of DRM
 50 device.                                            50 device.
 51                                                    51 
 52 This will allow us to leverage the extensive D     52 This will allow us to leverage the extensive DRM code-base and
 53 collaborate with DRM developers that have expe     53 collaborate with DRM developers that have experience with this type of
 54 devices. In addition, new features that will b     54 devices. In addition, new features that will be added for the accelerator
 55 drivers can be of use to GPU drivers as well.      55 drivers can be of use to GPU drivers as well.
 56                                                    56 
 57 Differentiation from GPUs                          57 Differentiation from GPUs
 58 =========================                          58 =========================
 59                                                    59 
 60 Because we want to prevent the extensive user-     60 Because we want to prevent the extensive user-space graphic software stack
 61 from trying to use an accelerator as a GPU, th     61 from trying to use an accelerator as a GPU, the compute accelerators will be
 62 differentiated from GPUs by using a new major      62 differentiated from GPUs by using a new major number and new device char files.
 63                                                    63 
 64 Furthermore, the drivers will be located in a      64 Furthermore, the drivers will be located in a separate place in the kernel
 65 tree - drivers/accel/.                             65 tree - drivers/accel/.
 66                                                    66 
 67 The accelerator devices will be exposed to the     67 The accelerator devices will be exposed to the user space with the dedicated
 68 261 major number and will have the following c     68 261 major number and will have the following convention:
 69                                                    69 
 70 - device char files - /dev/accel/accel\*           70 - device char files - /dev/accel/accel\*
 71 - sysfs             - /sys/class/accel/accel\*     71 - sysfs             - /sys/class/accel/accel\*/
 72 - debugfs           - /sys/kernel/debug/accel/     72 - debugfs           - /sys/kernel/debug/accel/\*/
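
The naming convention above can be exercised from user space. The following is a minimal sketch, assuming a Linux system with glibc; the helper names `accel_node_path()` and `is_accel_node()` are hypothetical and not part of any kernel or library API. It builds a node path following the `/dev/accel/accel*` pattern and checks whether a path is a character device carrying the 261 major number.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Major number for accel devices, as stated in the text above. */
#define ACCEL_MAJOR 261

/*
 * Hypothetical helper: format "/dev/accel/accel<index>" into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int accel_node_path(char *buf, size_t len, unsigned int index)
{
	int n = snprintf(buf, len, "/dev/accel/accel%u", index);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: returns 1 if path is a character device whose
 * major number is ACCEL_MAJOR, 0 if it is something else, and -1 if
 * stat() fails (for example, when the node does not exist).
 */
static int is_accel_node(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	return S_ISCHR(st.st_mode) && major(st.st_rdev) == ACCEL_MAJOR;
}
```

A caller would typically format the path for index 0, pass it to `is_accel_node()`, and only then `open()` the node. Checking the major number is one way to confirm the node really belongs to the accel subsystem rather than to a GPU device char file.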
 73                                                    73 
 74 Getting Started                                    74 Getting Started
 75 ===============                                    75 ===============
 76                                                    76 
 77 First, read the DRM documentation at Documenta     77 First, read the DRM documentation at Documentation/gpu/index.rst.
 78 Not only it will explain how to write a new DR     78 Not only it will explain how to write a new DRM driver but it will also
 79 contain all the information on how to contribu     79 contain all the information on how to contribute, the Code Of Conduct and
 80 what is the coding style/documentation. All of     80 what is the coding style/documentation. All of that is the same for the
 81 accel subsystem.                                   81 accel subsystem.
 82                                                    82 
 83 Second, make sure the kernel is configured wit     83 Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
 84                                                    84 
 85 To expose your device as an accelerator, two c     85 To expose your device as an accelerator, two changes are needed to
 86 be done in your driver (as opposed to a standa     86 be done in your driver (as opposed to a standard DRM driver):
 87                                                    87 
 88 - Add the DRIVER_COMPUTE_ACCEL feature flag in     88 - Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
 89   driver_features field. It is important to no     89   driver_features field. It is important to note that this driver feature is
 90   mutually exclusive with DRIVER_RENDER and DR     90   mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
 91   to expose both graphics and compute device c     91   to expose both graphics and compute device char files should be handled by
 92   two drivers that are connected using the aux     92   two drivers that are connected using the auxiliary bus framework.
 93                                                    93 
 94 - Change the open callback in your driver fops     94 - Change the open callback in your driver fops structure to accel_open().
 95   Alternatively, your driver can use DEFINE_DR     95   Alternatively, your driver can use DEFINE_DRM_ACCEL_FOPS macro to easily
 96   set the correct function operations pointers     96   set the correct function operations pointers structure.
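
The mutual-exclusivity rule in the first item can be modelled outside the kernel. The sketch below is a user-space illustration only, not kernel code: the `EX_`-prefixed macros and `ex_accel_features_valid()` are hypothetical names, and the bit positions are placeholders rather than the values defined in the kernel's DRM headers. It shows the check a driver author might apply mentally before setting `driver_features`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit positions for illustration; the real flag values are
 * defined by the kernel's DRM headers and may differ.
 */
#define EX_DRIVER_MODESET        (1u << 1)
#define EX_DRIVER_RENDER         (1u << 3)
#define EX_DRIVER_COMPUTE_ACCEL  (1u << 7)

/*
 * Hypothetical check: a feature mask that sets the compute-accel flag
 * must not also set the render or modeset flags. Masks without the
 * compute-accel flag are not covered by this rule and pass.
 */
static bool ex_accel_features_valid(uint32_t features)
{
	if (!(features & EX_DRIVER_COMPUTE_ACCEL))
		return true;

	return !(features & (EX_DRIVER_RENDER | EX_DRIVER_MODESET));
}
```

Under this model, a device that needs both graphics and compute char files fails the check with a single mask, which matches the text's recommendation to split it into two drivers connected through the auxiliary bus.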
 97                                                    97 
 98 External References                                98 External References
 99 ===================                                99 ===================
100                                                   100 
101 email threads                                     101 email threads
102 -------------                                     102 -------------
103                                                   103 
104 * `Initial discussion on the New subsystem for< !! 104 * `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
105 * `patch-set to add the new subsystem <https:// !! 105 * `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)
106                                                   106 
107 Conference talks                                  107 Conference talks
108 ----------------                                  108 ----------------
109                                                   109 
110 * `LPC 2022 Accelerators BOF outcomes summary     110 * `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)
                                                      
