.. SPDX-License-Identifier: GPL-2.0

=======================
STM32 DMA-MDMA chaining
=======================


Introduction
------------

  This document describes the STM32 DMA-MDMA chaining feature. But before going
  further, let's introduce the peripherals involved.

  To offload data transfers from the CPU, STM32 microprocessors (MPUs) embed
  direct memory access controllers (DMA).

  STM32MP1 SoCs embed both STM32 DMA and STM32 MDMA controllers. STM32 DMA
  request routing capabilities are enhanced by a DMA request multiplexer
  (STM32 DMAMUX).

  **STM32 DMAMUX**

  STM32 DMAMUX routes any DMA request from a given peripheral to any channel of
  the STM32 DMA controllers (STM32MP1 embeds two STM32 DMA controllers).

  **STM32 DMA**

  STM32 DMA is mainly used to implement central data buffer storage (usually in
  the system SRAM) for different peripherals. It can access external RAMs, but
  it cannot generate the burst transfers that would make the best use of the
  AXI bus.

  **STM32 MDMA**

  STM32 MDMA (Master DMA) is mainly used to manage direct data transfers between
  RAM data buffers without CPU intervention. It can also be used in a
  hierarchical structure that uses STM32 DMA as first level data buffer
  interfaces for AHB peripherals, while the STM32 MDMA acts as a second level
  DMA with better performance. As an AXI/AHB master, STM32 MDMA can take control
  of the AXI/AHB bus.


Principles
----------

  The STM32 DMA-MDMA chaining feature relies on the strengths of STM32 DMA and
  STM32 MDMA controllers.

  STM32 DMA has a circular Double Buffer Mode (DBM). At each end of transaction
  (when DMA data counter - DMA_SxNDTR - reaches 0), the memory pointers
  (configured with DMA_SxM0AR and DMA_SxM1AR) are swapped and the DMA data
  counter is automatically reloaded. This allows the SW or the STM32 MDMA to
  process one memory area while the second memory area is being filled/used by
  the STM32 DMA transfer.
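
  As an illustration only, this swap can be modelled in plain C, outside of any
  kernel context. This is a minimal sketch: struct dbm_stream and its helpers
  are hypothetical names (not driver code), and the fields merely mirror
  DMA_SxNDTR, DMA_SxM0AR and DMA_SxM1AR.
  ::

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical software model of one STM32 DMA channel in DBM */
    struct dbm_stream {
            uint32_t m0ar;          /* mirrors DMA_SxM0AR: first memory area */
            uint32_t m1ar;          /* mirrors DMA_SxM1AR: second memory area */
            uint32_t ndtr_reload;   /* value reloaded into the data counter */
            uint32_t ndtr;          /* mirrors DMA_SxNDTR: remaining items */
            unsigned int ct;        /* area used by the DMA: 0 = m0ar, 1 = m1ar */
    };

    static inline void dbm_init(struct dbm_stream *s, uint32_t m0, uint32_t m1,
                                uint32_t nitems)
    {
            assert(nitems > 0);
            s->m0ar = m0;
            s->m1ar = m1;
            s->ndtr_reload = nitems;
            s->ndtr = nitems;
            s->ct = 0;
    }

    /* Memory area currently filled/used by the DMA transfer */
    static inline uint32_t dbm_dma_area(const struct dbm_stream *s)
    {
            return s->ct ? s->m1ar : s->m0ar;
    }

    /* Memory area left to the SW or to the MDMA in the meantime */
    static inline uint32_t dbm_free_area(const struct dbm_stream *s)
    {
            return s->ct ? s->m0ar : s->m1ar;
    }

    /*
     * Transfer one data item. Returns 1 when this item ends the transaction
     * (counter reached 0): the areas are swapped and the counter is reloaded.
     */
    static inline int dbm_transfer_item(struct dbm_stream *s)
    {
            if (--s->ndtr)
                    return 0;

            s->ct ^= 1;
            s->ndtr = s->ndtr_reload;
            return 1;
    }

  With a counter of 4 items, the fourth call to dbm_transfer_item() returns 1,
  and dbm_dma_area() and dbm_free_area() exchange their values.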

  With STM32 MDMA linked-list mode, a single request initiates the transfer of
  the data array (collection of nodes), which continues until the linked-list
  pointer for the channel is null. The channel transfer complete of the last
  node is the end of transfer, unless the first and last nodes are linked to
  each other. In that case, the linked list loops to create a circular MDMA
  transfer.
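
  The difference between a terminated list and a circular one can be sketched
  with a small software model. struct ll_node and ll_run() are hypothetical
  names used for this illustration only; real MDMA nodes carry much more than a
  length and a link.
  ::

    #include <stddef.h>

    /* Hypothetical model of a linked-list node: only what the sketch needs */
    struct ll_node {
            unsigned int bytes;             /* data moved by this node */
            const struct ll_node *next;     /* NULL: this node ends the transfer */
    };

    /*
     * Follow the list as a single request would. max_nodes bounds the walk so
     * that a circular list, which never ends by itself, can be modelled too.
     * Returns the number of bytes moved.
     */
    static inline unsigned long ll_run(const struct ll_node *first,
                                       unsigned int max_nodes)
    {
            unsigned long total = 0;
            const struct ll_node *n;

            for (n = first; n && max_nodes; n = n->next, max_nodes--)
                    total += n->bytes;

            return total;
    }

  A list whose last node points to NULL stops by itself, whereas linking the
  last node back to the first one makes the walk continue until max_nodes is
  exhausted.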

  STM32 MDMA has direct connections with STM32 DMA. This enables autonomous
  communication and synchronization between peripherals, thus saving CPU
  resources and reducing bus congestion. The Transfer Complete signal of a
  STM32 DMA channel can trigger a STM32 MDMA transfer. STM32 MDMA can clear the
  request generated by the STM32 DMA by writing to its Interrupt Clear register
  (whose address is stored in MDMA_CxMAR, and bit mask in MDMA_CxMDR).

  .. table:: STM32 MDMA interconnect table with STM32 DMA

    +--------------+----------------+-----------+------------+
    | STM32 DMAMUX | STM32 DMA      | STM32 DMA | STM32 MDMA |
    | channels     | channels       | Transfer  | request    |
    |              |                | complete  |            |
    |              |                | signal    |            |
    +==============+================+===========+============+
    | Channel *0*  | DMA1 channel 0 | dma1_tcf0 | *0x00*     |
    +--------------+----------------+-----------+------------+
    | Channel *1*  | DMA1 channel 1 | dma1_tcf1 | *0x01*     |
    +--------------+----------------+-----------+------------+
    | Channel *2*  | DMA1 channel 2 | dma1_tcf2 | *0x02*     |
    +--------------+----------------+-----------+------------+
    | Channel *3*  | DMA1 channel 3 | dma1_tcf3 | *0x03*     |
    +--------------+----------------+-----------+------------+
    | Channel *4*  | DMA1 channel 4 | dma1_tcf4 | *0x04*     |
    +--------------+----------------+-----------+------------+
    | Channel *5*  | DMA1 channel 5 | dma1_tcf5 | *0x05*     |
    +--------------+----------------+-----------+------------+
    | Channel *6*  | DMA1 channel 6 | dma1_tcf6 | *0x06*     |
    +--------------+----------------+-----------+------------+
    | Channel *7*  | DMA1 channel 7 | dma1_tcf7 | *0x07*     |
    +--------------+----------------+-----------+------------+
    | Channel *8*  | DMA2 channel 0 | dma2_tcf0 | *0x08*     |
    +--------------+----------------+-----------+------------+
    | Channel *9*  | DMA2 channel 1 | dma2_tcf1 | *0x09*     |
    +--------------+----------------+-----------+------------+
    | Channel *10* | DMA2 channel 2 | dma2_tcf2 | *0x0A*     |
    +--------------+----------------+-----------+------------+
    | Channel *11* | DMA2 channel 3 | dma2_tcf3 | *0x0B*     |
    +--------------+----------------+-----------+------------+
    | Channel *12* | DMA2 channel 4 | dma2_tcf4 | *0x0C*     |
    +--------------+----------------+-----------+------------+
    | Channel *13* | DMA2 channel 5 | dma2_tcf5 | *0x0D*     |
    +--------------+----------------+-----------+------------+
    | Channel *14* | DMA2 channel 6 | dma2_tcf6 | *0x0E*     |
    +--------------+----------------+-----------+------------+
    | Channel *15* | DMA2 channel 7 | dma2_tcf7 | *0x0F*     |
    +--------------+----------------+-----------+------------+
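
  The table follows a regular pattern, so the STM32 MDMA request can be derived
  from the STM32 DMA controller and channel numbers. The helper below is a
  hypothetical sketch of that arithmetic, not a function of any driver.
  ::

    #include <assert.h>
    #include <stdint.h>

    /* Per the table above: each STM32 DMA controller has channels 0..7 */
    #define CHAIN_DMA_CHANNELS_PER_CTRL     8

    /*
     * dma_ctrl: 1 for DMA1, 2 for DMA2. chan: 0..7.
     * Returns the DMAMUX channel ID, which is also the STM32 MDMA request.
     */
    static inline uint32_t chain_mdma_request(unsigned int dma_ctrl,
                                              unsigned int chan)
    {
            assert(dma_ctrl == 1 || dma_ctrl == 2);
            assert(chan < CHAIN_DMA_CHANNELS_PER_CTRL);

            return (dma_ctrl - 1) * CHAIN_DMA_CHANNELS_PER_CTRL + chan;
    }

  For instance, DMA2 channel 5 (dma2_tcf5) gives request 0x0D, as in the table.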

  STM32 DMA-MDMA chaining feature then uses a SRAM buffer. STM32MP1 SoCs embed
  three fast access static internal RAMs of various size, used for data storage.
  Due to STM32 DMA legacy (within microcontrollers), STM32 DMA performance is
  poor with DDR, while it is optimal with SRAM. Hence the SRAM buffer used
  between STM32 DMA and STM32 MDMA. This buffer is split in two equal periods
  and STM32 DMA uses one period while STM32 MDMA uses the other period
  simultaneously.
  ::

                    dma[1:2]-tcf[0:7]
                   .----------------.
     ____________ '    _________     V____________
    | STM32 DMA  |    /  __|>_  \    | STM32 MDMA |
    |------------|   |  /     \  |   |------------|
    | DMA_SxM0AR |<=>| | SRAM  | |<=>| []-[]...[] |
    | DMA_SxM1AR |   |  \_____/  |   |            |
    |____________|    \___<|____/    |____________|

  STM32 DMA-MDMA chaining uses (struct dma_slave_config).peripheral_config to
  exchange the parameters needed to configure MDMA. These parameters are
  gathered into a u32 array with three values:

  * the STM32 MDMA request (which is actually the DMAMUX channel ID),
  * the address of the STM32 DMA register to clear the Transfer Complete
    interrupt flag,
  * the mask of the Transfer Complete interrupt flag of the STM32 DMA channel.
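
  To make these three values concrete, here is a minimal userspace sketch that
  computes them for a given channel. Everything in it is an assumption made for
  illustration: the names are hypothetical, and the register offsets and flag
  bit positions follow the usual STM32 DMA register layout and should be checked
  against the reference manual.
  ::

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical container for the three u32 values listed above */
    struct chain_mdma_config {
            uint32_t request;       /* STM32 MDMA request = DMAMUX channel ID */
            uint32_t ifcr;          /* address of the interrupt flag clear register */
            uint32_t tcf;           /* Transfer Complete flag mask */
    };

    #define CHAIN_DMA_LIFCR 0x08u   /* assumption: flags of channels 0..3 */
    #define CHAIN_DMA_HIFCR 0x0cu   /* assumption: flags of channels 4..7 */

    /* dma_base: base address of the controller, dma_ctrl: 1 or 2, chan: 0..7 */
    static inline struct chain_mdma_config
    chain_mdma_config_fill(uint32_t dma_base, unsigned int dma_ctrl,
                           unsigned int chan)
    {
            /* assumption: first bit of each channel's flag group in a register */
            static const unsigned int flag_shift[4] = { 0, 6, 16, 22 };
            struct chain_mdma_config cfg;

            assert(dma_ctrl == 1 || dma_ctrl == 2);
            assert(chan < 8);

            cfg.request = (dma_ctrl - 1) * 8 + chan;
            cfg.ifcr = dma_base + (chan < 4 ? CHAIN_DMA_LIFCR : CHAIN_DMA_HIFCR);
            /* assumption: the Transfer Complete flag is bit 5 of its group */
            cfg.tcf = (uint32_t)1 << (flag_shift[chan % 4] + 5);

            return cfg;
    }

  In a real driver these values are provided by the STM32 DMA driver itself, so
  a client never has to compute them.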

Device Tree updates for STM32 DMA-MDMA chaining support
-------------------------------------------------------

  **1. Allocate a SRAM buffer**

    The SRAM device tree node is defined in the SoC device tree. You can refer
    to it in your board device tree to define your SRAM pool.
    ::

          &sram {
                  my_foo_device_dma_pool: dma-sram@0 {
                          reg = <0x0 0x1000>;
                  };
          };

    Be careful of the start index, in case there are other SRAM consumers.
    Define your pool size strategically: to optimise chaining, the idea is that
    STM32 DMA and STM32 MDMA can work simultaneously, each on its own half
    (period) of the SRAM buffer.
    If the SRAM period is greater than the expected DMA transfer, then STM32 DMA
    and STM32 MDMA will work sequentially instead of simultaneously. It is not a
    functional issue but it is not optimal.
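
    The sizing rule can be checked with a few lines of plain C. This is only a
    sketch with hypothetical helper names; treating a transfer of exactly one
    period as sequential is an assumption of this model.
    ::

      #include <stdbool.h>
      #include <stddef.h>

      /* The pool is split in two equal periods */
      static inline size_t chain_sram_period(size_t pool_size)
      {
              return pool_size / 2;
      }

      /*
       * STM32 DMA and STM32 MDMA can only overlap when the transfer needs
       * more than one period; otherwise they work one after the other.
       */
      static inline bool chain_can_overlap(size_t pool_size, size_t xfer_len)
      {
              return xfer_len > chain_sram_period(pool_size);
      }

    With the 0x1000 byte pool declared above, the period is 0x800 bytes: a
    8 KiB transfer can overlap, a 1 KiB transfer cannot.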

    Don't forget to refer to your SRAM pool in your device node. You need to
    define a new property.
    ::

          &my_foo_device {
                  ...
                  my_dma_pool = &my_foo_device_dma_pool;
          };

    Then get this SRAM pool in your foo driver and allocate your SRAM buffer.

  **2. Allocate a STM32 DMA channel and a STM32 MDMA channel**

    You need to define an extra channel in your device tree node, in addition to
    the one you should already have for "classic" DMA operation.

    This new channel must be taken from STM32 MDMA channels, so the phandle of
    the DMA controller to use is that of the MDMA controller.
    ::

          &my_foo_device {
                  [...]
                  my_dma_pool = &my_foo_device_dma_pool;
                  dmas = <&dmamux1 ...>,                // STM32 DMA channel
                         <&mdma1 0 0x3 0x1200000a 0 0>; // + STM32 MDMA channel
          };

    Concerning STM32 MDMA bindings:

    1. The request line number: whatever the value here, it will be overwritten
       by MDMA driver with the STM32 DMAMUX channel ID passed through
       (struct dma_slave_config).peripheral_config

    2. The priority level: choose Very High (0x3) so that your channel will
       take priority over the others during request arbitration

    3. A 32-bit mask specifying the DMA channel configuration: source and
       destination address increment, block transfer with 128 bytes per single
       transfer

    4. The 32-bit value specifying the register to be used to acknowledge the
       request: it will be overwritten by MDMA driver, with the DMA channel
       interrupt flag clear register address passed through
       (struct dma_slave_config).peripheral_config

    5. The 32-bit mask specifying the value to be written to acknowledge the
       request: it will be overwritten by MDMA driver, with the DMA channel
       Transfer Complete flag passed through
       (struct dma_slave_config).peripheral_config
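
    As a cross-check of item 3, the mask 0x1200000a used in the example can be
    decoded with the sketch below. The helper names are hypothetical and the
    field positions are assumed from the STM32 MDMA device tree binding; refer
    to the binding for the authoritative layout.
    ::

      #include <stdint.h>

      /* Assumed layout: bits 1:0, source increment mode (0x2 = increment) */
      static inline unsigned int mdma_cfg_src_inc(uint32_t cfg)
      {
              return cfg & 0x3;
      }

      /* Assumed layout: bits 3:2, destination increment mode (0x2 = increment) */
      static inline unsigned int mdma_cfg_dst_inc(uint32_t cfg)
      {
              return (cfg >> 2) & 0x3;
      }

      /* Assumed layout: bits 25:18, number of bytes per single transfer */
      static inline unsigned int mdma_cfg_xfer_bytes(uint32_t cfg)
      {
              return (cfg >> 18) & 0xff;
      }

      /* Assumed layout: bits 29:28, trigger mode (0x1 = block transfer) */
      static inline unsigned int mdma_cfg_trigger(uint32_t cfg)
      {
              return (cfg >> 28) & 0x3;
      }

    Under these assumptions, 0x1200000a decodes to source increment (0x2),
    destination increment (0x2), 128 bytes per single transfer and block
    transfer trigger mode (0x1), which matches the description in item 3.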

Driver updates for STM32 DMA-MDMA chaining support in foo driver
----------------------------------------------------------------

  **0. (optional) Refactor the original sg_table if dmaengine_prep_slave_sg()**

    In case of dmaengine_prep_slave_sg(), the original sg_table can't be used as
    is. Two new sg_tables must be created from the original one: one for the
    STM32 DMA transfer (where the memory address now targets the SRAM buffer
    instead of the DDR buffer) and one for the STM32 MDMA transfer (where the
    memory address targets the DDR buffer).

    The new sg_list items must fit the SRAM period length. Here is an example
    for DMA_DEV_TO_MEM:
    ::

      /*
       * Assuming sgl and nents, respectively the initial scatterlist and its
       * length.
       * Assuming sram_dma_buf and sram_period, respectively the memory
       * allocated from the pool for DMA usage, and the length of the period,
       * which is half of the sram_dma_buf size.
       */
      struct sg_table new_dma_sgt, new_mdma_sgt;
      struct scatterlist *s, *_sgl;
      dma_addr_t ddr_dma_buf;
      u32 new_nents = 0, len;
      int i, ret;

      /* Count the number of entries needed */
      for_each_sg(sgl, s, nents, i)
              if (sg_dma_len(s) > sram_period)
                      new_nents += DIV_ROUND_UP(sg_dma_len(s), sram_period);
              else
                      new_nents++;

      /* Create sg table for STM32 DMA channel */
      ret = sg_alloc_table(&new_dma_sgt, new_nents, GFP_ATOMIC);
      if (ret) {
              dev_err(dev, "DMA sg table alloc failed\n");
              return ret;
      }

      /* Walk the original list so that each new item gets the right length */
      _sgl = sgl;
      len = sg_dma_len(_sgl);
      for_each_sg(new_dma_sgt.sgl, s, new_dma_sgt.nents, i) {
              size_t bytes = min_t(size_t, len, sram_period);

              sg_dma_len(s) = bytes;
              /* Targets the beginning = first half of the sram_dma_buf */
              sg_dma_address(s) = sram_dma_buf;
              /*
               * Targets the second half of the sram_dma_buf
               * for odd indexes of the item of the sg_list
               */
              if (i & 1)
                      sg_dma_address(s) += sram_period;

              len -= bytes;
              if (!len && sg_next(_sgl)) {
                      _sgl = sg_next(_sgl);
                      len = sg_dma_len(_sgl);
              }
      }

      /* Create sg table for STM32 MDMA channel */
      ret = sg_alloc_table(&new_mdma_sgt, new_nents, GFP_ATOMIC);
      if (ret) {
              dev_err(dev, "MDMA sg_table alloc failed\n");
              sg_free_table(&new_dma_sgt);
              return ret;
      }

      _sgl = sgl;
      len = sg_dma_len(sgl);
      ddr_dma_buf = sg_dma_address(sgl);
      for_each_sg(new_mdma_sgt.sgl, s, new_mdma_sgt.nents, i) {
              size_t bytes = min_t(size_t, len, sram_period);

              sg_dma_len(s) = bytes;
              sg_dma_address(s) = ddr_dma_buf;
              len -= bytes;

              if (!len && sg_next(_sgl)) {
                      _sgl = sg_next(_sgl);
                      len = sg_dma_len(_sgl);
                      ddr_dma_buf = sg_dma_address(_sgl);
              } else {
                      ddr_dma_buf += bytes;
              }
      }

    Don't forget to release these new sg_tables (with sg_free_table()) after
    getting the descriptors with dmaengine_prep_slave_sg().
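
    The entry count and the SRAM half used by each item can be checked outside
    the kernel with the following sketch. The helper names are hypothetical and
    DIV_ROUND_UP is redefined locally as a stand-in for the kernel macro.
    ::

      #include <stddef.h>

      /* Local stand-in for the kernel DIV_ROUND_UP() macro */
      #define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))

      /* Number of items needed once each entry is split to fit one period */
      static inline size_t chain_count_entries(const size_t *lens, size_t nents,
                                               size_t sram_period)
      {
              size_t i, new_nents = 0;

              for (i = 0; i < nents; i++) {
                      if (lens[i] > sram_period)
                              new_nents += DIV_ROUND_UP(lens[i], sram_period);
                      else
                              new_nents++;
              }

              return new_nents;
      }

      /* SRAM address of item i: first half for even i, second half for odd i */
      static inline unsigned long chain_sram_addr(unsigned long sram_dma_buf,
                                                  size_t sram_period, size_t i)
      {
              return sram_dma_buf + ((i & 1) ? sram_period : 0);
      }

    For example, with a 2048 byte period, a scatterlist with entries of 10240,
    3072 and 2048 bytes needs 5 + 2 + 1 = 8 items in each new sg_table.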

  **1. Set controller specific parameters**

    First, use dmaengine_slave_config() with a struct dma_slave_config to
    configure STM32 DMA channel. You just have to take care of DMA addresses:
    the memory address (depending on the transfer direction) must point to your
    SRAM buffer. Also set (struct dma_slave_config).peripheral_size != 0.

    STM32 DMA driver will check (struct dma_slave_config).peripheral_size to
    determine if chaining is being used or not. If it is used, then STM32 DMA
    driver fills (struct dma_slave_config).peripheral_config with an array of
    three u32: the first one containing STM32 DMAMUX channel ID, the second one
    the channel interrupt flag clear register address, and the third one the
    channel Transfer Complete flag mask.

    Then, use dmaengine_slave_config() with another struct dma_slave_config to
    configure STM32 MDMA channel. Take care of DMA addresses: the device address
    (depending on the transfer direction) must point to your SRAM buffer, and
    the memory address must point to the buffer originally used for "classic"
    DMA operation. Use the previous (struct dma_slave_config).peripheral_size
    and .peripheral_config that have been updated by STM32 DMA driver, to set
    (struct dma_slave_config).peripheral_size and .peripheral_config of the
    struct dma_slave_config to configure STM32 MDMA channel.
    ::

      struct dma_slave_config dma_conf;
      struct dma_slave_config mdma_conf;

      memset(&dma_conf, 0, sizeof(dma_conf));
      [...]
      dma_conf.direction = DMA_DEV_TO_MEM;
      dma_conf.dst_addr = sram_dma_buf;        // SRAM buffer
      dma_conf.peripheral_size = 1;            // peripheral_size != 0 => chaining

      dmaengine_slave_config(dma_chan, &dma_conf);

      memset(&mdma_conf, 0, sizeof(mdma_conf));
      mdma_conf.direction = DMA_DEV_TO_MEM;
      mdma_conf.src_addr = sram_dma_buf;     // SRAM buffer
      mdma_conf.dst_addr = rx_dma_buf;       // original memory buffer
      mdma_conf.peripheral_size = dma_conf.peripheral_size;     // <- dma_conf
      mdma_conf.peripheral_config = dma_conf.peripheral_config; // <- dma_conf

      dmaengine_slave_config(mdma_chan, &mdma_conf);

  **2. Get a descriptor for STM32 DMA channel transaction**

    Get your descriptor in the same way as for your "classic" DMA operation:
    you just have to replace the original sg_list (in case of
    dmaengine_prep_slave_sg()) with the new sg_list using SRAM buffer, or to
    replace the original buffer address, length and period (in case of
    dmaengine_prep_dma_cyclic()) with the new SRAM buffer.

  **3. Get a descriptor for STM32 MDMA channel transaction**

    If you previously got the descriptor (for STM32 DMA) with

    * dmaengine_prep_slave_sg(), then use dmaengine_prep_slave_sg() for
      STM32 MDMA;
    * dmaengine_prep_dma_cyclic(), then use dmaengine_prep_dma_cyclic() for
      STM32 MDMA.

    Use the new sg_list using SRAM buffer (in case of dmaengine_prep_slave_sg())
    or, depending on the transfer direction, either the original DDR buffer (in
    case of DMA_DEV_TO_MEM) or the SRAM buffer (in case of DMA_MEM_TO_DEV), the
    source address being previously set with dmaengine_slave_config().

  **4. Submit both transactions**

    Before submitting your transactions, you may need to define on which
    descriptor you want a callback to be called at the end of the transfer
    (dmaengine_prep_slave_sg()) or the period (dmaengine_prep_dma_cyclic()).
    Depending on the direction, set the callback on the descriptor that finishes
    the overall transfer:

    * DMA_DEV_TO_MEM: set the callback on the "MDMA" descriptor
    * DMA_MEM_TO_DEV: set the callback on the "DMA" descriptor

    Then, submit the descriptors in any order, with dmaengine_submit().

  **5. Issue pending requests (and wait for callback notification)**

    As STM32 MDMA channel transfer is triggered by STM32 DMA, you must issue
    the pending requests (with dma_async_issue_pending()) on the STM32 MDMA
    channel before doing so on the STM32 DMA channel.

    Your callback, if any, will be called to notify you of the end of the
    overall transfer or of the period completion.

    Don't forget to terminate both channels. STM32 DMA channel is configured in
    cyclic Double-Buffer mode so it won't be disabled by HW: you need to
    terminate it. STM32 MDMA channel will be stopped by HW in case of sg
    transfer, but not in case of cyclic transfer. You can terminate it whatever
    the kind of transfer.

  **STM32 DMA-MDMA chaining DMA_MEM_TO_DEV special case**

  STM32 DMA-MDMA chaining in DMA_MEM_TO_DEV is a special case. Indeed, the
  STM32 MDMA feeds the SRAM buffer with the DDR data, and the STM32 DMA reads
  data from the SRAM buffer. So some data (the first period) must already have
  been copied to the SRAM buffer when the STM32 DMA starts to read.

  A trick could be pausing the STM32 DMA channel (that will raise a Transfer
  Complete signal, triggering the STM32 MDMA channel), but the first data read
  by the STM32 DMA could be "wrong". The proper way is to prepare the first SRAM
  period with dmaengine_prep_dma_memcpy(). Then this first period should be
  "removed" from the sg or the cyclic transfer.

  Due to this complexity, prefer using the STM32 DMA-MDMA chaining for
  DMA_DEV_TO_MEM and keep the "classic" DMA usage for DMA_MEM_TO_DEV, unless
  you're not afraid.

Resources
---------

  Application notes, datasheets and reference manuals are available on the ST
  website (STM32MP1_).

  Three application notes (AN5224_, AN4031_ & AN5001_) deal specifically with
  STM32 DMAMUX, STM32 DMA and STM32 MDMA.

.. _STM32MP1: https://www.st.com/en/microcontrollers-microprocessors/stm32mp1-series.html
.. _AN5224: https://www.st.com/resource/en/application_note/an5224-stm32-dmamux-the-dma-request-router-stmicroelectronics.pdf
.. _AN4031: https://www.st.com/resource/en/application_note/dm00046011-using-the-stm32f2-stm32f4-and-stm32f7-series-dma-controller-stmicroelectronics.pdf
.. _AN5001: https://www.st.com/resource/en/application_note/an5001-stm32cube-expansion-package-for-stm32h7-series-mdma-stmicroelectronics.pdf

:Authors:

- Amelie Delaunay <amelie.delaunay@foss.st.com>
