.. SPDX-License-Identifier: GPL-2.0

============================
PCI Peer-to-Peer DMA Support
============================

The PCI bus has pretty decent support for performing DMA transfers
between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.

One of the biggest issues is that PCI doesn't require forwarding
transactions between hierarchy domains, and in PCIe, each Root Port
defines a separate hierarchy domain. To make things worse, there is no
simple way to determine if a given Root Complex supports this or not.
(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
only supports doing P2P when the endpoints involved are all behind the
same PCI bridge, as such devices are all in the same PCI hierarchy
domain, and the spec guarantees that all transactions within the
hierarchy will be routable, but it does not require routing
between hierarchies.

The second issue is that to make use of existing interfaces in Linux,
memory that is used for P2P transactions needs to be backed by struct
pages. However, PCI BARs are not typically cache coherent so there are
a few corner case gotchas with these pages so developers need to
be careful about what they do with them.


Driver Writer's Guide
=====================

In a given P2P implementation there may be three or more different
types of kernel drivers in play:

* Provider - A driver which provides or publishes P2P resources like
  memory or doorbell registers to other drivers.
* Client - A driver which makes use of a resource by setting up a
  DMA transaction to or from it.
* Orchestrator - A driver which orchestrates the flow of data between
  clients and providers.

In many cases there could be overlap between these three types (i.e.,
it may be typical for a driver to be both a provider and a client).

For example, in the NVMe Target Copy Offload implementation:

* The NVMe PCI driver is a client, a provider and an orchestrator all at
  once: it exposes any CMB (Controller Memory Buffer) as a P2P memory
  resource (provider), it accepts P2P memory pages as buffers in requests
  to be used directly (client) and it can also make use of the CMB as
  submission queue entries (orchestrator).
* The RDMA driver is a client in this arrangement so that an RNIC
  can DMA directly to the memory exposed by the NVMe device.
* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
  to the P2P memory (CMB) and then to the NVMe device (and vice versa).

This is currently the only arrangement supported by the kernel but
one could imagine slight tweaks to this that would allow for the same
functionality. For example, if a specific RNIC added a BAR with some
memory behind it, its driver could add support as a P2P provider and
then the NVMe Target could use the RNIC's memory instead of the CMB
in cases where the NVMe cards in use do not have CMB support.


Provider Drivers
----------------

A provider simply needs to register a BAR (or a portion of a BAR)
as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
This will register struct pages for all the specified memory.

After that it may optionally publish all of its resources as
P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
any orchestrator drivers to find and use the memory. When marked in
this way, the resource must be regular memory with no side effects.

For the time being this is fairly rudimentary in that all resources
are typically going to be P2P memory. Future work will likely expand
this to include other types of resources like doorbells.


Client Drivers
--------------

A client driver typically only has to conditionally change its DMA map
routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
way does not need to be unmapped.

The client may also, optionally, make use of
:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
functions and when to use the regular mapping functions.
In some situations, it may be more appropriate to use a flag to indicate
that a given request is P2P memory and map appropriately. It is important
to ensure that struct pages that back P2P memory stay out of code that
does not have support for them, as other code may treat the pages as
regular memory, which may not be appropriate.


Orchestrator Drivers
--------------------

The first task an orchestrator driver must do is compile a list of
all client devices that will be involved in a given transaction. For
example, the NVMe Target driver creates a list including the namespace
block device and the RNIC in use. If the orchestrator has access to
a specific P2P provider to use, it may check compatibility using
:c:func:`pci_p2pdma_distance()`; otherwise, it may find a memory provider
that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
If more than one provider is supported, the one nearest to all the clients will
be chosen first. If more than one provider is an equal distance away, the
one returned will be chosen at random (the choice is not arbitrary but
truly random).
:c:func:`pci_p2pmem_find()` returns the PCI device to use for the provider
with a reference taken, so when the device is no longer needed the
reference should be dropped with pci_dev_put().

Once a provider is selected, the orchestrator can then use
:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
allocate and free P2P memory from the provider.
:c:func:`pci_p2pmem_alloc_sgl()` and :c:func:`pci_p2pmem_free_sgl()` are
convenience functions for allocating and freeing scatter-gather lists
with P2P memory.

Struct Page Caveats
-------------------

Driver writers should be very careful about not passing these special
struct pages to code that isn't prepared for them. At this time, the kernel
interfaces do not have any checks for ensuring this. This obviously
precludes passing these pages to userspace.

P2P memory is also technically IO memory but should never have any side
effects behind it. Thus, the order of loads and stores should not be important
and ioreadX(), iowriteX() and friends should not be necessary.
However, as the memory is not cache coherent, if access ever needs to
be protected by a spinlock then :c:func:`mmiowb()` must be used before
unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
Documentation/memory-barriers.txt)


P2P DMA Support Library
=======================

.. kernel-doc:: drivers/pci/p2pdma.c
   :export: