===========
NTB Drivers
===========

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as non-common features like
scratchpad and message registers. Scratchpad registers are read-and-writable
registers that are accessible from either side of the device, so that peers can
exchange a small amount of information at a fixed address. Message registers can
be utilized for the same purpose. Additionally, they are provided with
special status bits to make sure the information isn't overwritten by another
peer. Doorbell registers provide a way for peers to send interrupt events.
Memory windows allow translated read and write access to the peer memory.

NTB Core Driver (ntb)
=====================

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers. The term "client" is used here to mean an upper layer
component making use of the NTB API. The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

NTB Client Drivers
==================

NTB client drivers should register with the NTB core driver. After
registering, the client probe and remove functions will be called appropriately
as NTB hardware, or hardware drivers, are inserted and removed. The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.

NTB Typical client driver implementation
----------------------------------------

The primary purpose of NTB is to share some piece of memory between at least two
systems.
Thus, NTB device features such as scratchpad and message registers are
mainly used to perform proper memory window initialization. Typically
there are two types of memory window interfaces supported by the NTB API:
inbound translation configured on the local NTB port and outbound translation
configured by the peer, on the peer NTB port. The first type is
depicted in the following figure::

 Inbound translation:

 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

So a typical scenario of the first type of memory window initialization looks
like this: 1) allocate a memory region, 2) put the translated address into the
NTB config, 3) somehow notify the peer device of the performed initialization,
4) the peer device maps the corresponding outbound memory window to access the
shared memory region.

The second type of interface, which implies the shared windows being
initialized by a peer device, is depicted in the following figure::

 Outbound translation:

 Memory:        Local NTB Port:   Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical scenario of the second type of interface initialization would be:
1) allocate a memory region, 2) somehow deliver the translated address to the
peer device, 3) the peer puts the translated address into the NTB config,
4) the peer device maps the outbound memory window to access the shared memory
region.

As one can see, the described scenarios can be combined in one portable
algorithm.

 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the allocated
     region (it may fail if local memory window initialization is unsupported)
  3) Send the translated address and memory window index to the peer device

 Peer device:
  1) Initialize the memory window with the retrieved address of the memory
     region allocated by the other device (it may fail if peer memory window
     initialization is unsupported)
  2) Map the outbound memory window

In accordance with this scenario, the NTB Memory Window API can be used as
follows:

 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges which can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve the parameters restricting the
     shared memory region alignment and size. Then memory can be properly
     allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (it may fail if setting the local translated address is not supported)
  5) Send the translated base address (usually together with the memory window
     number) to the peer device using, for instance, scratchpad or message
     registers.

 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address,
     received from the other device (related to pidx), for the specified
     memory window. It may fail if the retrieved address, for instance,
     exceeds the maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address to map the memory
     window so as to access the shared memory.

It is also worth noting that ntb_mw_count(pidx) should return the same value
as ntb_peer_mw_count() on the peer with port index pidx.

NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
------------------------------------------------------------------

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev. These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data. The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data. The NTB Netdev then creates an ethernet device using a
Transport queue pair. Network data is copied between socket buffers and the
Transport queue pair buffer. The Transport client may be used for other things
besides Netdev; however, no other applications have yet been written.

NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as a simple example of an NTB
client. Ping Pong enables the link when started, waits for the NTB link to come
up, and then proceeds to read and write the doorbell and scratchpad registers
of the NTB. The peers interrupt each other using a bit mask of doorbell bits,
which is shifted by one in each round, to test the behavior of multiple
doorbell bits and interrupt vectors. The Ping Pong driver also reads the first
local scratchpad, and writes the value plus one to the first peer scratchpad,
each round before writing the peer doorbell register.

Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
  registers. By default, Ping Pong will not attempt to exercise such
  hardware.
  You may override this behavior at your own risk by setting
  unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
  interrupt event and setting the peer doorbell register for the next
  round.
* init\_db - Specify the doorbell bits to start a new series of rounds. A new
  series begins once all the doorbell bits have been shifted out of
  range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
  then to observe the debugging output on the console.

NTB Tool Test Client (ntb\_tool)
--------------------------------

The Tool test client serves primarily for debugging NTB hardware and drivers.
The Tool provides access through debugfs for reading, setting, and clearing the
NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/
    A directory in debugfs will be created for each
    NTB device probed by the tool. This directory is shortened to *hw*
    below.
* *hw*/db
    This file is used to read, set, and clear the local doorbell. Not
    all operations may be supported by all hardware. To read the doorbell,
    read the file. To set the doorbell, write `s` followed by the bits to
    set (e.g. `echo 's 0x0101' > db`). To clear the doorbell, write `c`
    followed by the bits to clear.
* *hw*/mask
    This file is used to read, set, and clear the local doorbell mask.
    See *db* for details.
* *hw*/peer\_db
    This file is used to read, set, and clear the peer doorbell.
    See *db* for details.
* *hw*/peer\_mask
    This file is used to read, set, and clear the peer doorbell
    mask. See *db* for details.
* *hw*/spad
    This file is used to read and write local scratchpads. To read
    the values of all scratchpads, read the file.
    To write values, write a series of pairs of scratchpad number and
    value (e.g. `echo '4 0x123 7 0xabc' > spad` sets scratchpads `4` and
    `7` to `0x123` and `0xabc`, respectively).
* *hw*/peer\_spad
    This file is used to read and write peer scratchpads. See
    *spad* for details.

NTB MSI Test Client (ntb\_msi\_test)
------------------------------------

The MSI test client serves to test and debug the MSI library, which
allows for passing MSI interrupts across NTB memory windows. The
test client is interacted with through the debugfs filesystem:

* *debugfs*/ntb\_msi\_test/*hw*/
    A directory in debugfs will be created for each
    NTB device probed by the MSI test. This directory is shortened to *hw*
    below.
* *hw*/port
    This file describes the local port number.
* *hw*/irq*_occurrences
    One occurrences file exists for each interrupt and, when read,
    returns the number of times the interrupt has been triggered.
* *hw*/peer*/port
    This file describes the port number for each peer.
* *hw*/peer*/count
    This file describes the number of interrupts that can be
    triggered on each peer.
* *hw*/peer*/trigger
    Writing an interrupt number (any number less than the value
    specified in count) will trigger the interrupt on the
    specified peer. The corresponding occurrences file on that
    peer should then be incremented.

NTB Hardware Drivers
====================

NTB hardware drivers should register devices with the NTB core driver. After
registering, the clients' probe and remove functions will be called.

NTB Intel Hardware Driver (ntb\_hw\_intel)
------------------------------------------

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx
    If the peer NTB is to be accessed via a memory window, then use
    this memory window to access the peer NTB.
    A zero or positive value counts from the first memory window index,
    and a negative value counts back from the last memory window index.
    Both sides MUST set the same value here! The default value is
    `-1`.
* b2b\_mw\_share
    If the peer NTB is to be accessed via a memory window, and if
    the memory window is large enough, still allow the client to use the
    second half of the memory window for address translation to the peer.
* xeon\_b2b\_usd\_bar2\_addr64
    If using B2B topology on Xeon hardware, use
    this 64 bit address on the bus between the NTB devices for the window
    at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.