==========================================
Xillybus driver for generic FPGA interface
==========================================

:Author: Eli Billauer, Xillybus Ltd. (http://x
:Email: eli.billauer@gmail.com or as advertis

.. Contents:

 - Introduction
  -- Background
  -- Xillybus Overview

 - Usage
  -- User interface
  -- Synchronization
  -- Seekable pipes

 - Internals
  -- Source code organization
  -- Pipe attributes
  -- Host never reads from the FPGA
  -- Channels, pipes, and the message channel
  -- Data streaming
  -- Data granularity
  -- Probing
  -- Buffer allocation
  -- The "nonempty" message (supporting poll)


Introduction
============

Background
----------

An FPGA (Field Programmable Gate Array) is a p
can be programmed to become virtually anything
dedicated chipset: For instance, a display ada
or even a processor with its peripherals. FPGA
Based upon certain building blocks, you make y
them. It's usually pointless to reimplement so
available on the market as a chipset, so FPGAs
special functionality is needed, and the produ
(hence not justifying the development of an AS

The challenge with FPGAs is that everything is
level, even lower than assembly language. In o
focus on their specific project, and not reinv
again, pre-designed building blocks, IP cores,
FPGA parallels of library functions. IP cores
mathematical functions, a functional unit (e.g
processor (e.g. ARM) or anything that might co
building block, with electrical wires dangling
other blocks.

One of the daunting tasks in FPGA design is co
operating system (actually, with the processor
low-level bus protocol and the somewhat higher
(registers, interrupts, DMA etc.) is a project
function is a well-known one (e.g.
a video ada
make sense to design the FPGA's interface logi
A special driver is then written to present th
to the kernel and/or user space. In that case,
FPGA differently than any device on the bus.

It's however common that the desired data comm
known peripheral function. Also, the effort of
abstraction for the data exchange is often con
a quicker and possibly less elegant solution i
effectively written as a user space program, l
with just elementary data transport. This stil
interface logic for the FPGA, and write a simp

Xillybus Overview
-----------------

Xillybus is an IP core and a Linux driver. Tog
elementary data transport between an FPGA and
data streams with a straightforward user inter
effort solution for mixed FPGA-host projects,
have the project-specific part of the driver r

Since the communication requirements may vary
project to another (the number of data pipes n
their attributes), there isn't one specific ch
IP core. Rather, the IP core is configured and
specification given by its end user.

Xillybus presents independent data streams, wh
communication to the user. At the host side, a
just like any pipe file. On the FPGA side, har
the data.
This is contrary to a common method
sized buffers (even though such buffers are us
There may be more than a hundred of these stre
also no more than one, depending on the config

In order to ease the deployment of the Xillybu
data structure which completely defines the co
driver fetches this data structure during its
up the DMA buffers and character devices accor
driver is used to work out of the box with any

The data structure just mentioned should not b
configuration space or the Flattened Device Tr

Usage
=====

User interface
--------------

On the host, all interface with Xillybus is do
device files, which are generated automaticall
names of these files depend on the IP core tha
Probing below). To communicate with the FPGA,
corresponds to the hardware FIFO you want to s
and use plain write() or read() calls, just li
particular, it makes perfect sense to go::

  $ cat mydata > /dev/xillybus_thisfifo

  $ cat /dev/xillybus_thatfifo > hisdata

possibly pressing CTRL-C at some stage, even t
the capability to send an EOF (but may not use

The driver and hardware are designed to behave

* Supporting non-blocking I/O (by setting O_NO

* Supporting poll() and select().

* Being bandwidth efficient under load (using
  pieces of data sent across (like TCP/IP) by

A device file can be read only, write only or
device files are treated like two independent
"channel" structure in the implementation code

Synchronization
---------------

Xillybus pipes are configured (on the IP core)
asynchronous.
For a synchronous pipe, write()
some data has been submitted and acknowledged
bulk data transfers, and is nearly impossible
require data at a constant rate: There is no d
between write() calls, in particular when the

When a pipe is configured asynchronous, write(
room in the buffers to store any of the data i

For FPGA to host pipes, asynchronous pipes all
as soon as the respective device file is opene
has been requested by a read() call. On synchr
of data requested by a read() call is transmit

In summary, for synchronous pipes, data betwee
transmitted only to satisfy the read() or writ
by the driver, and those calls wait for the tr
returning.

Note that the synchronization attribute has no
that read() or write() completes fewer bytes th
separate configuration flag ("allowpartial") t
partial completion is allowed.

Seekable pipes
--------------

A synchronous pipe can be configured to have t
to the user logic at the FPGA. Such a pipe is
With this feature, a memory or register interf
FPGA side to the seekable stream. Reading or w
the attached memory is done by seeking to the
read() or write() as required.


Internals
=========

Source code organization
------------------------

The Xillybus driver consists of a core module,
that depend on the specific bus interface (xil

The bus specific modules are those probed when
the kernel. Since the DMA mapping and synchron
dependent by their nature, are used by the cor
xilly_endpoint_hardware structure is passed to
initialization. This structure is populated wi
which execute the DMA-related operations on th

Pipe attributes
---------------

Each pipe has a number of attributes which are
(IP core) is built.
They are fetched from the
defines the core's configuration, see Probing
in xillybus_core.c as follows:

* is_writebuf: The pipe's direction. A non-zer
  host pipe (the FPGA "writes").

* channelnum: The pipe's identification number
  host and FPGA.

* format: The underlying data width. See Data

* allowpartial: A non-zero value means that a
  applies) may return with less than the reque
  choice is a non-zero value, to match standar

* synchronous: A non-zero value means that the
  Synchronization above.

* bufsize: Each DMA buffer's size. Always a po

* bufnum: The number of buffers allocated for

* exclusive_open: A non-zero value forces excl
  device file. If the device file is bidirecti
  one direction, the opposite direction may be

* seekable: A non-zero value indicates that th
  Seekable pipes above.

* supports_nonempty: A non-zero value (which i
  hardware will send the messages that are nec
  poll() for this pipe.

Host never reads from the FPGA
------------------------------

Even though PCI Express is hotpluggable in gen
doesn't expect a card to go away all of the su
is based upon reprogrammable logic, a sudden d
quite likely as a result of an accidental repr
host is up. In practice, nothing happens immed
if the host attempts to read from an address t
device, that leads to an immediate freeze of t
even though the PCIe standard requires a grace

In order to avoid these freezes, the Xillybus
reading from the device's register space. All
the host is done through DMA. In particular, t
doesn't follow the common practice of checking
invoked.
Rather, the FPGA prepares a small buf
messages, which inform the host what the inter

This mechanism is used on non-PCIe buses as we


Channels, pipes, and the message channel
----------------------------------------

Each of the (possibly bidirectional) pipes pre
a data channel between the FPGA and the host.
and pipes is necessary only because of channel
related messages from the FPGA, and has no pip

Data streaming
--------------

Even though a non-segmented data stream is pre
sides, the implementation relies on a set of D
for each channel. For the sake of illustration
direction: As data streams into the respective
FPGA, the Xillybus IP core writes it to one of
buffer is full, the FPGA informs the host abou
XILLYMSG_OPCODE_RELEASEBUF message channel 0 a
necessary). The host responds by making the da
the character device. When all data has been r
FPGA's buffer control register, allowing the b
control mechanisms exist on both sides to prev

This is not good enough for creating a TCP/IP-
stops momentarily before a DMA buffer is fille
that the partial data in buffer will arrive an
being completed. This is implemented by adding
XILLYMSG_OPCODE_RELEASEBUF message, through wh
which buffer is submitted, but how much data i

But the FPGA will submit a partially filled bu
by the host. This situation occurs when the re
for XILLY_RX_TIMEOUT jiffies (currently 10 ms)
the FPGA to submit a DMA buffer as soon as it
balances between bus bandwidth efficiency (pre
filled buffers being sent) and a latency held

A similar setting is used in the host to FPGA
partial DMA buffers is somewhat different, tho
driver to submit all data it has in the buffer
write() with the byte count set to zero. This
but it doesn't block.
There is also an autoflu
an equivalent flush roughly XILLY_RX_TIMEOUT j
This allows the user to be oblivious about the
and yet enjoy a stream-like interface.

Note that the issue of partial buffer flushing
the "synchronous" attribute nonzero, since syn
to lie around in the DMA buffers between read(

Data granularity
----------------

The data arrives or is sent at the FPGA as 8,
configured by the "format" attribute. Whenever
to hide this when the pipe is accessed differe
For example, reading single bytes from a pipe
with no issues. Writing single bytes to pipes
will also work, but the driver can't send part
FPGA, so the transmission of up to one word ma
occupied with user data.

This somewhat complicates the handling of host
when a buffer is flushed, it may contain up to
the FPGA, and hence can't be sent. To prevent
bytes need to be moved to the next buffer. The
that mention "leftovers" in some way are relat

Probing
-------

As mentioned earlier, the number of pipes that
loads and their attributes depend on the Xilly
the driver's initialization, a blob containing
Interface Description Table (IDT), is sent fro
bootstrap process is done in three phases:

1. Acquire the length of the IDT, so a buffer
   is done by sending a quiesce command to the
   for this command contains the IDT's buffer

2. Acquire the IDT itself.

3. Create the interfaces according to the IDT.

Buffer allocation
-----------------

In order to simplify the logic that prevents i
PCIe packets, the following rule applies: If a
it must not cross a 4kB boundary.
Otherwise, i
xilly_setupchannels() function allocates thes
pages from the kernel, and dividing them into DM
all buffers' sizes are powers of two, it's pos
buffers, with a maximal waste of one page of m

All buffers are allocated when the driver is l
since large continuous physical memory segment
which are more likely to be available when the

The allocation of buffer memory takes place in
the IDT. The driver relies on a rule that the
buffer size in the IDT. If a requested buffer
the necessary number of pages is requested fro
used for this buffer. If the requested buffer
single page is requested from the kernel, and
Or, if there already is a partially used page
into that page. It can be shown that all pages
(except possibly for the last) are 100% utiliz

The "nonempty" message (supporting poll)
----------------------------------------

In order to support the "poll" method (and hen
catch regarding the FPGA to host direction: Th
buffer with some data, but not submitted that
the buffer's submission by the FPGA, there wou
FPGA side has sent data, but a select() call w
host has not received any notification about t
XILLYMSG_OPCODE_NONEMPTY messages sent by the
completely empty to containing some data.

These messages are used only to support poll()
be configured not to send them for a slight re
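
As a minimal sketch of how a user space program might rely on the poll()
support discussed in this section, the following Python snippet waits for a
file descriptor to become readable before reading from it. The helper names
wait_readable() and read_when_ready() are hypothetical and used for
illustration only, and a path such as /dev/xillybus_thatfifo is a placeholder:
the actual device file names depend on the IP core's configuration. Nothing
here is specific to Xillybus; it is the standard poll() call as exposed by
Python's select module.

```python
import os
import select


def wait_readable(fd, timeout_ms):
    """Return True if fd has data to read within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)  # list of (fd, event mask) pairs
    return any(event & select.POLLIN for _, event in events)


def read_when_ready(path, max_bytes=4096, timeout_ms=1000):
    """Open path non-blocking, wait for data, and return up to max_bytes.

    Returns None if nothing becomes readable before the timeout.
    The path is a placeholder (hypothetical), for example a device file
    such as /dev/xillybus_thatfifo on a suitably configured system.
    """
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        if not wait_readable(fd, timeout_ms):
            return None
        return os.read(fd, max_bytes)
    finally:
        os.close(fd)
```

For a FPGA to host pipe, this pattern presumably only behaves as intended
when the pipe is configured to send the messages that poll() depends on
(the supports_nonempty attribute).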