.. Copyright 2001 Matthew Wilcox
..
..     This documentation is free software; you can redistribute
..     it and/or modify it under the terms of the GNU General Public
..     License as published by the Free Software Foundation; either
..     version 2 of the License, or (at your option) any later
..     version.

===============================
Bus-Independent Device Accesses
===============================

:Author: Matthew Wilcox
:Author: Alan Cox

Introduction
============

Linux provides an API which abstracts performing IO across all busses
and devices, allowing device drivers to be written independently of bus
type.

Memory Mapped IO
================

Getting Access to the Device
----------------------------

The most widely supported form of IO is memory mapped IO. That is, a
part of the CPU's address space is interpreted not as accesses to
memory, but as accesses to a device. Some architectures define devices
to be at a fixed address, but most have some method of discovering
devices.
The PCI bus walk is a good example of such a scheme. This
document does not cover how to receive such an address, but assumes you
are starting with one. Physical addresses are of type unsigned long.

This address should not be used directly. Instead, to get an address
suitable for passing to the accessor functions described below, you
should call ioremap(). An address suitable for accessing
the device will be returned to you.

After you've finished using the device (say, in your module's exit
routine), call iounmap() in order to return the address
space to the kernel. Most architectures allocate new address space each
time you call ioremap(), and they can run out unless you
call iounmap().

Accessing the device
--------------------

The part of the interface most used by drivers is reading and writing
memory-mapped registers on the device. Linux provides interfaces to read
and write 8-bit, 16-bit, 32-bit and 64-bit quantities. Due to a
historical accident, these are named byte, word, long and quad accesses.
Both read and write accesses are supported; there is no prefetch support
at this time.
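
The following userspace model sketches what the byte, word, long and quad
accessors do. It is not kernel code: a plain byte array stands in for the
window returned by ioremap(), a byte offset stands in for the ``__iomem``
address, and the ``model_*`` helpers are hypothetical. The little-endian
byte order mirrors the fixed endianness of the generic readl()/writel()
family, and the alignment assertion mirrors the requirement on many
platforms that I/O accesses be aligned to the access size::

	#include <assert.h>
	#include <stddef.h>
	#include <stdint.h>

	/*
	 * Userspace model only. This byte array stands in for the register
	 * window a driver would get from ioremap(), and a byte offset into it
	 * stands in for an __iomem address. The model_* helpers are
	 * hypothetical, not kernel APIs: they store values little-endian
	 * whatever the host byte order is, and reject misaligned accesses.
	 */
	static volatile uint8_t model_window[16];

	static uint64_t model_read(size_t off, size_t size)
	{
		uint64_t val = 0;
		size_t i;

		/* accesses must be naturally aligned and inside the window */
		assert(off % size == 0 && off + size <= sizeof(model_window));
		for (i = 0; i < size; i++)
			val |= (uint64_t)model_window[off + i] << (8 * i);
		return val;
	}

	static void model_write(uint64_t val, size_t off, size_t size)
	{
		size_t i;

		assert(off % size == 0 && off + size <= sizeof(model_window));
		for (i = 0; i < size; i++)
			model_window[off + i] = (uint8_t)(val >> (8 * i));
	}

	/* byte, word, long and quad accessors, following the kernel's naming */
	static uint8_t  model_readb(size_t off) { return (uint8_t)model_read(off, 1); }
	static uint16_t model_readw(size_t off) { return (uint16_t)model_read(off, 2); }
	static uint32_t model_readl(size_t off) { return (uint32_t)model_read(off, 4); }
	static uint64_t model_readq(size_t off) { return model_read(off, 8); }

	static void model_writeb(uint8_t v, size_t off)  { model_write(v, off, 1); }
	static void model_writew(uint16_t v, size_t off) { model_write(v, off, 2); }
	static void model_writel(uint32_t v, size_t off) { model_write(v, off, 4); }
	static void model_writeq(uint64_t v, size_t off) { model_write(v, off, 8); }

As with writel(), the value comes before the address in the write helpers.
On an ioremap() mapping each access would be one discrete bus transaction;
the byte loop in the model only makes the byte order explicit.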
The functions are named readb(), readw(), readl(), readq(),
readb_relaxed(), readw_relaxed(), readl_relaxed(), readq_relaxed(),
writeb(), writew(), writel() and writeq().

Some devices (such as framebuffers) would like to use larger transfers than
8 bytes at a time. For these devices, the memcpy_toio(),
memcpy_fromio() and memset_io() functions are
provided. Do not use memset or memcpy on IO addresses; they are not
guaranteed to copy data in order.

The read and write functions are defined to be ordered. That is, the
compiler is not permitted to reorder the I/O sequence. When the ordering
can be compiler optimised, you can use __readb() and friends to
indicate the relaxed ordering. Use this with care.

While the basic functions are defined to be synchronous with respect to
each other and ordered with respect to each other, the busses the devices
sit on may themselves have asynchronicity. In particular, many authors
are burned by the fact that PCI bus writes are posted asynchronously.
A driver author must issue a read from the same device to ensure that
writes have occurred in the specific cases the author cares about. This kind
of property cannot be hidden from driver writers in the API. In some
cases, the read used to flush the device may be expected to fail (if the
card is resetting, for example). In that case, the read should be done
from config space, which is guaranteed to soft-fail if the card doesn't
respond.

The following is an example of flushing a write to a device when the
driver would like to ensure the write's effects are visible prior to
continuing execution::

	static inline void
	qla1280_disable_intrs(struct scsi_qla_host *ha)
	{
		struct device_reg *reg;

		reg = ha->iobase;
		/* disable risc and host interrupts */
		WRT_REG_WORD(&reg->ictrl, 0);
		/*
		 * The following read will ensure that the above write
		 * has been received by the device before we return from this
		 * function.
		 */
		RD_REG_WORD(&reg->ictrl);
		ha->flags.ints_enabled = 0;
	}

PCI ordering rules also guarantee that PIO read responses arrive after any
outstanding DMA writes from that bus, since for some devices the result of
a readb() call may signal to the driver that a DMA transaction is
complete. In many cases, however, the driver may want to indicate that the
next readb() call has no relation to any previous DMA writes
performed by the device. The driver can use readb_relaxed() for
these cases, although only some platforms will honor the relaxed
semantics. Using the relaxed read functions can provide significant
performance benefits on platforms that support them. The qla2xxx driver
provides examples of how to use readX_relaxed(). In many cases, a majority
of the driver's readX() calls can safely be converted to readX_relaxed()
calls, since only a few will indicate or depend on DMA completion.

Port Space Accesses
===================

Port Space Explained
--------------------

Another form of IO commonly supported is Port Space.
This is a range of
addresses separate from the normal memory address space. Access to these
addresses is generally not as fast as accesses to the memory mapped
addresses, and it also has a potentially smaller address space.

Unlike memory mapped IO, no preparation is required to access port
space.

Accessing Port Space
--------------------

Accesses to this space are provided through a set of functions which
allow 8-bit, 16-bit and 32-bit accesses, also known as byte, word and
long. These functions are inb(), inw(),
inl(), outb(), outw() and
outl().

Some variants are provided for these functions. Some devices require
that accesses to their ports are slowed down. This functionality is
provided by appending a ``_p`` to the end of the function.
There are also equivalents to memcpy. The ins() and
outs() functions copy bytes, words or longs to the given
port.

__iomem pointer tokens
======================

The data type for an MMIO address is an ``__iomem`` qualified pointer, such as
``void __iomem *reg``.
On most architectures it is a regular pointer that
points to a virtual memory address and can be offset or dereferenced, but in
portable code, it must only be passed from and to functions that explicitly
operate on an ``__iomem`` token, in particular the ioremap() and
readl()/writel() functions. The 'sparse' semantic code checker can be used to
verify that this is done correctly.

While on most architectures, ioremap() creates a page table entry for an
uncached virtual address pointing to the physical MMIO address, some
architectures require special instructions for MMIO, and the ``__iomem`` pointer
just encodes the physical address or an offsettable cookie that is interpreted
by readl()/writel().

Differences between I/O access functions
========================================

readq(), readl(), readw(), readb(), writeq(), writel(), writew(), writeb()

  These are the most generic accessors, providing serialization against other
  MMIO accesses and DMA accesses as well as fixed endianness for accessing
  little-endian PCI devices and on-chip peripherals. Portable device drivers
  should generally use these for any access to ``__iomem`` pointers.

  Note that posted writes are not strictly ordered against a spinlock, see
  Documentation/driver-api/io_ordering.rst.

readq_relaxed(), readl_relaxed(), readw_relaxed(), readb_relaxed(),
writeq_relaxed(), writel_relaxed(), writew_relaxed(), writeb_relaxed()

  On architectures that require an expensive barrier for serializing against
  DMA, these "relaxed" versions of the MMIO accessors only serialize against
  each other, but contain a less expensive barrier operation. A device driver
  might use these in a particularly performance sensitive fast path, with a
  comment that explains why the usage in a specific location is safe without
  the extra barriers.

  See memory-barriers.txt for a more detailed discussion on the precise ordering
  guarantees of the non-relaxed and relaxed versions.

ioread64(), ioread32(), ioread16(), ioread8(),
iowrite64(), iowrite32(), iowrite16(), iowrite8()

  These are an alternative to the normal readl()/writel() functions, with almost
  identical behavior, but they can also operate on ``__iomem`` tokens returned
  for mapping PCI I/O space with pci_iomap() or ioport_map(). On architectures
  that require special instructions for I/O port access, this adds a small
  overhead for an indirect function call implemented in lib/iomap.c, while on
  other architectures, these are simply aliases.

ioread64be(), ioread32be(), ioread16be()
iowrite64be(), iowrite32be(), iowrite16be()

  These behave in the same way as the ioread32()/iowrite32() family, but with
  reversed byte order, for accessing devices with big-endian MMIO registers.
  Device drivers that can operate on either big-endian or little-endian
  registers may have to implement a custom wrapper function that picks one or
  the other depending on which device was found.

  Note: On some architectures, the normal readl()/writel() functions
  traditionally assume that devices are the same endianness as the CPU, while
  using a hardware byte-reverse on the PCI bus when running a big-endian kernel.
  Drivers that use readl()/writel() this way are generally not portable, but
  tend to be limited to a particular SoC.

hi_lo_readq(), lo_hi_readq(), hi_lo_readq_relaxed(), lo_hi_readq_relaxed(),
ioread64_lo_hi(), ioread64_hi_lo(), ioread64be_lo_hi(), ioread64be_hi_lo(),
hi_lo_writeq(), lo_hi_writeq(), hi_lo_writeq_relaxed(), lo_hi_writeq_relaxed(),
iowrite64_lo_hi(), iowrite64_hi_lo(), iowrite64be_lo_hi(), iowrite64be_hi_lo()

  Some device drivers have 64-bit registers that cannot be accessed atomically
  on 32-bit architectures but allow two consecutive 32-bit accesses instead.
  Since it depends on the particular device which of the two halves has to be
  accessed first, a helper is provided for each combination of 64-bit accessors
  with either low/high or high/low word ordering. A device driver must include
  either <linux/io-64-nonatomic-lo-hi.h> or <linux/io-64-nonatomic-hi-lo.h> to
  get the function definitions along with helpers that redirect the normal
  readq()/writeq() to them on architectures that do not provide 64-bit access
  natively.
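
The ordering difference between the lo_hi and hi_lo helpers can be shown with
a small userspace model. It is a sketch under stated assumptions, not the
kernel's implementation: a two-element array stands in for a 64-bit register
exposed as two 32-bit halves, every 32-bit access is recorded so the order can
be checked, and all ``split_*`` names are hypothetical::

	#include <assert.h>
	#include <stdint.h>

	/*
	 * Userspace model only. A 64-bit device register is simulated as two
	 * 32-bit halves, and each 32-bit access is recorded in a trace so the
	 * access order can be checked. The split_* names are hypothetical
	 * stand-ins for the lo_hi/hi_lo helpers, each built from two 32-bit
	 * accesses.
	 */
	static uint32_t split_reg[2];	/* [0] = low half, [1] = high half */
	static int split_trace[8];	/* which half each access touched */
	static int split_ntrace;

	static void split_trace_reset(void)
	{
		split_ntrace = 0;
	}

	static void split_record(int half)
	{
		assert(split_ntrace < 8);
		split_trace[split_ntrace++] = half;
	}

	static uint32_t split_readl(int half)
	{
		split_record(half);
		return split_reg[half];
	}

	static void split_writel(uint32_t val, int half)
	{
		split_record(half);
		split_reg[half] = val;
	}

	/* low half first, then high half */
	static uint64_t split_lo_hi_readq(void)
	{
		uint32_t low = split_readl(0);
		uint32_t high = split_readl(1);

		return low + ((uint64_t)high << 32);
	}

	/* high half first, then low half */
	static uint64_t split_hi_lo_readq(void)
	{
		uint32_t high = split_readl(1);
		uint32_t low = split_readl(0);

		return low + ((uint64_t)high << 32);
	}

	static void split_lo_hi_writeq(uint64_t val)
	{
		split_writel((uint32_t)val, 0);
		split_writel((uint32_t)(val >> 32), 1);
	}

	static void split_hi_lo_writeq(uint64_t val)
	{
		split_writel((uint32_t)(val >> 32), 1);
		split_writel((uint32_t)val, 0);
	}

Because the two halves are accessed separately, the 64-bit value is not read
or written atomically. Which order is correct depends on the device, for
example on whether it acts on the complete value when one particular half is
accessed.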

__raw_readq(), __raw_readl(), __raw_readw(), __raw_readb(),
__raw_writeq(), __raw_writel(), __raw_writew(), __raw_writeb()

  These are low-level MMIO accessors without barriers or byteorder changes,
  and with architecture specific behavior. Accesses are usually atomic in the
  sense that a four-byte __raw_readl() does not get split into individual byte
  loads, but multiple consecutive accesses can be combined on the bus. In
  portable code, it is only safe to use these to access memory behind a device
  bus but not MMIO registers, as there are no ordering guarantees with regard
  to other MMIO accesses or even spinlocks. The byte order is generally the
  same as for normal memory, so unlike the other functions, these can be used
  to copy data between kernel memory and device memory.

inl(), inw(), inb(), outl(), outw(), outb()

  PCI I/O port resources traditionally require separate helpers as they are
  implemented using special instructions on the x86 architecture. On most other
  architectures, these are mapped to readl()/writel() style accessors
  internally, usually pointing to a fixed area in virtual memory. Instead of an
  ``__iomem`` pointer, the address is a 32-bit integer token to identify a port
  number. PCI requires I/O port access to be non-posted, meaning that an outb()
  must complete before the following code executes, while a normal writeb() may
  still be in progress. On architectures that correctly implement this, I/O port
  access is therefore ordered against spinlocks. Many non-x86 PCI host bridge
  implementations and CPU architectures however fail to implement non-posted I/O
  space on PCI, so they can end up being posted on such hardware.

  In some architectures, the I/O port number space has a 1:1 mapping to
  ``__iomem`` pointers, but this is not recommended and device drivers should
  not rely on that for portability. Similarly, an I/O port number as described
  in a PCI base address register may not correspond to the port number as seen
  by a device driver. Portable drivers need to read the port number for the
  resource provided by the kernel.

  There are no direct 64-bit I/O port accessors, but pci_iomap() in combination
  with ioread64/iowrite64 can be used instead.

inl_p(), inw_p(), inb_p(), outl_p(), outw_p(), outb_p()

  On ISA devices that require specific timing, the _p versions of the I/O
  accessors add a small delay.
  On architectures that do not have ISA buses,
  these are aliases to the normal inb/outb helpers.

readsq, readsl, readsw, readsb
writesq, writesl, writesw, writesb
ioread64_rep, ioread32_rep, ioread16_rep, ioread8_rep
iowrite64_rep, iowrite32_rep, iowrite16_rep, iowrite8_rep
insl, insw, insb, outsl, outsw, outsb

  These are helpers that access the same address multiple times, usually to copy
  data between kernel memory byte stream and a FIFO buffer. Unlike the normal
  MMIO accessors, these do not perform a byteswap on big-endian kernels, so the
  first byte in the FIFO register corresponds to the first byte in the memory
  buffer regardless of the architecture.

Device memory mapping modes
===========================

Some architectures support multiple modes for mapping device memory.
ioremap_*() variants provide a common abstraction around these
architecture-specific modes, with a shared set of semantics.

ioremap() is the most common mapping type, and is applicable to typical device
memory (e.g. I/O registers). Other modes can offer weaker or stronger
guarantees, if supported by the architecture. From most to least common, they
are as follows:

ioremap()
---------

The default mode, suitable for most memory-mapped devices, e.g. control
registers. Memory mapped using ioremap() has the following characteristics:

* Uncached - CPU-side caches are bypassed, and all reads and writes are handled
  directly by the device.
* No speculative operations - the CPU may not issue a read or write to this
  memory, unless the instruction that does so has been reached in committed
  program flow.
* No reordering - The CPU may not reorder accesses to this memory mapping with
  respect to each other. On some architectures, this relies on barriers in
  readl_relaxed()/writel_relaxed().
* No repetition - The CPU may not issue multiple reads or writes for a single
  program instruction.
* No write-combining - Each I/O operation results in one discrete read or write
  being issued to the device, and multiple writes are not combined into larger
  writes. This may or may not be enforced when using __raw I/O accessors or
  pointer dereferences.
* Non-executable - The CPU is not allowed to speculate instruction execution
  from this memory (it probably goes without saying, but you're also not
  allowed to jump into device memory).

On many platforms and buses (e.g.
PCI), writes issued through ioremap()
mappings are posted, which means that the CPU does not wait for the write to
actually reach the target device before retiring the write instruction.

On many platforms, I/O accesses must be aligned with respect to the access
size; failure to do so will result in an exception or unpredictable results.

ioremap_wc()
------------

Maps I/O memory as normal memory with write combining. Unlike ioremap():

* The CPU may speculatively issue reads from the device that the program
  didn't actually execute, and may choose to basically read whatever it wants.
* The CPU may reorder operations as long as the result is consistent from the
  program's point of view.
* The CPU may write to the same location multiple times, even when the program
  issued a single write.
* The CPU may combine several writes into a single larger write.

This mode is typically used for video framebuffers, where it can increase
performance of writes. It can also be used for other blocks of memory in
devices (e.g. buffers or shared memory), but care must be taken as accesses are
not guaranteed to be ordered with respect to normal ioremap() MMIO register
accesses without explicit barriers.

On a PCI bus, it is usually safe to use ioremap_wc() on MMIO areas marked as
``IORESOURCE_PREFETCH``, but it may not be used on those without the flag.
For on-chip devices, there is no corresponding flag, but a driver can use
ioremap_wc() on a device that is known to be safe.

ioremap_wt()
------------

Maps I/O memory as normal memory with write-through caching. Like ioremap_wc(),
but also:

* The CPU may cache writes issued to and reads from the device, and serve reads
  from that cache.

This mode is sometimes used for video framebuffers, where drivers still expect
writes to reach the device in a timely manner (and not be stuck in the CPU
cache), but reads may be served from the cache for efficiency. However, it is
rarely useful these days, as framebuffer drivers usually perform writes only,
for which ioremap_wc() is more efficient (as it doesn't needlessly trash the
cache). Most drivers should not use this.
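
The extra freedom that ioremap_wt() gives the CPU, namely serving reads from
its cache, can be illustrated with a toy userspace model of one register
behind a one-entry write-through cache. This is a simplified sketch: no real
cache works at this granularity, and ``struct wt_model``, ``wt_write()``,
``wt_read()`` and ``device_update()`` are hypothetical names::

	#include <assert.h>
	#include <stdbool.h>
	#include <stdint.h>

	/*
	 * Userspace model only: one device register behind a one-entry
	 * write-through cache. Writes always update the device and refill the
	 * cache; reads are served from the cache while it is valid. None of
	 * these names correspond to kernel APIs.
	 */
	struct wt_model {
		uint32_t device_reg;	/* value held by the "device" */
		uint32_t cached;	/* CPU-side cached copy */
		bool valid;		/* whether the cached copy is usable */
	};

	static void wt_write(struct wt_model *m, uint32_t val)
	{
		m->device_reg = val;	/* write-through: reaches the device */
		m->cached = val;
		m->valid = true;
	}

	static uint32_t wt_read(struct wt_model *m)
	{
		if (!m->valid) {	/* miss: fetch from the device */
			m->cached = m->device_reg;
			m->valid = true;
		}
		return m->cached;	/* hit: the device is not consulted */
	}

	/* The device changes its register on its own, e.g. a status value. */
	static void device_update(struct wt_model *m, uint32_t val)
	{
		m->device_reg = val;
	}

In the model, writes always reach the device, but a value that the device
changes on its own is not seen by wt_read() while the cached copy is valid.
That is tolerable for write-mostly memory such as a framebuffer, and is one
reason this mode is a poor fit for ordinary device registers.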

ioremap_np()
------------

Like ioremap(), but explicitly requests non-posted write semantics. On some
architectures and buses, ioremap() mappings have posted write semantics, which
means that writes can appear to "complete" from the point of view of the
CPU before the written data actually arrives at the target device. Writes are
still ordered with respect to other writes and reads from the same device, but
due to the posted write semantics, this is not the case with respect to other
devices. ioremap_np() explicitly requests non-posted semantics, which means
that the write instruction will not appear to complete until the device has
received (and to some platform-specific extent acknowledged) the written data.

This mapping mode primarily exists to cater for platforms with bus fabrics that
require this particular mapping mode to work correctly. These platforms set the
``IORESOURCE_MEM_NONPOSTED`` flag for a resource that requires ioremap_np()
semantics and portable drivers should use an abstraction that automatically
selects it where appropriate (see the `Higher-level ioremap abstractions`_
section below).

The bare ioremap_np() is only available on some architectures; on others, it
always returns NULL. Drivers should not normally use it, unless they are
platform-specific or they derive benefit from non-posted writes where
supported, and can fall back to ioremap() otherwise. The normal approach to
ensure posted write completion is to do a dummy read after a write as
explained in `Accessing the device`_, which works with ioremap() on all
platforms.

ioremap_np() should never be used for PCI drivers. PCI memory space writes are
always posted, even on architectures that otherwise implement ioremap_np().
Using ioremap_np() for PCI BARs will at best result in posted write semantics,
and at worst result in complete breakage.

Note that non-posted write semantics are orthogonal to CPU-side ordering
guarantees. A CPU may still choose to issue other reads or writes before a
non-posted write instruction retires. See the previous section on MMIO access
functions for details on the CPU side of things.

ioremap_uc()
------------

ioremap_uc() is only meaningful on old x86-32 systems with the PAT extension,
and on ia64 with its slightly unconventional ioremap() behavior. Everywhere
else, ioremap_uc() defaults to returning NULL.

Portable drivers should avoid the use of ioremap_uc() and use ioremap() instead.

ioremap_cache()
---------------

ioremap_cache() effectively maps I/O memory as normal RAM. CPU write-back
caches can be used, and the CPU is free to treat the device as if it were a
block of RAM. This should never be used for device memory which has side
effects of any kind, or which does not return the data previously written on
read.

It should also not be used for actual RAM, as the returned pointer is an
``__iomem`` token. memremap() can be used for mapping normal RAM that is outside
of the linear kernel memory area to a regular pointer.

Portable drivers should avoid the use of ioremap_cache().

Architecture example
--------------------

Here is how the above modes map to memory attribute settings on the ARM64
architecture:

+------------------------+--------------------------------------------+
| API                    | Memory region type and cacheability        |
+------------------------+--------------------------------------------+
| ioremap_np()           | Device-nGnRnE                              |
+------------------------+--------------------------------------------+
| ioremap()              | Device-nGnRE                               |
+------------------------+--------------------------------------------+
| ioremap_uc()           | (not implemented)                          |
+------------------------+--------------------------------------------+
| ioremap_wc()           | Normal-Non Cacheable                       |
+------------------------+--------------------------------------------+
| ioremap_wt()           | (not implemented; fallback to ioremap)     |
+------------------------+--------------------------------------------+
| ioremap_cache()        | Normal-Write-Back Cacheable                |
+------------------------+--------------------------------------------+

Higher-level ioremap abstractions
=================================

Instead of using the above raw ioremap() modes, drivers are encouraged to use
higher-level APIs. These APIs may implement platform-specific logic to
automatically choose an appropriate ioremap mode on any given bus, allowing for
a platform-agnostic driver to work on those platforms without any special
cases. At the time of this writing, the following ioremap() wrappers have such
logic:

devm_ioremap_resource()

  Can automatically select ioremap_np() over ioremap() according to platform
  requirements, if the ``IORESOURCE_MEM_NONPOSTED`` flag is set on the struct
  resource.
  Uses devres to automatically unmap the resource when the driver
  probe() function fails or a device is unbound from its driver.

  Documented in Documentation/driver-api/driver-model/devres.rst.

of_address_to_resource()

  Automatically sets the ``IORESOURCE_MEM_NONPOSTED`` flag for platforms that
  require non-posted writes for certain buses (see the nonposted-mmio and
  posted-mmio device tree properties).

of_iomap()

  Maps the resource described in a ``reg`` property in the device tree, doing
  all required translations. Automatically selects ioremap_np() according to
  platform requirements, as above.

pci_ioremap_bar(), pci_ioremap_wc_bar()

  Maps the resource described in a PCI base address register without having to
  extract the physical address first.

pci_iomap(), pci_iomap_wc()

  Like pci_ioremap_bar()/pci_ioremap_wc_bar(), but also works on I/O space when
  used together with ioread32()/iowrite32() and similar accessors.

pcim_iomap()

  Like pci_iomap(), but uses devres to automatically unmap the resource when
  the driver probe() function fails or a device is unbound from its driver.

  Documented in Documentation/driver-api/driver-model/devres.rst.

Not using these wrappers may make drivers unusable on certain platforms with
stricter rules for mapping I/O memory.

Generalizing Access to System and I/O Memory
============================================

.. kernel-doc:: include/linux/iosys-map.h
   :doc: overview

.. kernel-doc:: include/linux/iosys-map.h
   :internal:

Public Functions Provided
=========================

.. kernel-doc:: arch/x86/include/asm/io.h
   :internal:

.. kernel-doc:: lib/pci_iomap.c
   :export: