.. SPDX-License-Identifier: GPL-2.0

================================================
Multi-Queue Block IO Queueing Mechanism (blk-mq)
================================================

The Multi-Queue Block IO Queueing Mechanism is
devices to achieve a huge number of input/outp
through queueing and submitting IO requests to
benefiting from the parallelism offered by mod

Introduction
============

Background
----------

Magnetic hard disks have been the de facto sta
development of the kernel. The Block IO subsys
performance possible for those devices with a
access, and the bottleneck was the mechanical
any layer on the storage stack. One example of
involves ordering read/write requests accordin
hard disk head.

However, with the development of Solid State D
without mechanical parts nor random access pen
high parallel access, the bottleneck of the st
device to the operating system. In order to ta
in those devices' design, the multi-queue mech

The former design had a single queue to store
lock. That did not scale well in SMP systems d
bottleneck of having a single lock for multipl
suffered with congestion when different proces
to different CPUs) wanted to perform block IO.
spawns multiple queues with individual entry p
the need for a lock. A deeper explanation on h
following section (`Operation`_).

Operation
---------

When the userspace performs IO to a block devi
for instance), blk-mq takes action: it will st
the block device, acting as middleware between
system, if present) and the block device drive

blk-mq has two groups of queues: software stagi
queues. When the request arrives at the block
path possible: send it directly to the hardwar
cases that it might not do that: if there's an
layer or if we want to try to merge requests.
sent to the software queue.
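
As a rough illustration of the path selection described above, the following
toy model in C chooses between the two groups of queues. It is only a sketch
under stated assumptions: every ``toy_`` name is hypothetical and none of them
is a kernel API, and the first condition is assumed to be an attached IO
scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types for illustration; these are not kernel definitions. */
enum toy_path {
	TOY_PATH_HW_QUEUE,	/* shortest path: straight to the hardware queue */
	TOY_PATH_SW_QUEUE,	/* staged in a software queue first */
};

struct toy_queue_state {
	bool has_io_scheduler;	/* assumption: an IO scheduler is attached */
	bool try_merge;		/* the block layer wants to try merging requests */
};

/*
 * Pick the path for an incoming request: go directly to the hardware
 * queue unless one of the two conditions above asks for staging.
 */
enum toy_path toy_choose_path(const struct toy_queue_state *q)
{
	assert(q != NULL);

	if (q->has_io_scheduler || q->try_merge)
		return TOY_PATH_SW_QUEUE;

	return TOY_PATH_HW_QUEUE;
}
```

With neither flag set, ``toy_choose_path()`` returns ``TOY_PATH_HW_QUEUE``;
setting either flag returns ``TOY_PATH_SW_QUEUE``. The real submission path in
the kernel takes more conditions into account than this model does.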

Then, after the requests are processed by soft
at the hardware queue, a second stage queue wh
to process those requests. However, if the har
resources to accept more requests, blk-mq will
queue, to be sent in the future, when the hard

Software staging queues
~~~~~~~~~~~~~~~~~~~~~~~

The block IO subsystem adds requests in the so
(represented by struct blk_mq_ctx) in case tha
directly to the driver. A request is one or mo
block layer through the data structure struct
will then build a new structure from it, the s
be used to communicate with the device driver.
the number of queues is defined by a per-CPU o

The staging queue can be used to merge request
instance, requests for sector 3-6, 6-7, 7-9 ca
Even if random access to SSDs and NVMs have th
to sequential access, grouped requests for seq
number of individual requests. This technique
plugging.

Along with that, the requests can be reordered
resources (e.g. to ensure that no application
improve IO performance, by an IO scheduler.

IO Schedulers
^^^^^^^^^^^^^

There are several schedulers implemented by th
a heuristic to improve the IO performance. The
and play), in the sense of they can be selecte
can read more about Linux's IO schedulers `her
<https://www.kernel.org/doc/html/latest/block/
happens only between requests in the same queu
requests from different queues, otherwise ther
need to have a lock for each queue. After the
eligible to be sent to the hardware. One of th
selected is the NONE scheduler, the most strai
place requests on whatever software queue the
any reordering. When the device starts process
queue (a.k.a.
run the hardware queue), the sof
hardware queue will be drained in sequence acc

Hardware dispatch queues
~~~~~~~~~~~~~~~~~~~~~~~~

The hardware queue (represented by struct blk_
used by device drivers to map the device submi
buffer), and are the last step of the block la
low level device driver taking ownership of th
block layer removes requests from the associat
dispatch to the hardware.

If it's not possible to send the requests dire
added to a linked list (``hctx->dispatch``) of
next time the block layer runs a queue, it wil
``dispatch`` list first, to ensure a fairness
requests that were ready to be sent first. The
depends on the number of hardware contexts sup
device driver, but it will not be more than th
There is no reordering at this stage, and each
hardware queues to send requests for.

.. note::

        Neither the block layer nor the device
        the order of completion of requests. T
        higher layers, like the filesystem.

Tag-based completion
~~~~~~~~~~~~~~~~~~~~

In order to indicate which request has been co
identified by an integer, ranging from 0 to th
is generated by the block layer and later reus
the need to create a redundant identifier. Whe
driver, the tag is sent back to the block laye
This removes the need to do a linear search to
completed.

Further reading
---------------

- `Linux Block IO: Introducing Multi-queue SSD

- `NOOP scheduler <https://en.wikipedia.org/wi

- `Null block device driver <https://www.kerne

Source code documentation
=========================

.. kernel-doc:: include/linux/blk-mq.h

.. kernel-doc:: block/blk-mq.c