.. SPDX-License-Identifier: GPL-2.0

===========================================
Userspace block device driver (ublk driver)
===========================================

Overview
========

ublk is a generic framework for implementing b
The motivation behind it is that moving virtua
such as loop, nbd and similar can be very help
new virtual block device such as ublk-qcow2 (t
implementing qcow2 driver in kernel).

Userspace block devices are attractive because

- They can be written in many programming languag
- They can use libraries that are not availabl
- They can be debugged with tools familiar to
- Crashes do not kernel panic the machine.
- Bugs are likely to have a lower security imp
  code.
- They can be installed and updated independen
- They can be used to simulate block device ea
  parameters/setting for test/debug purpose

ublk block device (``/dev/ublkb*``) is added b
on the device will be forwarded to ublk usersp
in this document, ``ublk server`` refers to ge
program. ``ublksrv`` [#userspace]_ is one of s
provides ``libublksrv`` [#userspace_lib]_ libr
user block device conveniently, while also gen
included, such as loop and null. Richard W.M.
``nbdublk`` [#userspace_nbdublk]_ based on ``

After the IO is handled by userspace, the resu
driver, thus completing the request cycle. Thi
logic is totally done by userspace, such as lo
communication, or qcow2's IO mapping.

``/dev/ublkb*`` is driven by blk-mq request-ba
assigned by one queue wide unique tag. ublk se
IO too, which is 1:1 mapped with IO of ``/dev/

Both the IO request forward and IO handling re
``io_uring`` passthrough command; that is why
block driver. It has been observed that using
give better IOPS than block IO; which is why u
implementation of userspace block device: not
done by io_uring, but also the preferred IO ha
based approach too.

ublk provides control interface to set/get ubl
The interface is extendable and kabi compatibl
queue's parameter or ublk generic feature para
interface. Thus, ublk is a generic userspace blo
For example, it is easy to set up a ublk device
parameters from userspace.

Using ublk
==========

ublk requires userspace ublk server to handle

Below is an example of using ``ublksrv`` to provi

- add a device::

    ublk add -t loop -f ublk-loop.img

- format with xfs, then use it::

    mkfs.xfs /dev/ublkb0
    mount /dev/ublkb0 /mnt
    # do anything. all IOs are handled by io_
    ...
    umount /mnt

- list the devices with their info::

    ublk list

- delete the device::

    ublk del -a
    ublk del -n $ublk_dev_id

See usage details in README of ``ublksrv`` [#u

Design
======

Control plane
-------------

ublk driver provides global misc device node (
managing and controlling ublk devices with hel

- ``UBLK_CMD_ADD_DEV``

  Add a ublk char device (``/dev/ublkc*``) whi
  WRT IO command communication. Basic device i
  command. It sets UAPI structure of ``ublksrv
  such as ``nr_hw_queues``, ``queue_depth``, a
  for which the info is negotiated with the dr
  When this command is completed, the basic de

- ``UBLK_CMD_SET_PARAMS`` / ``UBLK_CMD_GET_PAR

  Set or get parameters of the device, which c
  related, or request queue limit related, but
  because the driver does not handle any IO lo
  sent before sending ``UBLK_CMD_START_DEV``.

- ``UBLK_CMD_START_DEV``

  After the server prepares userspace resource
  pthread & io_uring for handling ublk IO), th
  driver for allocating & exposing ``/dev/ublk
  ``UBLK_CMD_SET_PARAMS`` are applied for crea

- ``UBLK_CMD_STOP_DEV``

  Halt IO on ``/dev/ublkb*`` and remove the de
  ublk server will release resources (such as
  io_uring).

- ``UBLK_CMD_DEL_DEV``

  Remove ``/dev/ublkc*``. When this command re
  number can be reused.

- ``UBLK_CMD_GET_QUEUE_AFFINITY``

  When ``/dev/ublkc*`` is added, the driver cre
  that each queue's affinity info is available
  ``UBLK_CMD_GET_QUEUE_AFFINITY`` to retrieve
  set up the per-queue context efficiently, su
  pthread and try to allocate buffers in IO th

- ``UBLK_CMD_GET_DEV_INFO``

  For retrieving device info via ``ublksrv_ctr
  responsibility to save IO target specific in

- ``UBLK_CMD_GET_DEV_INFO2``

  Same purpose as ``UBLK_CMD_GET_DEV_INFO``,
  provide path of the char device of ``/dev/ub
  permission check, and this command is added
  ublk device, and introduced with ``UBLK_F_UN
  Only the user owning the requested device ca

  How to deal with userspace/kernel compatibil

  1) if kernel is capable of handling ``UBLK_F

     If ublk server supports ``UBLK_F_UNPRIVILE

     ublk server should send ``UBLK_CMD_GET_DEV
     unprivileged application needs to query de
     when the application has no idea if ``UBLK
     given the capability info is stateless, an
     retrieve it via ``UBLK_CMD_GET_DEV_INFO2``

     If ublk server doesn't support ``UBLK_F_UN

     ``UBLK_CMD_GET_DEV_INFO`` is always sent t
     ``UBLK_F_UNPRIVILEGED_DEV`` isn't available fo

  2) if kernel isn't capable of handling ``UBL

     If ublk server supports ``UBLK_F_UNPRIVILE

     ``UBLK_CMD_GET_DEV_INFO2`` is tried first,
     ``UBLK_CMD_GET_DEV_INFO`` needs to be retr
     ``UBLK_F_UNPRIVILEGED_DEV`` can't be set

     If ublk server doesn't support ``UBLK_F_UN

     ``UBLK_CMD_GET_DEV_INFO`` is always sent t
     ``UBLK_F_UNPRIVILEGED_DEV`` isn't availabl

- ``UBLK_CMD_START_USER_RECOVERY``

  This command is valid if ``UBLK_F_USER_RECOV
  command is accepted after the old process ha
  and ``/dev/ublkc*`` is released.
  User should
  a new process which re-opens ``/dev/ublkc*``
  ublk device is ready for the new process.

- ``UBLK_CMD_END_USER_RECOVERY``

  This command is valid if ``UBLK_F_USER_RECOV
  command is accepted after ublk device is qui
  opened ``/dev/ublkc*`` and get all ublk queu
  returns, ublk device is unquiesced and new I
  new process.

- user recovery feature description

  Two new features are added for user recovery
  ``UBLK_F_USER_RECOVERY_REISSUE``.

  With ``UBLK_F_USER_RECOVERY`` set, after one
  handler) is dying, ublk does not delete ``/d
  recovery stage and ublk device ID is kept. I
  responsibility to recover the device context
  Requests which have not been issued to users
  which have been issued to userspace are abor

  With ``UBLK_F_USER_RECOVERY_REISSUE`` set, a
  server's io handler) is dying, contrary to `
  requests which have been issued to userspace
  re-issued to the new process after handling
  ``UBLK_F_USER_RECOVERY_REISSUE`` is designed
  double-write since the driver may issue the
  might be useful to a read-only FS or a VM ba

Unprivileged ublk device is supported by passi
Once the flag is set, all control commands can
user. Except for command of ``UBLK_CMD_ADD_DEV
the specified char device(``/dev/ublkc*``) is
commands by ublk driver, for doing that, path
be provided in these commands' payload from ub
ublk device becomes container-aware, and device
can be controlled/accessed just inside this co

Data plane
----------

ublk server needs to create per-queue IO pthre
commands via io_uring passthrough. The per-que
focuses on IO handling and shouldn't handle an
tasks.

Each IO is assigned a unique tag, which is
request of ``/dev/ublkb*``.

UAPI structure of ``ublksrv_io_desc`` is defin
the driver.
A fixed mmapped area (array) on ``
exporting IO info to the server; such as IO of
buffer address. Each ``ublksrv_io_desc`` insta
and IO tag directly.

The following IO commands are communicated via
and each command is only for forwarding the IO
with specified IO tag in the command data:

- ``UBLK_IO_FETCH_REQ``

  Sent from the server IO pthread for fetching
  destined to ``/dev/ublkb*``. This command is
  IO pthread for ublk driver to set up IO forwa

- ``UBLK_IO_COMMIT_AND_FETCH_REQ``

  When an IO request is destined to ``/dev/ubl
  the IO's ``ublksrv_io_desc`` to the specifie
  previous received IO command of this IO tag
  or ``UBLK_IO_COMMIT_AND_FETCH_REQ``) is comp
  the IO notification via io_uring.

  After the server handles the IO, its result
  driver by sending ``UBLK_IO_COMMIT_AND_FETCH
  received this command, it parses the result
  ``/dev/ublkb*``. In the meantime setup envir
  requests with the same IO tag. That is, ``UB
  is reused for both fetching request and comm

- ``UBLK_IO_NEED_GET_DATA``

  With ``UBLK_F_NEED_GET_DATA`` enabled, the W
  issued to ublk server without data copy. The
  receives the request and it can allocate dat
  inside this new io command. After the kernel
  data copy is done from request pages to this
  backend receives the request again with data
  truly handle the request.

  ``UBLK_IO_NEED_GET_DATA`` adds one additiona
  io_uring_enter() syscall. Any user thinks th
  should not enable ``UBLK_F_NEED_GET_DATA``. ublk
  buffer for each IO by default. Any new proje
  buffer to communicate with ublk driver. Howe
  break or not able to consume the new buffer
  command is added for backwards compatibility
  can still consume existing buffers.

- data copy between ublk server IO buffer and

  The driver needs to copy the block IO reques
  (pages) first for WRITE before notifying the
  that the server can handle WRITE request.

  When the server handles READ request and sen
  ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the serv
  the server buffer (pages) read to the IO req

Future development
==================

Zero copy
---------

Zero copy is a generic requirement for nbd, fu
problem [#xiaoguang]_ Xiaoguang mentioned is t
can't be remapped any more in kernel with exis
occurs when destining direct IO to ``/dev/ublk
big requests (IO size >= 256 KB) may benefit a


References
==========

.. [#userspace] https://github.com/ming1/ubdsr

.. [#userspace_lib] https://github.com/ming1/u

.. [#userspace_nbdublk] https://gitlab.com/rwm

.. [#userspace_readme] https://github.com/ming

.. [#stefan] https://lore.kernel.org/linux-blo

.. [#xiaoguang] https://lore.kernel.org/linux-