==================================
vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. The motivation for vfio-ccw is to pass subchannels through to a
virtual machine, while vfio is the means.

Unlike other hardware architectures, s390 has defined a unified
I/O access method, the so-called Channel I/O. It has its own access
patterns:

- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem will access any memory designated by the caller
  in the channel program directly, i.e. there is no IOMMU involved.

Thus, when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev will be
added to an IOMMU group, so that it can be managed by the
vfio framework. We also add read/write callbacks for special vfio I/O
regions to pass the channel programs from the mdev to its parent device
(the real I/O subchannel device), which does further address translation
and performs the I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information and references can be found here:

- A good start to learn about Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form …)
- The existing QEMU code which implements a simple emulated channel
  subsystem can also be a good reference. It makes it easier to follow
  the flow:
  qemu/hw/s390x/css.c

For the vfio mediated device framework:

- Documentation/driver-api/vfio-mediated-devic…

Motivation of vfio-ccw
----------------------

Typically, a guest virtualized via QEMU/KVM on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, the majority of devices use the
standard Channel I/O based mechanism, and we also need to provide
the functionality of passing them through to a QEMU virtual machine.
This includes devices that don't have a virtio counterpart (e.g. tape
drives) or that have specific characteristics which guests want to
exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio.
We implement this vfio support for channel
devices via the vfio mediated device framework and the subchannel device
driver "vfio_ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture has implemented a so-called channel subsystem that
provides a unified view of the devices physically attached to the
system. Though the s390 hardware platform knows about a huge variety of
different peripheral attachments like disk devices (aka DASDs), tapes,
communication controllers, etc., they can all be accessed by a well-defined
access method, and they present I/O completion in a unified
way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem. To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which points out
the format of the CCWs and other control information to the system. The
operating system signals the I/O channel subsystem to begin executing
the channel program with an SSCH (START SUBCHANNEL) instruction. The
central processor is then free to proceed with non-I/O instructions
until interrupted. The I/O completion result is received by the
interrupt handler in the form of an interrupt response block (IRB).

Back to vfio-ccw, in short:

- ORBs and channel programs are built in the guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- The host kernel translates the guest physical addresses to real addresses
  and starts the I/O by issuing a privileged Channel I/O instruction
  (e.g. SSCH).
- Channel programs run asynchronously on a separate processor.
- I/O completion is signaled to the host with I/O interruptions,
  and the result is copied as an IRB to user space to pass it back to the
  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with an mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have IOMMU-level translation or isolation.

Subchannel I/O instructions are all privileged instructions. When
handling the I/O instruction interception, vfio-ccw polices and
translates the channel program in software before it is sent to
hardware.
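
To make the policing step more concrete, here is a minimal C sketch of how a format-1 CCW might be modeled and how the length of a command-chained channel program could be bounded before translation. It assumes the format-1 CCW layout (command code, flags, count, 31-bit data address) and that the chain-data (0x80) and chain-command (0x40) flag bits mean another CCW follows. The struct, macro and function names are hypothetical, fields are kept in host byte order for simplicity, and transfer-in-channel (TIC) CCWs are not followed. The limit of 255 mirrors the ``-EINVAL`` case listed for the I/O region.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-order model of a format-1 CCW (8 bytes). */
struct ccw1 {
	uint8_t  cmd_code; /* command code, e.g. read or write */
	uint8_t  flags;    /* chaining and other control flags */
	uint16_t count;    /* byte count of the data area */
	uint32_t cda;      /* 31-bit data address (or IDAL address) */
};

#define CCW_FLAG_DC   0x80 /* chain data: next CCW continues this data transfer */
#define CCW_FLAG_CC   0x40 /* chain command: next CCW is a new command */
#define MAX_CHAIN_LEN 255  /* assumed policing limit, see -EINVAL above */

/*
 * Return the number of CCWs in the linear chain starting at cp[0],
 * or -1 if the chain does not end within `limit` CCWs or runs past
 * the `n` entries that were copied in. TIC CCWs are not followed.
 */
static int ccw_chain_length(const struct ccw1 *cp, size_t n, size_t limit)
{
	size_t max = n < limit ? n : limit;

	for (size_t i = 0; i < max; i++) {
		/* A CCW with neither CC nor DC set terminates the chain. */
		if (!(cp[i].flags & (CCW_FLAG_CC | CCW_FLAG_DC)))
			return (int)(i + 1);
	}
	return -1;
}
```

A real translator would also have to follow TIC CCWs and validate each data address; this sketch only shows the bounded walk that keeps a malformed program from being processed indefinitely.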

Within this implementation, we have two drivers for two types of
devices:

- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device. It
  realizes a group of callbacks and registers to the mdev framework as a
  parent (physical) device. As a consequence, mdev provides vfio_ccw a
  generic interface (sysfs) to create mdev devices. A vfio mdev can then be
  created by vfio_ccw and added to the mediated bus. It is the vfio
  device that is added to an IOMMU group and a vfio group.
  vfio_ccw also provides an I/O region to accept channel program
  requests from user space and store I/O interrupt results for user
  space to retrieve. To notify user space of an I/O completion, it offers
  an interface to set up an eventfd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev created by vfio_ccw.
  It realizes a group of vfio device driver callbacks, adds itself to a
  vfio group, and registers itself to the mdev framework as an mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that a device programmed in a VM with guest physical addresses
  can have the vfio kernel convert that address to a process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For an mdev, the vfio iommu backend will not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl.
  The mdev framework only maintains a database
  of the iova<->vaddr mappings in this operation, and it exports the
  vfio_pin_pages and vfio_unpin_pages interfaces from the vfio iommu
  backend for the physical devices to pin and unpin pages on demand.

Below is a high-level block diagram::

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |
 | +---------+ |    probe()/remove()    +--------------+
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_parent() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        | vfio_ccw.ko  |
 | |interface| +----------------------->+              |
 | +---------+ |        callback        +--------------+
 +-------------+

The following steps describe how these components work together:

1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) to the mdev framework.
   When vfio_ccw probes the subchannel device, it registers the device
   pointer and callbacks to the mdev framework. Mdev-related file nodes
   under the device node in sysfs are then created for the subchannel
   device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Using the 'mdev_create' sysfs file, we need to manually create one (and
   only one in our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device driver. It probes the mdev and
   adds it to an iommu_group and a vfio_group. Then we can pass the
   mdev through to a guest.


VFIO-CCW Regions
----------------

The vfio-ccw driver exposes MMIO regions to accept requests from and return
results to userspace.
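
vfio device regions are accessed with pread() and pwrite() on the vfio device file descriptor, at the file offset that VFIO_DEVICE_GET_REGION_INFO reports for the region. The minimal C sketch below shows hypothetical helpers (the names are illustrative, not part of any existing API) for writing a complete request into a region and reading results back. It assumes the device fd and the region offset were already obtained during initialization, as in step Q1 of the flow described later.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one complete request into a region.
 * region_offset is the region's file offset on the device fd.
 * Returns 0 on success or a negative errno value; a short write is
 * reported as -EIO instead of being retried.
 */
static int region_write(int device_fd, uint64_t region_offset,
			const void *buf, size_t len)
{
	ssize_t n = pwrite(device_fd, buf, len, (off_t)region_offset);

	if (n < 0)
		return -errno;
	return (size_t)n == len ? 0 : -EIO;
}

/* Hypothetical helper: read results (e.g. a status area) back. */
static int region_read(int device_fd, uint64_t region_offset,
		       void *buf, size_t len)
{
	ssize_t n = pread(device_fd, buf, len, (off_t)region_offset);

	if (n < 0)
		return -errno;
	return (size_t)n == len ? 0 : -EIO;
}
```

Treating a short write as an error, rather than looping, is a deliberate choice in this sketch: a request such as an ORB plus SCSW is only meaningful when it is written as one unit.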

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program requests from user
space and store I/O interrupt results for user space to retrieve. The
definition of the region is::

  struct ccw_io_region {
  #define ORB_AREA_SIZE 12
          __u8    orb_area[ORB_AREA_SIZE];
  #define SCSW_AREA_SIZE 12
          __u8    scsw_area[SCSW_AREA_SIZE];
  #define IRB_AREA_SIZE 96
          __u8    irb_area[IRB_AREA_SIZE];
          __u32   ret_code;
  } __packed;

This region is always available.

When starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the virtual
subchannel.

irb_area stores the I/O result.

ret_code stores a return code for each access of the region. The following
values may occur:

``0``
  The operation was successful.

``-EOPNOTSUPP``
  The ORB specified transport mode or the
  SCSW specified a function other than the start function.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests, or an internal error occurred.

``-EBUSY``
  The subchannel was status pending or busy, or a request is already active.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EACCES``
  The channel path(s) used for the I/O were found to be not operational.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  The ORB specified a chain longer than 255 CCWs, or an internal error
  occurred.


vfio-ccw cmd region
-------------------

The vfio-ccw cmd region is used to accept asynchronous instructions
from userspace::

  #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
  #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
  struct ccw_cmd_region {
          __u32 command;
          __u32 ret_code;
  } __packed;

This region is exposed via region type VFIO_RE…

Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region.

command specifies the command to be issued; ret_code stores a return code
for each access of the region. The following values may occur:

``0``
  The operation was successful.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  A command other than halt or clear was specified.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EBUSY``
  The subchannel was status pending or busy while processing a halt request.

vfio-ccw schib region
---------------------

The vfio-ccw schib region is used to return Subchannel-Information
Block (SCHIB) data to userspace::

  struct ccw_schib_region {
  #define SCHIB_AREA_SIZE 52
          __u8 schib_area[SCHIB_AREA_SIZE];
  } __packed;

This region is exposed via region type VFIO_RE…

Reading this region triggers a STORE SUBCHANNEL to be issued to the
associated hardware.

vfio-ccw crw region
-------------------

The vfio-ccw crw region is used to return Channel Report Word (CRW)
data to userspace::

  struct ccw_crw_region {
          __u32 crw;
          __u32 pad;
  } __packed;

This region is exposed via region type VFIO_RE…

Reading this region returns a CRW if one that is relevant for this
subchannel (e.g. one reporting changes in channel path state) is
pending, or all zeroes if not. If multiple CRWs are pending (including
possibly chained CRWs), reading this region again returns the next
one, until no more CRWs are pending and zeroes are returned. This is
similar to how STORE CHANNEL REPORT WORD works.

vfio-ccw operation details
--------------------------

vfio-ccw follows what vfio-pci did on the s390 platform and uses
vfio-iommu-type1 as the vfio iommu backend.

* CCW translation APIs
  A group of APIs (starting with ``cp_``) performs CCW translation. The CCWs
  passed in by a user space program are organized with their guest
  physical memory addresses.
  These APIs copy the CCWs into kernel
  space, and assemble a runnable kernel channel program by replacing the
  guest physical addresses with their corresponding host physical addresses.
  Note that we have to use IDALs even for direct-access CCWs, as the
  referenced memory can be located anywhere, including above 2G.

* vfio_ccw device driver
  This driver utilizes the CCW translation APIs and introduces
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls::

    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS

  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel for further CCW translation before it is
  issued to a real device.
  It also provides the SET_IRQ ioctl to set up an event notifier that
  notifies the user space program of I/O completion asynchronously.

The use of vfio-ccw is not limited to QEMU, but QEMU is a
good example for understanding how these components work. Here is a little
more detail on how an I/O request triggered by the QEMU guest is
handled (without error handling).

Explanation:

- Q1-Q7: QEMU side process.
- K1-K6: Kernel side process.

Q1.
    Get I/O region info during initialization.

Q2.
    Set up the event notifier and handler to handle I/O completion.

... ...

Q3.
    Intercept an SSCH instruction.
Q4.
    Write the guest channel program and ORB to the I/O region.

    K1.
        Copy from guest to kernel.
    K2.
        Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3.
        With the necessary information contained in the ORB passed in
        by QEMU, issue the ccwchain to the device.
    K4.
        Return the SSCH CC code.
Q5.
    Return the CC code to the guest.

... ...

    K5.
        The interrupt handler gets the I/O result and writes it to
        the I/O region.
    K6.
        Signal QEMU to retrieve the result.

Q6.
    Get the signal; the event handler reads out the result from the I/O
    region.
Q7.
    Update the IRB for the guest.
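
Step K2 relies on the CCW translation described above, which uses IDALs even for direct-access CCWs. The minimal C sketch below illustrates the idea: count how many indirect data address words (IDAWs) a data area needs and fill them with translated addresses. It assumes 64-bit IDAWs that each cover one 4 KiB block (the IDAW format actually used depends on the ORB settings). The translation callback and all names are hypothetical stand-ins for the kernel's guest-to-host address translation and page pinning.

```c
#include <stddef.h>
#include <stdint.h>

#define IDA_BLOCK_SIZE 4096u /* assumption: each IDAW covers a 4 KiB block */

/* Number of IDAWs needed for the range [addr, addr + len):
 * one per IDA block that the range touches. */
static size_t idal_nr_words(uint64_t addr, uint32_t len)
{
	return (size_t)(((addr & (IDA_BLOCK_SIZE - 1)) + len +
			 (IDA_BLOCK_SIZE - 1)) / IDA_BLOCK_SIZE);
}

/* Hypothetical translation hook: guest physical -> host physical. */
typedef uint64_t (*gpa_to_hpa_fn)(uint64_t gpa, void *ctx);

/*
 * Build an IDAL for a data area given by guest physical address `gpa`
 * and length `len`. The first IDAW holds the translated start address;
 * each following IDAW holds the translated address of the next block.
 * Returns the number of IDAWs written, or 0 if the range is empty or
 * `max` entries are not enough.
 */
static size_t build_idal(uint64_t gpa, uint32_t len, gpa_to_hpa_fn xlate,
			 void *ctx, uint64_t *idal, size_t max)
{
	size_t n = idal_nr_words(gpa, len);

	if (n == 0 || n > max)
		return 0;
	idal[0] = xlate(gpa, ctx);
	uint64_t block = gpa & ~(uint64_t)(IDA_BLOCK_SIZE - 1);
	for (size_t i = 1; i < n; i++) {
		block += IDA_BLOCK_SIZE;
		idal[i] = xlate(block, ctx);
	}
	return n;
}
```

Because each IDAW carries a full 64-bit address, the host pages backing a guest buffer can lie anywhere in memory, including above 2G, which a 31-bit CCW data address alone could not describe.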

Limitations
-----------

The current vfio-ccw implementation focuses on supporting basic commands
needed to implement block device functionality (read/write) of DASD/ECKD
devices only. Some commands may need special handling in the future, for
example, anything related to path grouping.

DASD is a kind of storage device, while ECKD is a data recording format.
More information on DASD and ECKD can be found here:

- https://en.wikipedia.org/wiki/Direct-access_st…
- https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in QEMU, we can now bring a passed
through DASD/ECKD device online in a guest and use it as a block
device.

The current code allows the guest to start channel programs via
START SUBCHANNEL, and to issue HALT SUBCHANNEL, CLEAR SUBCHANNEL,
and STORE SUBCHANNEL.

Currently, all channel programs are prefetched, regardless of the
p-bit setting in the ORB. As a result, self-modifying channel
programs are not supported. For this reason, IPL has to be handled as
a special case by a userspace/guest program; this has been implemented
in QEMU's s390-ccw bios as of QEMU 4.1.

vfio-ccw supports classic (command mode) channel I/O only. Transport
mode (HPF) is not supported.

QDIO subchannels are currently not supported. Classic devices other than
DASD/ECKD might work, but have not been tested.

Reference
---------

1. ESA/s390 Principles of Operation manual (IBM Form …)
2. ESA/390 Common I/O Device Commands manual (…)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/arch/s390/cds.rst
5. Documentation/driver-api/vfio.rst
6. Documentation/driver-api/vfio-mediated-devic…