Linux/Documentation/arch/s390/vfio-ccw.rst

==================================
vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. The motivation for vfio-ccw is to pass subchannels through
to a virtual machine; vfio is the means to do so.

Unlike other hardware architectures, s390 defines a unified I/O access
method, called Channel I/O. It has its own access patterns:

- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem accesses any memory designated by the caller
  in the channel program directly, i.e. there is no iommu involved.

Thus, when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev is added to
an iommu group so that it can be managed by the vfio framework. We also
add read/write callbacks for special vfio I/O regions, which pass the
channel programs from the mdev to its parent device (the real I/O
subchannel device) for further address translation and for performing
the I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information and references can be found here:

- A good starting point for Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing QEMU code, which implements a simple emulated channel
  subsystem, is also a good reference. It makes the flow easier to
  follow:
  qemu/hw/s390x/css.c

For the vfio mediated device framework:

- Documentation/driver-api/vfio-mediated-device.rst

Motivation of vfio-ccw
----------------------

Typically, a guest virtualized via QEMU/KVM on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, the majority of devices use the
standard Channel I/O based mechanism, and we also need to be able to
pass them through to a QEMU virtual machine. This includes devices that
don't have a virtio counterpart (e.g. tape drives) or that have specific
characteristics which guests want to exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. We implement this vfio support for channel
devices via the vfio mediated device framework and the subchannel device
driver "vfio_ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture implements a so-called channel subsystem, which
provides a unified view of the devices physically attached to the
system. Although the s390 hardware platform knows about a huge variety
of peripheral attachments, such as disk devices (aka DASDs), tapes and
communication controllers, they can all be accessed by a well-defined
access method, and they all present I/O completion in a unified way:
I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem. To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which specifies the format of
the CCWs and other control information to the system. The operating
system signals the I/O channel subsystem to begin executing the channel
program with an SSCH (start subchannel) instruction. The central
processor is then free to proceed with non-I/O instructions until
interrupted. The I/O completion result is received by the interrupt
handler in the form of an interrupt response block (IRB).

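To make "a channel program is a sequence of CCWs" more concrete, here is
a minimal userspace sketch of a format-1 CCW and of walking a linear
chain. The names ``ccw1_sketch`` and ``ccw_chain_len`` are made up for
this illustration, and the sketch assumes a simple program without
transfer-in-channel (TIC) branches; it is not the kernel's
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chaining flags in the CCW flags byte. */
#define CCW_FLAG_DC 0x80 /* data chaining */
#define CCW_FLAG_CC 0x40 /* command chaining */

/* Format-1 CCW: command code, flags, byte count, 31-bit data address. */
struct ccw1_sketch {
        uint8_t  cmd_code;
        uint8_t  flags;
        uint16_t count;
        uint32_t cda;
};

/*
 * Count the CCWs of a linear channel program: the program continues
 * with the next CCW as long as a chaining flag is set. Returns 0 if no
 * final CCW is found within max entries. TIC CCWs, which branch to
 * another address, are not handled in this sketch.
 */
static size_t ccw_chain_len(const struct ccw1_sketch *cp, size_t max)
{
        assert(cp != NULL);
        for (size_t i = 0; i < max; i++) {
                if (!(cp[i].flags & (CCW_FLAG_DC | CCW_FLAG_CC)))
                        return i + 1;
        }
        return 0;
}
```

A caller would pass the first CCW of the program and an upper bound on
the number of entries it is willing to inspect.
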
Back to vfio-ccw, in short:

- ORBs and channel programs are built in the guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- The host kernel translates the guest physical addresses to real
  addresses and starts the I/O by issuing a privileged Channel I/O
  instruction (e.g. SSCH).
- Channel programs run asynchronously on a separate processor.
- I/O completion is signaled to the host with I/O interruptions. The
  result is copied as an IRB to user space, which passes it back to the
  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with an mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have IOMMU-level translation or isolation.

Subchannel I/O instructions are all privileged instructions. When
handling the I/O instruction interception, vfio-ccw polices and
translates in software how the channel program is programmed before it
gets sent to hardware.

Within this implementation, we have two drivers for two types of
devices:

- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device. It
  implements a group of callbacks and registers with the mdev framework
  as a parent (physical) device. As a consequence, mdev provides vfio_ccw
  with a generic interface (sysfs) to create mdev devices. A vfio mdev
  can then be created by vfio_ccw and added to the mediated bus. It is
  the vfio device that is added to an IOMMU group and a vfio group.
  vfio_ccw also provides an I/O region to accept channel program
  requests from user space and to store the I/O interrupt result for
  user space to retrieve. To notify user space of an I/O completion, it
  offers an interface to set up an eventfd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev that is created by vfio_ccw.
  It implements a group of vfio device driver callbacks, adds itself to
  a vfio group, and registers itself with the mdev framework as an mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that a device programmed in a VM with guest physical addresses
  can have the vfio kernel convert that address to a process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For an mdev, the vfio iommu backend will not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl. The mdev framework only maintains a database
  of the iova<->vaddr mappings in this operation. The vfio iommu backend
  exports the vfio_pin_pages and vfio_unpin_pages interfaces, which the
  physical device drivers use to pin and unpin pages on demand.

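The "store the translation now, resolve it on demand" idea can be
sketched in plain C as follows. This is only a toy model under our own
assumptions: the names ``map_db_sketch``, ``map_db_add`` and
``map_db_lookup`` are hypothetical, the database is a fixed array, and
no pages are actually pinned. The real vfio iommu backend is
considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recorded mapping: [iova, iova + size) -> [vaddr, vaddr + size). */
struct dma_map_sketch {
        uint64_t iova;
        uint64_t vaddr;
        uint64_t size;
};

/* Toy database; a real backend would use a tree and locking. */
struct map_db_sketch {
        struct dma_map_sketch entries[16];
        size_t nr;
};

/* "Map" step: only record the translation, pin nothing. */
static int map_db_add(struct map_db_sketch *db, uint64_t iova,
                      uint64_t vaddr, uint64_t size)
{
        assert(db != NULL);
        if (size == 0 ||
            db->nr == sizeof(db->entries) / sizeof(db->entries[0]))
                return -1;
        db->entries[db->nr++] = (struct dma_map_sketch){ iova, vaddr, size };
        return 0;
}

/* On-demand step: look up the vaddr for an iova; 0 if unmapped. */
static uint64_t map_db_lookup(const struct map_db_sketch *db, uint64_t iova)
{
        assert(db != NULL);
        for (size_t i = 0; i < db->nr; i++) {
                const struct dma_map_sketch *m = &db->entries[i];

                if (iova >= m->iova && iova - m->iova < m->size)
                        return m->vaddr + (iova - m->iova);
        }
        return 0;
}
```

In this model, the map ioctl corresponds to ``map_db_add()``, and the
lookup happens later, when a channel program actually references the
address. Using 0 as the "unmapped" result is a simplification of the
sketch.
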
Below is a high-level block diagram::

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |<-> VFIO user
 | +---------+ |    probe()/remove()    +--------------+    APIs
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_parent() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        |  vfio_ccw.ko |<-> subchannel
 | |interface| +----------------------->+              |     device
 | +---------+ |       callback         +--------------+
 +-------------+

The process of how these work together:

1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) with the mdev framework.
   When vfio_ccw probes the subchannel device, it registers the device
   pointer and callbacks with the mdev framework. Mdev-related file
   nodes are then created in sysfs under the device node of the
   subchannel device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Using the 'mdev_create' sysfs file, we need to manually create one
   (and only one for our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device driver. It probes the mdev and
   adds it to an iommu_group and a vfio_group. Then we can pass the
   mdev through to a guest.

VFIO-CCW Regions
----------------

The vfio-ccw driver exposes MMIO regions to accept requests from and return
results to userspace.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program requests from user
space and to store the I/O interrupt result for user space to retrieve.
The definition of the region is::

  struct ccw_io_region {
  #define ORB_AREA_SIZE 12
          __u8    orb_area[ORB_AREA_SIZE];
  #define SCSW_AREA_SIZE 12
          __u8    scsw_area[SCSW_AREA_SIZE];
  #define IRB_AREA_SIZE 96
          __u8    irb_area[IRB_AREA_SIZE];
          __u32   ret_code;
  } __packed;

This region is always available.

When starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the virtual
subchannel.

irb_area stores the I/O result.

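As an illustration of how a user space program might prepare this
region, here is a small sketch that mirrors the layout with fixed-width
types and fills in the ORB and SCSW areas. The struct name
``ccw_io_region_sketch`` and the helper ``io_region_prepare`` are our
own, hypothetical names; only the field names and sizes are taken from
the definition above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* Userspace mirror of the region layout quoted above. */
struct ccw_io_region_sketch {
        uint8_t  orb_area[ORB_AREA_SIZE];
        uint8_t  scsw_area[SCSW_AREA_SIZE];
        uint8_t  irb_area[IRB_AREA_SIZE];
        uint32_t ret_code;
} __attribute__((packed));

/* 12 + 12 + 96 + 4 bytes, with ret_code directly after irb_area. */
static_assert(sizeof(struct ccw_io_region_sketch) == 124, "region size");
static_assert(offsetof(struct ccw_io_region_sketch, ret_code) == 120,
              "ret_code offset");

/* Prepare a request: copy in the guest ORB and SCSW, clear the rest. */
static void io_region_prepare(struct ccw_io_region_sketch *r,
                              const uint8_t orb[ORB_AREA_SIZE],
                              const uint8_t scsw[SCSW_AREA_SIZE])
{
        assert(r != NULL && orb != NULL && scsw != NULL);
        memset(r, 0, sizeof(*r));
        memcpy(r->orb_area, orb, ORB_AREA_SIZE);
        memcpy(r->scsw_area, scsw, SCSW_AREA_SIZE);
}
```

After preparing the buffer, a program would typically write it to the
device file descriptor at the region offset reported by
VFIO_DEVICE_GET_REGION_INFO; that step is left out here because it
needs a real device.
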
ret_code stores a return code for each access of the region. The following
values may occur:

``0``
  The operation was successful.

``-EOPNOTSUPP``
  The ORB specified transport mode, or the SCSW specified a function
  other than the start function.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests, or an internal error occurred.

``-EBUSY``
  The subchannel was status pending or busy, or a request is already active.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EACCES``
  The channel path(s) used for the I/O were found to be not operational.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  The ORB specified a chain longer than 255 CCWs, or an internal error
  occurred.

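A caller has to decide what to do with each of these values. The
following sketch shows one possible way to turn the list above into
code; the helper names are hypothetical, and the only policy it encodes
is the one stated above, namely that ``-EAGAIN`` asks the caller to
retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* True if the documented meaning of ret_code asks the caller to retry. */
static bool io_ret_code_should_retry(int ret_code)
{
        return ret_code == -EAGAIN;
}

/* Short description of an I/O region ret_code, following the list above. */
static const char *io_ret_code_str(int ret_code)
{
        /* Documented values are 0 or a negative errno. */
        assert(ret_code <= 0);
        switch (ret_code) {
        case 0:           return "success";
        case -EOPNOTSUPP: return "transport mode or non-start function";
        case -EIO:        return "device not ready or internal error";
        case -EBUSY:      return "subchannel busy or request active";
        case -EAGAIN:     return "request in progress, retry";
        case -EACCES:     return "channel path(s) not operational";
        case -ENODEV:     return "device not operational";
        case -EINVAL:     return "chain too long or internal error";
        default:          return "undocumented return code";
        }
}
```

How the remaining errors are reflected to a guest (for example as a
condition code) is up to the user space program and is not covered by
this sketch.
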
vfio-ccw cmd region
-------------------

The vfio-ccw cmd region is used to accept asynchronous instructions
from userspace::

  #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
  #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
  struct ccw_cmd_region {
         __u32 command;
         __u32 ret_code;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.

Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region.

command specifies the command to be issued; ret_code stores a return code
for each access of the region. The following values may occur:

``0``
  The operation was successful.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  A command other than halt or clear was specified.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EBUSY``
  The subchannel was status pending or busy while processing a halt request.

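A user space program can check the command before writing it to the
region. The sketch below does this with a hypothetical helper,
``cmd_region_prepare``, and a mirror struct of our own. Rejecting a
value with both bits set is an assumption of this sketch, derived from
the ``-EINVAL`` description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)

/* Userspace mirror of the cmd region layout quoted above. */
struct ccw_cmd_region_sketch {
        uint32_t command;
        uint32_t ret_code;
} __attribute__((packed));

/*
 * Fill in the region for a halt or clear request. Returns 0 on
 * success, or -EINVAL for any other command value. Treating "both
 * bits set" as invalid is an assumption of this sketch.
 */
static int cmd_region_prepare(struct ccw_cmd_region_sketch *r,
                              uint32_t command)
{
        assert(r != NULL);
        if (command != VFIO_CCW_ASYNC_CMD_HSCH &&
            command != VFIO_CCW_ASYNC_CMD_CSCH)
                return -EINVAL;
        r->command = command;
        r->ret_code = 0;
        return 0;
}
```
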
vfio-ccw schib region
---------------------

The vfio-ccw schib region is used to return Subchannel-Information
Block (SCHIB) data to userspace::

  struct ccw_schib_region {
  #define SCHIB_AREA_SIZE 52
         __u8 schib_area[SCHIB_AREA_SIZE];
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_SCHIB.

Reading this region triggers a STORE SUBCHANNEL to be issued to the
associated hardware.

vfio-ccw crw region
-------------------

The vfio-ccw crw region is used to return Channel Report Word (CRW)
data to userspace::

  struct ccw_crw_region {
         __u32 crw;
         __u32 pad;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_CRW.

Reading this region returns a CRW if one that is relevant for this
subchannel (e.g. one reporting changes in channel path state) is
pending, or all zeroes if not. If multiple CRWs are pending (including
possibly chained CRWs), reading this region again will return the next
one, until no more CRWs are pending and zeroes are returned. This is
similar to how STORE CHANNEL REPORT WORD works.

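The "read until zeroes are returned" protocol can be sketched as a
simple loop. To keep the example self-contained, the region is accessed
through a caller-supplied read function, and an in-memory queue stands
in for the real region. All names here (``crw_drain``,
``crw_queue_sketch``, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the crw region layout quoted above. */
struct ccw_crw_region_sketch {
        uint32_t crw;
        uint32_t pad;
} __attribute__((packed));

/* Hypothetical reader: fills *r from the region, returns 0 on success. */
typedef int (*crw_read_fn)(struct ccw_crw_region_sketch *r, void *ctx);

/*
 * Read the region until an all-zero CRW is returned. Stores up to max
 * CRWs in out and returns how many were read, or -1 on a read error.
 */
static int crw_drain(crw_read_fn read_region, void *ctx,
                     uint32_t *out, size_t max)
{
        size_t n = 0;

        assert(read_region != NULL && out != NULL);
        while (n < max) {
                struct ccw_crw_region_sketch r = { 0, 0 };

                if (read_region(&r, ctx) != 0)
                        return -1;
                if (r.crw == 0)
                        break;
                out[n++] = r.crw;
        }
        return (int)n;
}

/* In-memory stand-in for the region, for demonstration only. */
struct crw_queue_sketch {
        const uint32_t *pending;
        size_t nr;
        size_t pos;
};

static int crw_queue_read(struct ccw_crw_region_sketch *r, void *ctx)
{
        struct crw_queue_sketch *q = ctx;

        r->crw = q->pos < q->nr ? q->pending[q->pos++] : 0;
        r->pad = 0;
        return 0;
}
```

With a real device, the read function would instead read the region
from the vfio device file descriptor.
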
vfio-ccw operation details
--------------------------

vfio-ccw follows what vfio-pci did on the s390 platform and uses
vfio-iommu-type1 as the vfio iommu backend.

* CCW translation APIs
  A group of APIs (starting with `cp_`) that do CCW translation. The
  CCWs passed in by a user space program are organized with their guest
  physical memory addresses. These APIs copy the CCWs into kernel
  space, and assemble a runnable kernel channel program by replacing the
  guest physical addresses with their corresponding host physical addresses.
  Note that we have to use IDALs even for direct-access CCWs, as the
  referenced memory can be located anywhere, including above 2G.

* vfio_ccw device driver
  This driver utilizes the CCW translation APIs and introduces
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls::

    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS

  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel, which does further CCW translation
  before issuing it to a real device.
  This also provides the SET_IRQS ioctl to set up an event notifier that
  notifies the user space program of the I/O completion asynchronously.

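To illustrate why a single data address may have to become an IDAL,
the following sketch computes how many indirect data address words
(IDAWs) a buffer needs and fills them in. It assumes 4K IDAL blocks and
a buffer that is contiguous in the target address space; in the real
translation, each block would instead get the host address of the
corresponding pinned page. The names ending in ``_sketch`` are
hypothetical and are not the kernel's ``cp_`` APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed IDAL block size for this sketch: 4K. */
#define IDA_BLOCK_SIZE_SKETCH 4096u

/* Number of IDAWs needed to describe len bytes starting at addr. */
static size_t idal_nr_words_sketch(uint64_t addr, size_t len)
{
        uint64_t first = addr & (IDA_BLOCK_SIZE_SKETCH - 1);

        assert(len > 0);
        return (size_t)((first + len + IDA_BLOCK_SIZE_SKETCH - 1) /
                        IDA_BLOCK_SIZE_SKETCH);
}

/*
 * Fill idaws[] for a buffer: the first IDAW may point into the middle
 * of a block, every following IDAW points to a block boundary.
 * Returns the number of IDAWs written, or 0 if idaws[] is too small.
 */
static size_t idal_build_sketch(uint64_t addr, size_t len,
                                uint64_t *idaws, size_t max)
{
        size_t nr = idal_nr_words_sketch(addr, len);
        uint64_t base = addr & ~(uint64_t)(IDA_BLOCK_SIZE_SKETCH - 1);

        assert(idaws != NULL);
        if (nr > max)
                return 0;
        idaws[0] = addr;
        for (size_t i = 1; i < nr; i++)
                idaws[i] = base + i * IDA_BLOCK_SIZE_SKETCH;
        return nr;
}
```

For example, a 32-byte buffer that starts 16 bytes before a 4K boundary
needs two IDAWs, although it is much smaller than one block.
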
The use of vfio-ccw is not limited to QEMU, but QEMU is a good example
for understanding how these patches work. Here is a little more detail
on how an I/O request triggered by the QEMU guest is handled (without
error handling).

Explanation:

- Q1-Q7: QEMU side process.
- K1-K6: Kernel side process.

Q1.
    Get I/O region info during initialization.

Q2.
    Set up an event notifier and handler to handle I/O completion.

... ...

Q3.
    Intercept an ssch instruction.
Q4.
    Write the guest channel program and ORB to the I/O region.

    K1.
        Copy from guest to kernel.
    K2.
        Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3.
        With the necessary information contained in the ORB passed in
        by QEMU, issue the ccwchain to the device.
    K4.
        Return the ssch CC code.
Q5.
    Return the CC code to the guest.

... ...

    K5.
        The interrupt handler gets the I/O result and writes the result
        to the I/O region.
    K6.
        Signal QEMU to retrieve the result.

Q6.
    Get the signal; the event handler reads the result from the I/O
    region.
Q7.
    Update the IRB for the guest.

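The asynchronous notification in steps K6 and Q6 is based on an
eventfd. The sketch below only demonstrates the eventfd counter
semantics on Linux, with both sides running in one process; in
vfio-ccw the signaling side is the kernel driver, and the eventfd is
handed over via VFIO_DEVICE_SET_IRQS. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Producer side (stands in for K6): add one event to the counter. */
static int completion_signal(int efd)
{
        uint64_t one = 1;

        assert(efd >= 0);
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
                return -1;
        return 0;
}

/*
 * Consumer side (stands in for Q6): block until signaled, then return
 * the number of events since the last read, or 0 on error.
 */
static uint64_t completion_wait(int efd)
{
        uint64_t count = 0;

        assert(efd >= 0);
        if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
                return 0;
        return count;
}
```

A program would create the descriptor with ``eventfd(0, 0)``, register
it, and call ``completion_wait()`` (or poll the descriptor) before
reading the result from the I/O region.
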
Limitations
-----------

The current vfio-ccw implementation focuses on supporting only the basic
commands needed to implement block device functionality (read/write) of
DASD/ECKD devices. Some commands may need special handling in the
future, for example, anything related to path grouping.

DASD is a kind of storage device, while ECKD is a data recording format.
More information on DASD and ECKD can be found here:
https://en.wikipedia.org/wiki/Direct-access_storage_device
https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in QEMU, we can now bring the
passed-through DASD/ECKD device online in a guest and use it as a block
device.

The current code allows the guest to start channel programs via
START SUBCHANNEL, and to issue HALT SUBCHANNEL, CLEAR SUBCHANNEL,
and STORE SUBCHANNEL.

Currently all channel programs are prefetched, regardless of the
p-bit setting in the ORB. As a result, self-modifying channel
programs are not supported. For this reason, IPL has to be handled as
a special case by a userspace/guest program; this has been implemented
in QEMU's s390-ccw bios as of QEMU 4.1.

vfio-ccw supports classic (command mode) channel I/O only. Transport
mode (HPF) is not supported.

QDIO subchannels are currently not supported. Classic devices other than
DASD/ECKD might work, but have not been tested.

Reference
---------
1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/arch/s390/cds.rst
5. Documentation/driver-api/vfio.rst
6. Documentation/driver-api/vfio-mediated-device.rst
