Linux/Documentation/arch/s390/vfio-ccw.rst

Differences between /Documentation/arch/s390/vfio-ccw.rst (Version linux-6.12-rc7) and /Documentation/arch/m68k/vfio-ccw.rst (Version linux-6.5.13)


  1 ==================================                
  2 vfio-ccw: the basic infrastructure                
  3 ==================================                
  4                                                   
  5 Introduction                                      
  6 ------------                                      
  7                                                   
  8 Here we describe the vfio support for I/O subc    
  9 Linux/s390. Motivation for vfio-ccw is to pass    
 10 virtual machine, while vfio is the means.         
 11                                                   
 12 Different than other hardware architectures, s    
 13 I/O access method, which is the so-called Channel
 14 patterns:                                         
 15                                                   
 16 - Channel programs run asynchronously on a sep    
 17 - The channel subsystem will access any memory    
 18   in the channel program directly, i.e. there     
 19                                                   
 20 Thus when we introduce vfio support for these     
 21 with a mediated device (mdev) implementation.     
 22 added to an iommu group, so as to make itself     
 23 vfio framework. And we add read/write callback    
 24 regions to pass the channel programs from the     
 25 (the real I/O subchannel device) to do further    
 26 to perform I/O instructions.                      
 27                                                   
 28 This document does not intend to explain the s    
 29 every detail. More information/reference could    
 30                                                   
 31 - A good start to know Channel I/O in general:    
 32   https://en.wikipedia.org/wiki/Channel_I/O       
 33 - s390 architecture:                              
 34   s390 Principles of Operation manual (IBM For    
 35 - The existing QEMU code which implements a si    
 36   subsystem could also be a good reference. It    
 37   the flow.                                       
 38   qemu/hw/s390x/css.c                             
 39                                                   
 40 For vfio mediated device framework:               
 41 - Documentation/driver-api/vfio-mediated-devic    
 42                                                   
 43 Motivation of vfio-ccw                            
 44 ----------------------                            
 45                                                   
 46 Typically, a guest virtualized via QEMU/KVM on    
 47 paravirtualized virtio devices via the "Virtio    
 48 (virtio-ccw)" transport. This makes virtio dev    
 49 standard operating system algorithms for handl    
 50                                                   
 51 However, this is not enough. On s390 for the ma
 52 use the standard Channel I/O based mechanism,     
 53 the functionality of passing through them to a    
 54 This includes devices that don't have a virtio    
 55 drives) or that have specific characteristics     
 56 exploit.                                          
 57                                                   
 58 For passing a device to a guest, we want to us    
 59 everybody else, namely vfio. We implement this    
 60 devices via the vfio mediated device framework    
 61 driver "vfio_ccw".                                
 62                                                   
 63 Access patterns of CCW devices                    
 64 ------------------------------                    
 65                                                   
 66 The s390 architecture has implemented a so-called
 67 provides a unified view of the devices physica    
 68 systems. Though the s390 hardware platform kno    
 69 different peripheral attachments like disk dev    
 70 communication controllers, etc. They can all b    
 71 defined access method and they are presenting     
 72 way: I/O interruptions.                           
 73                                                   
 74 All I/O requires the use of channel command wo    
 75 instruction to a specialized I/O channel proce    
 76 a sequence of CCWs which are executed by the I    
 77 issue a channel program to the channel subsyst    
 78 build an operation request block (ORB), which     
 79 the format of the CCW and other control inform    
 80 operating system signals the I/O channel subsy    
 81 the channel program with a SSCH (start sub-cha    
 82 central processor is then free to proceed with    
 83 until interrupted. The I/O completion result i    
 84 interrupt handler in the form of interrupt res    
 85                                                   
 86 Back to vfio-ccw, in short:                       
 87                                                   
 88 - ORBs and channel programs are built in guest    
 89   physical addresses).                            
 90 - ORBs and channel programs are passed to the     
 91 - Host kernel translates the guest physical ad    
 92   and starts the I/O by issuing a privileged
 93   (e.g. SSCH).
 94 - Channel programs run asynchronously on a sep
 95 - I/O completion will be signaled to the host     
 96   And it will be copied as IRB to user space t    
 97   guest.                                          
 98                                                   
 99 Physical vfio ccw device and its child mdev       
100 -------------------------------------------       
101                                                   
102 As mentioned above, we realize vfio-ccw with a    
103                                                   
104 Channel I/O does not have IOMMU hardware suppo    
105 vfio-ccw device does not have an IOMMU level t    
106                                                   
107 Subchannel I/O instructions are all privileged    
108 handling the I/O instruction interception, vfi    
109 policing and translation how the channel progr    
110 it gets sent to hardware.                         
111                                                   
112 Within this implementation, we have two driver    
113 devices:                                          
114                                                   
115 - The vfio_ccw driver for the physical subchan    
116   This is an I/O subchannel driver for the rea    
117   realizes a group of callbacks and registers     
118   parent (physical) device. As a consequence,     
119   generic interface (sysfs) to create mdev dev    
120   created by vfio_ccw then and added to the me    
121   device that added to an IOMMU group and a vf    
122   vfio_ccw also provides an I/O region to acce    
123   request from user space and store I/O interr    
124   space to retrieve. To notify user space an I    
125   an interface to set up an eventfd for asyn
126                                                   
127 - The vfio_mdev driver for the mediated vfio c    
128   This is provided by the mdev framework. It i    
129   the mdev created by vfio_ccw.
130   It realizes a group of vfio device driver ca    
131   vfio group, and registers itself to the mdev    
132   driver.                                         
133   It uses a vfio iommu backend that uses the e    
134   ioctls, but rather than programming them int    
135   it simply stores the translations for use by    
136   means that a device programmed in a VM with     
137   can have the vfio kernel convert that addres    
138   address, pin the page and program the hardwa    
139   address in one step.                            
140   For a mdev, the vfio iommu backend will not     
141   VFIO_IOMMU_MAP_DMA ioctl. Mdev framework wil    
142   of the iova<->vaddr mappings in this operati    
143   vfio_pin_pages and a vfio_unpin_pages interf    
144   backend for the physical devices to pin and     
145                                                   
146 Below is a high-level block diagram::
147                                                   
148  +-------------+                                  
149  |             |                                  
150  | +---------+ | mdev_register_driver() +-----    
151  | |  Mdev   | +<-----------------------+         
152  | |  bus    | |                        | vfio    
153  | | driver  | +----------------------->+         
154  | +---------+ |    probe()/remove()    +-----    
155  |             |                                  
156  |  MDEV CORE  |                                  
157  |   MODULE    |                                  
158  |   mdev.ko   |                                  
159  | +---------+ | mdev_register_parent() +-----    
160  | |Physical | +<-----------------------+         
161  | | device  | |                        |  vfi    
162  | |interface| +----------------------->+         
163  | +---------+ |       callback         +-----    
164  +-------------+                                  
165                                                   
166 The process of how these work together.           
167                                                   
168 1. vfio_ccw.ko drives the physical I/O subchan    
169    physical device (with callbacks) to mdev fr    
170    When vfio_ccw probes the subchannel device
171    pointer and callbacks to the mdev framework    
172    under the device node in sysfs would be cre    
173    device, namely 'mdev_create', 'mdev_destroy    
174    'mdev_supported_types'.                        
175 2. Create a mediated vfio ccw device.             
176    Using the 'mdev_create' sysfs file, we need t
177    only one for our case) mediated device.        
178 3. vfio_mdev.ko drives the mediated ccw device    
179    vfio_mdev is also the vfio device driver. I    
180    add it to an iommu_group and a vfio_group.     
181    the mdev to a guest.                           
182                                                   
183                                                   
184 VFIO-CCW Regions                                  
185 ----------------                                  
186                                                   
187 The vfio-ccw driver exposes MMIO regions to ac    
188 results to userspace.                             
189                                                   
190 vfio-ccw I/O region                               
191 -------------------                               
192                                                   
193 An I/O region is used to accept channel progra    
194 space and store I/O interrupt result for user     
195 definition of the region is::                     
196                                                   
  struct ccw_io_region {
  #define ORB_AREA_SIZE 12
          __u8    orb_area[ORB_AREA_SIZE];
  #define SCSW_AREA_SIZE 12
          __u8    scsw_area[SCSW_AREA_SIZE];
  #define IRB_AREA_SIZE 96
          __u8    irb_area[IRB_AREA_SIZE];
          __u32   ret_code;
  } __packed;
206                                                   
207 This region is always available.                  
208                                                   
209 While starting an I/O request, orb_area should    
210 guest ORB, and scsw_area should be filled with    
211 Subchannel.                                       
212                                                   
213 irb_area stores the I/O result.                   
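
As a minimal sketch of how a user space program might work with this layout, the fragment below mirrors the structure and fills in the two input areas. The fixed-width types from ``<stdint.h>`` stand in for the kernel's ``__u8``/``__u32``, and the helper name ``ccw_io_region_prepare`` is hypothetical and not part of vfio-ccw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE 12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE 96

/* User space mirror of the region layout shown above. */
struct ccw_io_region {
    uint8_t  orb_area[ORB_AREA_SIZE];
    uint8_t  scsw_area[SCSW_AREA_SIZE];
    uint8_t  irb_area[IRB_AREA_SIZE];
    uint32_t ret_code;
} __attribute__((packed));

/* Packed layout, no padding: 12 + 12 + 96 + 4 bytes. */
_Static_assert(sizeof(struct ccw_io_region) == 124, "unexpected region size");

/*
 * Hypothetical helper: build a request from a guest ORB and SCSW.
 * irb_area and ret_code are cleared here; they carry the results.
 */
static void ccw_io_region_prepare(struct ccw_io_region *region,
                                  const uint8_t orb[ORB_AREA_SIZE],
                                  const uint8_t scsw[SCSW_AREA_SIZE])
{
    memset(region, 0, sizeof(*region));
    memcpy(region->orb_area, orb, ORB_AREA_SIZE);
    memcpy(region->scsw_area, scsw, SCSW_AREA_SIZE);
}
```

The prepared buffer would then be written to the I/O region of the vfio device; that step needs a real device and is left out here.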
214                                                   
215 ret_code stores a return code for each access     
216 values may occur:                                 
217                                                   
218 ``0``                                             
219   The operation was successful.                   
220                                                   
221 ``-EOPNOTSUPP``                                   
222   The ORB specified transport mode or the         
223   SCSW specified a function other than the sta    
224                                                   
225 ``-EIO``                                          
226   A request was issued while the device was no    
227   requests, or an internal error occurred.        
228                                                   
229 ``-EBUSY``                                        
230   The subchannel was status pending or busy, o    
231                                                   
232 ``-EAGAIN``                                       
233   A request was being processed, and the calle    
234                                                   
235 ``-EACCES``                                       
236   The channel path(s) used for the I/O were fo    
237                                                   
238 ``-ENODEV``                                       
239   The device was found to be not operational.     
240                                                   
241 ``-EINVAL``                                       
242   The ORB specified a chain longer than 255 cc
243   occurred.                                       
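
A caller could act on these values roughly as sketched below. The enum and the function name are hypothetical. Treating ``-EAGAIN`` as the only retryable code is this sketch's reading of the list above, and lumping all other errors together is a simplification; a real program would likely handle them individually.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical outcome classes for a user space caller. */
enum io_action {
    IO_DONE,    /* request accepted */
    IO_RETRY,   /* another request was in flight; try again */
    IO_FAIL     /* give up and report the error */
};

/*
 * ret_code is declared as __u32 in the region, but it carries a
 * negative errno value, so reinterpret it as a signed integer.
 */
static enum io_action classify_ret_code(uint32_t raw)
{
    int32_t rc = (int32_t)raw;

    if (rc == 0)
        return IO_DONE;
    if (rc == -EAGAIN)
        return IO_RETRY;
    return IO_FAIL;
}
```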
244                                                   
245                                                   
246 vfio-ccw cmd region                               
247 -------------------                               
248                                                   
249 The vfio-ccw cmd region is used to accept asyn    
250 from userspace::                                  
251                                                   
  #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
  #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
  struct ccw_cmd_region {
         __u32 command;
         __u32 ret_code;
  } __packed;
258                                                   
259 This region is exposed via region type VFIO_RE    
260                                                   
261 Currently, CLEAR SUBCHANNEL and HALT SUBCHANNE    
262                                                   
263 command specifies the command to be issued; re    
264 for each access of the region. The following v    
265                                                   
266 ``0``                                             
267   The operation was successful.                   
268                                                   
269 ``-ENODEV``                                       
270   The device was found to be not operational.     
271                                                   
272 ``-EINVAL``                                       
273   A command other than halt or clear was speci    
274                                                   
275 ``-EIO``                                          
276   A request was issued while the device was no    
277   requests.                                       
278                                                   
279 ``-EAGAIN``                                       
280   A request was being processed, and the calle    
281                                                   
282 ``-EBUSY``                                        
283   The subchannel was status pending or busy wh    
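
The command handling can be sketched from the user space side as follows. The helper ``ccw_cmd_region_prepare`` is hypothetical. It rejects anything other than exactly one of the two commands before the region is written, which mirrors the ``-EINVAL`` case above; that setting both bits at once is invalid is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)

/* User space mirror of the region layout shown above. */
struct ccw_cmd_region {
    uint32_t command;
    uint32_t ret_code;
} __attribute__((packed));

/*
 * Hypothetical helper: accept only halt or clear, one at a time.
 * Returns 0 on success or -EINVAL for any other command value.
 */
static int ccw_cmd_region_prepare(struct ccw_cmd_region *region,
                                  uint32_t command)
{
    if (command != VFIO_CCW_ASYNC_CMD_HSCH &&
        command != VFIO_CCW_ASYNC_CMD_CSCH)
        return -EINVAL;

    region->command = command;
    region->ret_code = 0;
    return 0;
}
```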
284                                                   
285 vfio-ccw schib region                             
286 ---------------------                             
287                                                   
288 The vfio-ccw schib region is used to return Su    
289 Block (SCHIB) data to userspace::                 
290                                                   
  struct ccw_schib_region {
  #define SCHIB_AREA_SIZE 52
         __u8 schib_area[SCHIB_AREA_SIZE];
  } __packed;
295                                                   
296 This region is exposed via region type VFIO_RE    
297                                                   
298 Reading this region triggers a STORE SUBCHANNE    
299 associated hardware.                              
300                                                   
301 vfio-ccw crw region                               
302 -------------------
303                                                   
304 The vfio-ccw crw region is used to return Chan    
305 data to userspace::                               
306                                                   
  struct ccw_crw_region {
         __u32 crw;
         __u32 pad;
  } __packed;
311                                                   
312 This region is exposed via region type VFIO_RE    
313                                                   
314 Reading this region returns a CRW if one that     
315 subchannel (e.g. one reporting changes in chan    
316 pending, or all zeroes if not. If multiple CRW    
317 possibly chained CRWs), reading this region ag    
318 one, until no more CRWs are pending and zeroes    
319 similar to how STORE CHANNEL REPORT WORD works    
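
The read-until-zero behaviour described above can be sketched as a small drain loop. The callback type ``crw_read_fn``, the ``drain_crws`` function and the in-memory ``fake_crw_source`` are all hypothetical. In a real program the callback would read the crw region of the vfio device, while the fake source here only makes the loop testable without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User space mirror of the region layout shown above. */
struct ccw_crw_region {
    uint32_t crw;
    uint32_t pad;
} __attribute__((packed));

/* Hypothetical callback: one read of the region, 0 on success. */
typedef int (*crw_read_fn)(void *ctx, struct ccw_crw_region *out);

/*
 * Read CRWs until an all-zero one is returned or 'max' are stored.
 * Returns the number of CRWs collected, or a negative read error.
 */
static int drain_crws(crw_read_fn read_region, void *ctx,
                      uint32_t *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        struct ccw_crw_region region;
        int rc = read_region(ctx, &region);

        if (rc < 0)
            return rc;
        if (region.crw == 0)
            break;              /* nothing pending any more */
        out[n++] = region.crw;
    }
    return (int)n;
}

/* Stand-in for the device: a fixed queue of pending CRWs. */
struct fake_crw_source {
    const uint32_t *pending;
    size_t count;
    size_t next;
};

static int fake_crw_read(void *ctx, struct ccw_crw_region *out)
{
    struct fake_crw_source *src = ctx;

    out->crw = src->next < src->count ? src->pending[src->next++] : 0;
    out->pad = 0;
    return 0;
}
```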
320                                                   
321 vfio-ccw operation details                        
322 --------------------------                        
323                                                   
324 vfio-ccw follows what vfio-pci did on the s390    
325 vfio-iommu-type1 as the vfio iommu backend.       
326                                                   
327 * CCW translation APIs                            
328   A group of APIs (start with `cp_`) to do CCW    
329   passed in by a user space program are organi    
330   physical memory addresses. These APIs will c    
331   space, and assemble a runnable kernel channe    
332   guest physical addresses with their correspo    
333   Note that we have to use IDALs even for dire    
334   referenced memory can be located anywhere, i    
335                                                   
336 * vfio_ccw device driver                          
337   This driver utilizes the CCW translation API    
338   vfio_ccw, which is the driver for the I/O su    
339   to pass through.                                
340   vfio_ccw implements the following vfio ioctl    
341                                                   
342     VFIO_DEVICE_GET_INFO                          
343     VFIO_DEVICE_GET_IRQ_INFO                      
344     VFIO_DEVICE_GET_REGION_INFO                   
345     VFIO_DEVICE_RESET                             
346     VFIO_DEVICE_SET_IRQS                          
347                                                   
348   This provides an I/O region, so that the use    
349   channel program to the kernel, to do further    
350   issuing them to a real device.                  
351   This also provides the SET_IRQ ioctl to setu    
352   notify the user space program the I/O comple    
353   way.                                            
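
One early step of CCW translation, walking a guest chain to find its length, can be illustrated as below. This is a simplified sketch and not the kernel's ``cp_`` code. The format-1 CCW layout and the chaining flag values are taken from the s390 architecture as assumptions, the 255 limit comes from the ``-EINVAL`` description of the I/O region, and special cases such as transfer-in-channel are ignored. The function name ``ccw_chain_length`` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CCW_FLAG_DC 0x80        /* data chaining */
#define CCW_FLAG_CC 0x40        /* command chaining */
#define CCWCHAIN_LEN_MAX 255

/* Format-1 CCW as assumed for this sketch. */
struct ccw1 {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;               /* data address (guest physical) */
} __attribute__((packed, aligned(8)));

/*
 * Count the CCWs in a chain: a chain continues while a CCW has a
 * chaining flag set. Returns the length, or -EINVAL if the chain
 * exceeds the limit or runs past the 'available' entries.
 */
static int ccw_chain_length(const struct ccw1 *ccws, size_t available)
{
    size_t n = 0;

    while (n < available) {
        uint8_t flags = ccws[n].flags;

        n++;
        if (n > CCWCHAIN_LEN_MAX)
            return -EINVAL;
        if (!(flags & (CCW_FLAG_CC | CCW_FLAG_DC)))
            return (int)n;      /* last CCW of the chain */
    }
    return -EINVAL;
}
```

After this step the translation would still have to replace each ``cda`` with host addresses, which the text above says is done through IDALs; that part is not shown.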
354                                                   
355 The use of vfio-ccw is not limited to QEMU, wh    
356 good example to understand how these patch
357 bit more detail how an I/O request triggered b    
358 handled (without error handling).                 
359                                                   
360 Explanation:                                      
361                                                   
362 - Q1-Q7: QEMU side process.                       
363 - K1-K5: Kernel side process.                     
364                                                   
365 Q1.                                               
366     Get I/O region info during initialization.    
367                                                   
368 Q2.                                               
369     Setup event notifier and handler to handle    
370                                                   
371 ... ...                                           
372                                                   
373 Q3.                                               
374     Intercept a ssch instruction.                 
375 Q4.                                               
376     Write the guest channel program and ORB to    
377                                                   
378     K1.                                           
379         Copy from guest to kernel.                
380     K2.                                           
381         Translate the guest channel program to    
382         channel program, which becomes runnabl    
383     K3.                                           
384         With the necessary information contain    
385         by QEMU, issue the ccwchain to the dev    
386     K4.                                           
387         Return the ssch CC code.                  
388 Q5.                                               
389     Return the CC code to the guest.              
390                                                   
391 ... ...                                           
392                                                   
393     K5.                                           
394         Interrupt handler gets the I/O result     
395         the I/O region.                           
396     K6.                                           
397         Signal QEMU to retrieve the result.       
398                                                   
399 Q6.                                               
400     Get the signal and event handler reads out    
401     region.                                       
402 Q7.                                               
403     Update the irb for the guest.                 
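
The notification part of this flow (Q2, K6 and Q6) relies on an eventfd. The sketch below shows only the eventfd mechanics using the glibc wrappers from ``<sys/eventfd.h>``. The function names are hypothetical, and ``notifier_signal`` is a user space stand-in for the kernel signalling in K6. Registering the descriptor with the device through VFIO_DEVICE_SET_IRQS is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Q2: create the event notifier; returns a descriptor or -1. */
static int notifier_create(void)
{
    return eventfd(0, EFD_CLOEXEC);
}

/* Stand-in for K6: add one event to the counter. */
static int notifier_signal(int fd)
{
    return eventfd_write(fd, 1);
}

/*
 * Q6: block until at least one event is pending, then return the
 * number of events accumulated since the last read.
 */
static int notifier_wait(int fd, uint64_t *events)
{
    eventfd_t value;

    if (eventfd_read(fd, &value) < 0)
        return -1;
    *events = value;
    return 0;
}
```

Because an eventfd accumulates a counter, several completions signalled before the handler runs are reported by a single read, so the handler should not assume one wakeup per interrupt.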
404                                                   
405 Limitations                                       
406 -----------                                       
407                                                   
408 The current vfio-ccw implementation focuses on    
409 needed to implement block device functionality    
410 device only. Some commands may need special ha    
411 example, anything related to path grouping.       
412                                                   
413 DASD is a kind of storage device. While ECKD i    
414 More information for DASD and ECKD could be fo    
415 https://en.wikipedia.org/wiki/Direct-access_st    
416 https://en.wikipedia.org/wiki/Count_key_data      
417                                                   
418 Together with the corresponding work in QEMU,     
419 through DASD/ECKD device online in a guest now    
420 device.                                           
421                                                   
422 The current code allows the guest to start cha    
423 START SUBCHANNEL, and to issue HALT SUBCHANNEL    
424 and STORE SUBCHANNEL.                             
425                                                   
426 Currently all channel programs are prefetched,    
427 p-bit setting in the ORB.  As a result, self m    
428 programs are not supported.  For this reason,     
429 a special case by a userspace/guest program; t    
430 in QEMU's s390-ccw bios as of QEMU 4.1.           
431                                                   
432 vfio-ccw supports classic (command mode) chann    
433 mode (HPF) is not supported.                      
434                                                   
435 QDIO subchannels are currently not supported.     
436 DASD/ECKD might work, but have not been tested    
437                                                   
438 Reference                                         
439 ---------                                         
440 1. ESA/s390 Principles of Operation manual (IB    
441 2. ESA/390 Common I/O Device Commands manual (    
442 3. https://en.wikipedia.org/wiki/Channel_I/O      
443 4. Documentation/arch/s390/cds.rst                
444 5. Documentation/driver-api/vfio.rst              
445 6. Documentation/driver-api/vfio-mediated-devi    
                                                      
