================================
Coherent Accelerator (CXL) Flash
================================

Introduction
============

The IBM Power architecture provides support for CAPI (Coherent
Accelerator Power Interface), which is available to certain PCIe slots
on Power 8 systems. CAPI can be thought of as a special tunneling
protocol through PCIe that allows PCIe adapters to look like special
purpose co-processors which can read or write an application's
memory and generate page faults. As a result, the host interface to
an adapter running in CAPI mode does not require the data buffers to
be mapped to the device's memory (IOMMU bypass) nor does it require
memory to be pinned.

On Linux, Coherent Accelerator (CXL) kernel services present CAPI
devices as a PCI device by implementing a virtual PCI host bridge.
This abstraction simplifies the infrastructure and programming
model, allowing for drivers to look similar to other native PCI
device drivers.

CXL provides a mechanism by which user space applications can
directly talk to a device (network or storage) bypassing the typical
kernel/device driver stack. The CXL Flash Adapter Driver gives a
user space application direct access to Flash storage.

The CXL Flash Adapter Driver is a kernel module that sits in the
SCSI stack as a low level device driver (below the SCSI disk and
protocol drivers) for the IBM CXL Flash Adapter. This driver is
responsible for the initialization of the adapter, setting up the
special path for user space access, and performing error recovery. It
communicates directly with the Flash Accelerator Functional Unit (AFU)
as described in Documentation/arch/powerpc/cxl.rst.

The cxlflash driver supports two, mutually exclusive, modes of
operation at the device (LUN) level:

- Any flash device (LUN) can be configured to be accessed as a
  regular disk device (e.g., /dev/sdc). This is the default mode.

- Any flash device (LUN) can be configured to be accessed from
  user space with a special block library. This mode further
  specifies the means of accessing the device and provides for
  either raw access to the entire LUN (referred to as direct
  or physical LUN access) or access to a kernel/AFU-mediated
  partition of the LUN (referred to as virtual LUN access). The
  segmentation of a disk device into virtual LUNs is assisted
  by special translation services provided by the Flash AFU.

Overview
========

The Coherent Accelerator Interface Architecture (CAIA) introduces a
concept of a master context. A master typically has special privileges
granted to it by the kernel or hypervisor allowing it to perform AFU
wide management and control. The master may or may not be involved
directly in each user I/O, but at the minimum is involved in the
initial setup before the user application is allowed to send requests
directly to the AFU.

The CXL Flash Adapter Driver establishes a master context with the
AFU. It uses memory mapped I/O (MMIO) for this control and setup. The
Adapter Problem Space Memory Map looks like this::

    +-------------------------------+
    |    512 * 64 KB User MMIO      |
    |         (per context)         |
    |        User Accessible        |
    +-------------------------------+
    |    512 * 128 B per context    |
    |    Provisioning and Control   |
    |   Trusted Process accessible  |
    +-------------------------------+
    |         64 KB Global          |
    |   Trusted Process accessible  |
    +-------------------------------+

This driver configures itself into the SCSI software stack as an
adapter driver. The driver is the only entity that is considered a
Trusted Process to program the Provisioning and Control and Global
areas in the MMIO Space shown above. The master context driver
discovers all LUNs attached to the CXL Flash adapter and instantiates
scsi block devices (/dev/sdb, /dev/sdc etc.)
for each unique LUN
seen from each path.

Once these scsi block devices are instantiated, an application
written to a specification provided by the block library may get
access to the Flash from user space (without requiring a system call).

This master context driver also provides a series of ioctls for this
block library to enable this user space access. The driver supports
two modes for accessing the block device.

The first mode is called a virtual mode. In this mode a single scsi
block device (/dev/sdb) may be carved up into any number of distinct
virtual LUNs. The virtual LUNs may be resized as long as the sum of
the sizes of all the virtual LUNs, along with their associated
metadata, does not exceed the physical capacity.

The second mode is called the physical mode. In this mode a single
block device (/dev/sdb) may be opened directly by the block library
and the entire space for the LUN is available to the application.

Only the physical mode provides persistence of the data, i.e. the
data written to the block device survives application exit and
restart as well as reboot. The virtual LUNs do not persist (i.e. they
do not survive after the application terminates or the system reboots).


Block library API
=================

Applications intending to get access to the CXL Flash from user
space should use the block library, as it abstracts the details of
interfacing directly with the cxlflash driver that are necessary for
performing administrative actions (e.g., setup, tear down, resize).
The block library can be thought of as a 'user' of services,
implemented as IOCTLs, that are provided by the cxlflash driver
specifically for devices (LUNs) operating in user space access
mode.
While it is not a requirement that applications understand
the interface between the block library and the cxlflash driver,
a high-level overview of each supported service (IOCTL) is provided
below.

The block library can be found on GitHub:
http://github.com/open-power/capiflash


CXL Flash Driver LUN IOCTLs
===========================

Users, such as the block library, that wish to interface with a flash
device (LUN) via user space access need to use the services provided
by the cxlflash driver. As these services are implemented as ioctls,
a file descriptor handle must first be obtained in order to establish
the communication channel between a user and the kernel. This file
descriptor is obtained by opening the device special file associated
with the scsi disk device (/dev/sdb) that was created during LUN
discovery. As per the location of the cxlflash driver within the
SCSI protocol stack, this open is actually not seen by the cxlflash
driver. Upon successful open, the user receives a file descriptor
(herein referred to as fd1) that should be used for issuing the
subsequent ioctls listed below.

The structure definitions for these IOCTLs are available in:
uapi/scsi/cxlflash_ioctl.h

DK_CXLFLASH_ATTACH
------------------

This ioctl obtains, initializes, and starts a context using the CXL
kernel services. These services specify a context id (u16) by which
to uniquely identify the context and its allocated resources. The
services additionally provide a second file descriptor (herein
referred to as fd2) that is used by the block library to initiate
memory mapped I/O (via mmap()) to the CXL flash device and poll for
completion events.
This file descriptor is intentionally installed by
this driver and not the CXL kernel services to allow for intermediary
notification and access in the event of a non-user-initiated close(),
such as a killed process. This design point is described in further
detail in the description for the DK_CXLFLASH_DETACH ioctl.

There are a few important aspects regarding the "tokens" (context id
and fd2) that are provided back to the user:

- These tokens are only valid for the process under which they
  were created. The child of a forked process cannot continue
  to use the context id or file descriptor created by its parent
  (see DK_CXLFLASH_VLUN_CLONE for further details).

- These tokens are only valid for the lifetime of the context and
  the process under which they were created. Once either is
  destroyed, the tokens are to be considered stale and subsequent
  usage will result in errors.

- A valid adapter file descriptor (fd2 >= 0) is only returned on
  the initial attach for a context. Subsequent attaches to an
  existing context (DK_CXLFLASH_ATTACH_REUSE_CONTEXT flag present)
  do not provide the adapter file descriptor as it was previously
  made known to the application.

- When a context is no longer needed, the user shall detach from
  the context via the DK_CXLFLASH_DETACH ioctl. When this ioctl
  returns with a valid adapter file descriptor and the return flag
  DK_CXLFLASH_APP_CLOSE_ADAP_FD is present, the application _must_
  close the adapter file descriptor following a successful detach.

- When this ioctl returns with a valid fd2 and the return flag
  DK_CXLFLASH_APP_CLOSE_ADAP_FD is present, the application _must_
  close fd2 in the following circumstances:

  + Following a successful detach of the last user of the context
  + Following a successful recovery on the context's original fd2
  + In the child process of a fork(), following a clone ioctl,
    on the fd2 associated with the source context

- At any time, a close on fd2 will invalidate the tokens. Applications
  should exercise caution to only close fd2 when appropriate (outlined
  in the previous bullet) to avoid premature loss of I/O.

DK_CXLFLASH_USER_DIRECT
-----------------------
This ioctl is responsible for transitioning the LUN to direct
(physical) mode access and configuring the AFU for direct access from
user space on a per-context basis. Additionally, the block size and
last logical block address (LBA) are returned to the user.

As mentioned previously, when operating in user space access mode,
LUNs may be accessed in whole or in part. Only one mode is allowed
at a time and if one mode is active (outstanding references exist),
requests to use the LUN in a different mode are denied.

The AFU is configured for direct access from user space by adding an
entry to the AFU's resource handle table. The index of the entry is
treated as a resource handle that is returned to the user. The user
is then able to use the handle to reference the LUN during I/O.

DK_CXLFLASH_USER_VIRTUAL
------------------------
This ioctl is responsible for transitioning the LUN to virtual mode
of access and configuring the AFU for virtual access from user space
on a per-context basis. Additionally, the block size and last logical
block address (LBA) are returned to the user.

As mentioned previously, when operating in user space access mode,
LUNs may be accessed in whole or in part. Only one mode is allowed
at a time and if one mode is active (outstanding references exist),
requests to use the LUN in a different mode are denied.

The AFU is configured for virtual access from user space by adding
an entry to the AFU's resource handle table. The index of the entry
is treated as a resource handle that is returned to the user. The
user is then able to use the handle to reference the LUN during I/O.

By default, the virtual LUN is created with a size of 0. The user
would need to use the DK_CXLFLASH_VLUN_RESIZE ioctl to grow
the virtual LUN to a desired size. To avoid having to perform this
resize for the initial creation of the virtual LUN, the user has the
option of specifying a size as part of the DK_CXLFLASH_USER_VIRTUAL
ioctl, such that when success is returned to the user, the
resource handle that is provided is already referencing provisioned
storage. This is reflected by the last LBA being a non-zero value.

When a LUN is accessible from more than one port, this ioctl will
return with the DK_CXLFLASH_ALL_PORTS_ACTIVE return flag set. This
provides the user with a hint that I/O can be retried in the event
of an I/O error as the LUN can be reached over multiple paths.

DK_CXLFLASH_VLUN_RESIZE
-----------------------
This ioctl is responsible for resizing a previously created virtual
LUN and will fail if invoked upon a LUN that is not in virtual
mode. Upon success, an updated last LBA is returned to the user
indicating the new size of the virtual LUN associated with the
resource handle.

The partitioning of virtual LUNs is jointly mediated by the cxlflash
driver and the AFU.
An allocation table is kept for each LUN that is
operating in the virtual mode and used to program a LUN translation
table that the AFU references when provided with a resource handle.

This ioctl can return -EAGAIN if an AFU sync operation takes too long.
In addition to returning a failure to the user, cxlflash will also
schedule an asynchronous AFU reset. Should the user choose to retry the
operation, it is expected to succeed. If this ioctl fails with -EAGAIN,
the user can either retry the operation or treat it as a failure.

DK_CXLFLASH_RELEASE
-------------------
This ioctl is responsible for releasing a previously obtained
reference to either a physical or virtual LUN. This can be
thought of as the inverse of the DK_CXLFLASH_USER_DIRECT or
DK_CXLFLASH_USER_VIRTUAL ioctls. Upon success, the resource handle
is no longer valid and the entry in the resource handle table is
made available to be used again.

As part of the release process for virtual LUNs, the virtual LUN
is first resized to 0 to clear out and free the translation tables
associated with the virtual LUN reference.

DK_CXLFLASH_DETACH
------------------
This ioctl is responsible for unregistering a context with the
cxlflash driver and releasing outstanding resources that were
not explicitly released via the DK_CXLFLASH_RELEASE ioctl. Upon
success, all "tokens" which had been provided to the user from the
DK_CXLFLASH_ATTACH onward are no longer valid.

When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
attach, the application _must_ close the fd2 associated with the context
following the detach of the final user of the context.

DK_CXLFLASH_VLUN_CLONE
----------------------
This ioctl is responsible for cloning a previously created
context to a more recently created context.
It exists solely to
support maintaining user space access to storage after a process
forks. Upon success, the child process (which invoked the ioctl)
will have access to the same LUNs via the same resource handle(s)
as the parent, but under a different context.

Context sharing across processes is not supported with CXL and
therefore each fork must be met with establishing a new context
for the child process. This ioctl simplifies the state management
and playback required by a user in such a scenario. When a process
forks, the child process can clone the parent's context by first
creating a context (via DK_CXLFLASH_ATTACH) and then using this ioctl
to perform the clone from the parent to the child.

The clone itself is fairly simple. The resource handle and LUN
translation tables are copied from the parent context to the child's
and then synced with the AFU.

When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
attach, the application _must_ close the fd2 associated with the source
context (still resident/accessible in the parent process) following the
clone. This is to avoid a stale entry in the file descriptor table of the
child process.

This ioctl can return -EAGAIN if an AFU sync operation takes too long.
In addition to returning a failure to the user, cxlflash will also
schedule an asynchronous AFU reset. Should the user choose to retry the
operation, it is expected to succeed. If this ioctl fails with -EAGAIN,
the user can either retry the operation or treat it as a failure.

DK_CXLFLASH_VERIFY
------------------
This ioctl is used to detect various changes such as the capacity of
the disk changing, the number of LUNs visible changing, etc. In cases
where the changes affect the application (such as a LUN resize), the
cxlflash driver will report the changed state to the application.

The user calls in when they want to validate that a LUN hasn't been
changed in response to a check condition. As the user is operating out
of band from the kernel, they will see these types of events without
the kernel's knowledge. When encountered, the user's architected
behavior is to call in to this ioctl, indicating what they want to
verify and passing along any appropriate information. For now, only
verifying a LUN change (i.e., a different size) with sense data is
supported.

DK_CXLFLASH_RECOVER_AFU
-----------------------
This ioctl is used to drive recovery (if such an action is warranted)
of a specified user context. Any state associated with the user context
is re-established upon successful recovery.

User contexts are put into an error condition when the device needs to
be reset or is terminating. Users are notified of this error condition
by seeing all 0xF's on an MMIO read. Upon encountering this, the
architected behavior for a user is to call into this ioctl to recover
their context. A user may also call into this ioctl at any time to
check if the device is operating normally. If a failure is returned
from this ioctl, the user is expected to gracefully clean up their
context via release/detach ioctls. Until they do, the context they
hold is not relinquished. The user may also optionally exit the process
at which time the context/resources they held will be freed as part of
the release fop.

When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
attach, the application _must_ unmap and close the fd2 associated with the
original context following this ioctl returning success and indicating that
the context was recovered (DK_CXLFLASH_RECOVER_AFU_CONTEXT_RESET).
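
The recovery flow described above can be sketched in user space roughly
as follows. This is a minimal illustration, not code from the block
library or the driver: the names ``ctx_action``, ``mmio_read_signals_error``
and ``handle_mmio_read`` are hypothetical, the MMIO register is assumed to
be read as a 64-bit value, and the ``recover`` callback merely stands in for
a wrapper around the DK_CXLFLASH_RECOVER_AFU ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes for this sketch; not part of the cxlflash API. */
enum ctx_action {
    CTX_OK,           /* MMIO read looked normal, keep issuing I/O */
    CTX_RECOVERED,    /* recover call succeeded, context state re-established */
    CTX_MUST_CLEANUP  /* recover call failed, caller should release and detach */
};

/* Assumption: a 64-bit MMIO read, so "all 0xF's" means every bit is set. */
static bool mmio_read_signals_error(uint64_t value)
{
    return value == UINT64_MAX;
}

/*
 * Decide what to do after an MMIO read. 'recover' is a stand-in for a
 * wrapper around the DK_CXLFLASH_RECOVER_AFU ioctl and is assumed to
 * return 0 on success and non-zero on failure.
 */
static enum ctx_action handle_mmio_read(uint64_t value,
                                        int (*recover)(void *), void *arg)
{
    assert(recover != NULL);

    if (!mmio_read_signals_error(value))
        return CTX_OK;

    return recover(arg) == 0 ? CTX_RECOVERED : CTX_MUST_CLEANUP;
}

/* Stub callbacks used only to exercise the sketch without real hardware. */
static int recover_ok(void *arg)   { (void)arg; return 0; }
static int recover_fail(void *arg) { (void)arg; return -1; }
```

A caller that receives ``CTX_MUST_CLEANUP`` would then go through the
release/detach ioctls, as the text above requires. What to do with fd2
after ``CTX_RECOVERED`` depends on the return flags discussed above and is
deliberately left out of the sketch.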

DK_CXLFLASH_MANAGE_LUN
----------------------
This ioctl is used to switch a LUN from a mode where it is available
for file-system access (legacy), to a mode where it is set aside for
exclusive user space access (superpipe). In case a LUN is visible
across multiple ports and adapters, this ioctl is used to uniquely
identify each LUN by its World Wide Node Name (WWNN).


CXL Flash Driver Host IOCTLs
============================

Each host adapter instance that is supported by the cxlflash driver
has a special character device associated with it to enable a set of
host management functions. These character devices are hosted in a
class dedicated for cxlflash and can be accessed via `/dev/cxlflash/*`.

Applications can be written to perform various functions using the
host ioctl APIs below.

The structure definitions for these IOCTLs are available in:
uapi/scsi/cxlflash_ioctl.h

HT_CXLFLASH_LUN_PROVISION
-------------------------
This ioctl is used to create and delete persistent LUNs on cxlflash
devices that lack an external LUN management interface. It is only
valid when used with AFUs that support the LUN provision capability.

When sufficient space is available, LUNs can be created by specifying
the target port to host the LUN and a desired size in 4K blocks. Upon
success, the LUN ID and WWID of the created LUN will be returned and
the SCSI bus can be scanned to detect the change in LUN topology. Note
that partial allocations are not supported. Should a creation fail due
to a space issue, the target port can be queried for its current LUN
geometry.

To remove a LUN, the device must first be disassociated from the Linux
SCSI subsystem. The LUN deletion can then be initiated by specifying a
target port and LUN ID.
Upon success, the LUN geometry associated with
the port will be updated to reflect the new number of provisioned LUNs
and available capacity.

To query the LUN geometry of a port, the target port is specified and
upon success, the following information is presented:

- Maximum number of provisioned LUNs allowed for the port
- Current number of provisioned LUNs for the port
- Maximum total capacity of provisioned LUNs for the port (4K blocks)
- Current total capacity of provisioned LUNs for the port (4K blocks)

With this information, the number of available LUNs and capacity can be
calculated.

HT_CXLFLASH_AFU_DEBUG
---------------------
This ioctl is used to debug AFUs by supporting a command pass-through
interface. It is only valid when used with AFUs that support the AFU
debug capability.

With the exception of buffer management, AFU debug commands are opaque
to cxlflash and treated as pass-through. For debug commands that do
require data transfer, the user supplies an adequately sized data buffer
and must specify the data transfer direction with respect to the host.
There is a maximum transfer size of 256K imposed. Note that partial read
completions are not supported - when errors are experienced with a host
read data transfer, the data buffer is not copied back to the user.
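
The buffer constraints on debug commands lend themselves to a small
pre-flight check on the caller's side. The following is only a sketch
under stated assumptions: "256K" is taken to mean 256 KiB (262144 bytes),
and ``debug_dir``, ``debug_request`` and ``debug_request_is_valid`` are
hypothetical names that do not mirror the structure layout in
uapi/scsi/cxlflash_ioctl.h. Requiring a zero length for commands without a
data phase is a choice made for this sketch, not documented driver behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: "256K" in the text means 256 KiB, i.e. 262144 bytes. */
#define DEBUG_MAX_XFER_BYTES (256u * 1024u)

/* Hypothetical direction tags, expressed with respect to the host. */
enum debug_dir {
    DEBUG_NO_DATA,    /* command has no data phase */
    DEBUG_HOST_READ,  /* data flows from the AFU into the host buffer */
    DEBUG_HOST_WRITE  /* data flows from the host buffer to the AFU */
};

/* Hypothetical request description; not the layout used by the driver. */
struct debug_request {
    enum debug_dir dir;
    const void *buf;  /* user-supplied data buffer, if any */
    size_t len;       /* requested transfer size in bytes */
};

/*
 * Check a request before handing it to the ioctl: a command with a data
 * phase needs a buffer and a length no larger than the maximum transfer
 * size; a command without one is expected (in this sketch) to carry no
 * length.
 */
static bool debug_request_is_valid(const struct debug_request *req)
{
    assert(req != NULL);

    if (req->dir == DEBUG_NO_DATA)
        return req->len == 0;

    return req->buf != NULL &&
           req->len > 0 &&
           req->len <= DEBUG_MAX_XFER_BYTES;
}
```

Because partial read completions are not supported, a caller should also
treat the contents of its buffer as undefined whenever a host read fails.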