====================
TCM Userspace Design
====================


.. Contents:

   1) Design
     a) Background
     b) Benefits
     c) Design constraints
     d) Implementation overview
        i. Mailbox
        ii. Command ring
        iii. Data Area
     e) Device discovery
     f) Device events
     g) Other contingencies
   2) Writing a user pass-through handler
     a) Discovering and configuring TCMU UIO devices
     b) Waiting for events on the device(s)
     c) Managing the command ring
   3) A final note


Design
======

TCM is another name for LIO, an in-kernel iSCSI target (server).
Existing TCM targets run in the kernel. TCMU (TCM in Userspace)
allows userspace programs to be written that act as iSCSI targets.
This document describes the design.

The existing kernel provides modules for different SCSI transport
protocols. TCM also modularizes the data storage. There are existing
modules for file, block device, RAM or using another SCSI device as
storage. These are called "backstores" or "storage engines". These
built-in modules are implemented entirely as kernel code.

Background
----------

In addition to modularizing the transport protocol used for carrying
SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes
the actual data storage. These are referred to as "backstores"
or "storage engines". The target comes with backstores that allow a
file, a block device, RAM, or another SCSI device to be used for the
local storage needed for the exported SCSI LUN. Like the rest of LIO,
these are implemented entirely as kernel code.

These backstores cover the most common use cases, but not all. One new
use case that other non-kernel target solutions, such as tgt, are able
to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
target then serves as a translator, allowing initiators to store data
in these non-traditional networked storage systems, while still only
using standard protocols themselves.
If the target is a userspace process, supporting these is easy. tgt,
for example, needs only a small adapter module for each, because the
modules just use the available userspace libraries for RBD and GLFS.

Adding support for these backstores in LIO is considerably more
difficult, because LIO is entirely kernel code. Instead of undertaking
the significant work to port the GLFS or RBD APIs and protocols to the
kernel, another approach is to create a userspace pass-through
backstore for LIO, "TCMU".


Benefits
--------

In addition to allowing relatively easy support for RBD and GLFS, TCMU
will also allow easier development of new backstores. TCMU combines
with the LIO loopback fabric to become something similar to FUSE
(Filesystem in Userspace), but at the SCSI layer instead of the
filesystem layer. A SUSE, if you will.

The disadvantage is that there are more distinct components to
configure, and potentially to malfunction. This is unavoidable, but
hopefully not fatal if we're careful to keep things as simple as
possible.

Design constraints
------------------

- Good performance: high throughput, low latency
- Cleanly handle the cases where userspace:

   1) never attaches
   2) hangs
   3) dies
   4) misbehaves

- Allow future flexibility in user & kernel implementations
- Be reasonably memory-efficient
- Simple to configure & run
- Simple to write a userspace backend


Implementation overview
-----------------------

The core of the TCMU interface is a memory region that is shared
between kernel and userspace. Within this region are: a control area
(mailbox); a lockless producer/consumer circular buffer for commands
to be passed up, and status returned; and an in/out data buffer area.

TCMU uses the pre-existing UIO subsystem. UIO allows device driver
development in userspace, and this is conceptually very close to the
TCMU use case, except that instead of a physical device, TCMU implements
a memory-mapped layout designed for SCSI commands. Using UIO also
benefits TCMU by handling device introspection (e.g. a way for
userspace to determine how large the shared region is) and signaling
mechanisms in both directions.
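The lockless producer/consumer ring can be illustrated with a minimal sketch of its offset arithmetic. It assumes head and tail are byte offsets into the ring, playing the roles of the mailbox's cmd_head and cmd_tail (see The Mailbox), and that the ring size need not be a power of two. The helper names are hypothetical. Keeping one byte unused, so that a full ring can be told apart from an empty one, is an assumption of this sketch and is not something this document specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of circular-ring offset arithmetic. head is where the producer
 * (the kernel) writes next; tail is where the consumer (userspace) reads
 * next. Both are byte offsets into a ring of `size` bytes, and size need
 * not be a power of two. Helper names are hypothetical.
 */

/* Empty when the consumer has caught up with the producer. */
static inline bool ring_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

/* Bytes posted by the producer but not yet consumed. */
static inline uint32_t ring_used(uint32_t head, uint32_t tail, uint32_t size)
{
	assert(size > 0 && head < size && tail < size);
	return head >= tail ? head - tail : size - (tail - head);
}

/*
 * Bytes the producer may still post. One byte is left unused (an
 * assumption of this sketch) so a full ring never has head == tail.
 */
static inline uint32_t ring_free(uint32_t head, uint32_t tail, uint32_t size)
{
	return size - ring_used(head, tail, size) - 1;
}

/* Advance an offset by len bytes, wrapping with % instead of a bit mask. */
static inline uint32_t ring_advance(uint32_t off, uint32_t len, uint32_t size)
{
	assert(size > 0 && off < size);
	return (uint32_t)(((uint64_t)off + len) % size);
}
```

Because the ring size is not required to be a power of two, wrapping uses the modulo operator rather than a bit mask. For example, in a 100-byte ring, ring_advance(96, 8, 100) returns 4.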
There are no embedded pointers in the memory region. Everything is
expressed as an offset from the region's starting address. This allows
the ring to still work if the user process dies and is restarted with
the region mapped at a different virtual address.

See target_core_user.h for the struct definitions.

The Mailbox
-----------

The mailbox is always at the start of the shared memory region, and
contains a version, details about the starting offset and size of the
command ring, and head and tail offsets to be used by the kernel and
userspace (respectively) to put commands on the ring, and indicate
when the commands are completed.

version - 1 (userspace should abort if otherwise)

flags:
    - TCMU_MAILBOX_FLAG_CAP_OOOC:
        indicates out-of-order completion is supported.
        See "The Command Ring" for details.

cmdr_off
        The offset of the start of the command ring from the start
        of the memory region, to account for the mailbox size.
cmdr_size
        The size of the command ring. This does *not* need to be a
        power of two.
cmd_head
        Modified by the kernel to indicate when a command has been
        placed on the ring.
cmd_tail
        Modified by userspace to indicate when it has completed
        processing of a command.

The Command Ring
----------------

Commands are placed on the ring by the kernel incrementing
mailbox.cmd_head by the size of the command, modulo cmdr_size, and
then signaling userspace via uio_event_notify(). Once the command is
completed, userspace updates mailbox.cmd_tail in the same way and
signals the kernel via a 4-byte write(). When cmd_head equals
cmd_tail, the ring is empty: no commands are currently waiting to be
processed by userspace.

TCMU commands are 8-byte aligned. They start with a common header
containing "len_op", a 32-bit value that stores the length, as well as
the opcode in the lowest bits, which are unused because the length is
always a multiple of 8. It also contains cmd_id and flags fields for
setting by the kernel (kflags) and userspace (uflags).

Currently only two opcodes are defined, TCMU_OP_CMD and TCMU_OP_PAD.

When the opcode is CMD, the entry in the command ring is a struct
tcmu_cmd_entry. Userspace finds the SCSI CDB (Command Descriptor Block)
via tcmu_cmd_entry.req.cdb_off. This is an offset from the start of the
overall shared memory region, not the entry. The data in/out buffers
are accessible via the req.iov[] array. iov_cnt contains the number of
entries in iov[] needed to describe either the Data-In or Data-Out
buffers. For bidirectional commands, iov_cnt specifies how many iovec
entries cover the Data-Out area, and iov_bidi_cnt specifies how many
iovec entries immediately after that in iov[] cover the Data-In
area. Just like other fields, iov.iov_base is an offset from the start
of the region.

When completing a command, userspace sets rsp.scsi_status, and
rsp.sense_buffer if necessary. Userspace then increments
mailbox.cmd_tail by the entry's length (the length part of
hdr.len_op), modulo cmdr_size, and signals the kernel via the UIO
method: a 4-byte write to the file descriptor.

If TCMU_MAILBOX_FLAG_CAP_OOOC is set in mailbox->flags, the kernel is
capable of handling out-of-order completions. In this case, userspace
may complete commands in an order different from the one in which they
were queued. Since the kernel still processes the commands in the order
in which they appear in the command ring, userspace needs to update the
entry's cmd_id when completing a command (a.k.a. stealing the original
command's entry).

When the opcode is PAD, userspace only updates cmd_tail as above;
it's a no-op. (The kernel inserts PAD entries to ensure each CMD entry
is contiguous within the command ring.)

More opcodes may be added in the future. If userspace encounters an
opcode it does not handle, it must set the UNKNOWN_OP bit (bit 0) in
hdr.uflags, update cmd_tail, and proceed with processing additional
commands, if any.

The Data Area
-------------

This is shared-memory space after the command ring. The organization
of this area is not defined in the TCMU interface, and userspace
should access only the parts referenced by pending iovs.


Device Discovery
----------------

Other devices may be using UIO besides TCMU. Unrelated user processes
may also be handling different sets of TCMU devices. TCMU userspace
processes must find their devices by scanning sysfs
class/uio/uio*/name. For TCMU devices, these names will be of the
format::

        tcm-user/<hba_num>/<device_name>/<subtype>/<path>

where "tcm-user" is common for all TCMU-backed UIO devices. <hba_num>
and <device_name> allow userspace to find the device's path in the
kernel target's configfs tree. Assuming the usual mount point, it is
found at::

        /sys/kernel/config/target/core/user_<hba_num>/<device_name>

This location contains attributes, such as "hw_block_size", that
userspace needs to know for correct operation.
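A handler scanning class/uio/uio*/name needs to split each name into the fields of the format above. The following is a minimal sketch under stated assumptions: <hba_num> is taken to be decimal, and <path> is taken to be everything after the fourth '/', so it may be empty or may itself contain '/'. The struct, buffer sizes and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the fields of a TCMU UIO device name. */
struct tcmu_uio_name {
	unsigned int hba_num;
	char device_name[128];
	char subtype[64];
	char path[256];        /* may be empty; may contain '/' (assumed) */
};

/* Copy the bytes in [start, end) into dst; fail if they do not fit. */
static bool copy_field(char *dst, size_t cap, const char *start, const char *end)
{
	size_t len = (size_t)(end - start);

	if (len >= cap)
		return false;
	memcpy(dst, start, len);
	dst[len] = '\0';
	return true;
}

/*
 * Parse "tcm-user/<hba_num>/<device_name>/<subtype>/<path>".
 * Returns false for non-TCMU UIO names or malformed input.
 */
static bool parse_tcmu_uio_name(const char *name, struct tcmu_uio_name *out)
{
	static const char prefix[] = "tcm-user/";
	const char *p, *slash;
	char *end;
	unsigned long hba;

	assert(name != NULL && out != NULL);
	if (strncmp(name, prefix, sizeof(prefix) - 1) != 0)
		return false;                   /* another UIO driver */
	p = name + sizeof(prefix) - 1;

	/* <hba_num>: decimal digits followed by '/' */
	if (!isdigit((unsigned char)*p))
		return false;
	hba = strtoul(p, &end, 10);
	if (*end != '/')
		return false;
	out->hba_num = (unsigned int)hba;
	p = end + 1;

	/* <device_name>: non-empty, up to the next '/' */
	slash = strchr(p, '/');
	if (slash == NULL || slash == p ||
	    !copy_field(out->device_name, sizeof(out->device_name), p, slash))
		return false;
	p = slash + 1;

	/* <subtype>: non-empty, up to the next '/' or the end of the string */
	slash = strchr(p, '/');
	if (slash == NULL)
		slash = p + strlen(p);
	if (slash == p || !copy_field(out->subtype, sizeof(out->subtype), p, slash))
		return false;

	/* <path>: whatever remains after that '/' (possibly nothing) */
	p = (*slash == '/') ? slash + 1 : slash;
	return copy_field(out->path, sizeof(out->path), p, p + strlen(p));
}
```

A handler would skip any name for which parse_tcmu_uio_name() returns false, then compare the subtype field against its own subtype string before opening the corresponding /dev/uioX.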
<subtype> will be a userspace-process-unique string to identify the
TCMU device as expecting to be backed by a certain handler, and <path>
will be an additional handler-specific string for the user process to
configure the device, if needed. The name cannot contain ':', due to
LIO limitations.

For all devices so discovered, the user handler opens /dev/uioX and
calls mmap()::

        mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)

where fd is the descriptor for the opened /dev/uioX, and size must be
equal to the value read from /sys/class/uio/uioX/maps/map0/size.


Device Events
-------------

If a new device is added or removed, a notification will be broadcast
over netlink, using a generic netlink family name of "TCM-USER" and a
multicast group named "config". This will include the UIO name as
described in the previous section, as well as the UIO minor
number. This should allow userspace to identify both the UIO device and
the LIO device, so that after determining the device is supported
(based on subtype) it can take the appropriate action.


Other contingencies
-------------------

Userspace handler process never attaches:

- TCMU will post commands, and then abort them after a timeout period
  (30 seconds).

Userspace handler process is killed:

- It is still possible to restart and re-connect to TCMU
  devices. The command ring is preserved. However, after the timeout
  period, the kernel will abort pending tasks.

Userspace handler process hangs:

- The kernel will abort pending tasks after a timeout period.

Userspace handler process is malicious:

- The process can trivially break the handling of devices it controls,
  but should not be able to access kernel memory outside its shared
  memory areas.
Writing a user pass-through handler (with example code)
=======================================================

A user process handling a TCMU device must support the following:

a) Discovering and configuring TCMU UIO devices
b) Waiting for events on the device(s)
c) Managing the command ring: parsing operations and commands,
   performing work as needed, setting response fields (scsi_status and
   possibly sense_buffer), updating cmd_tail, and notifying the kernel
   that work has been finished

First, consider instead writing a plugin for tcmu-runner. tcmu-runner
implements all of this, and provides a higher-level API for plugin
authors.

TCMU is designed so that multiple unrelated processes can manage TCMU
devices separately. All handlers should make sure to open only their
own devices, based upon a known subtype string.

a) Discovering and configuring TCMU UIO devices::

      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      /* error checking omitted for brevity */

      int fd, dev_fd;
      ssize_t ret;
      char buf[256];
      char path[64];
      const char *uio = "uio0";  /* device currently being examined */
      unsigned long long map_len;
      void *map;

      snprintf(path, sizeof(path), "/sys/class/uio/%s/name", uio);
      fd = open(path, O_RDONLY);
      ret = read(fd, buf, sizeof(buf) - 1);
      close(fd);
      buf[ret-1] = '\0'; /* null-terminate and chop off the \n */

      /* we only want uio devices whose name is a format we expect */
      if (strncmp(buf, "tcm-user", 8))
        exit(-1);

      /* Further checking for subtype also needed here */

      snprintf(path, sizeof(path), "/sys/class/uio/%s/maps/map0/size", uio);
      fd = open(path, O_RDONLY);
      ret = read(fd, buf, sizeof(buf) - 1);
      close(fd);
      buf[ret-1] = '\0'; /* null-terminate and chop off the \n */

      /* base 0 also accepts a 0x-prefixed hex value */
      map_len = strtoull(buf, NULL, 0);

      snprintf(path, sizeof(path), "/dev/%s", uio);
      dev_fd = open(path, O_RDWR);
      map = mmap(NULL, map_len, PROT_READ|PROT_WRITE, MAP_SHARED, dev_fd, 0);


b) Waiting for events on the device(s)::

      while (1) {
        uint32_t event_count;

        /* blocks until the kernel signals; UIO returns a 4-byte event count */
        read(dev_fd, &event_count, sizeof(event_count));

        handle_device_events(dev_fd, map);
      }


c) Managing the command ring::

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <linux/target_core_user.h>

      /* SAM status codes (see "A final note"); not defined by target_core_user.h */
      #define SAM_STAT_GOOD            0x00
      #define SAM_STAT_CHECK_CONDITION 0x02

      int handle_device_events(int fd, void *map)
      {
        struct tcmu_mailbox *mb = map;
        uint8_t *base = map;  /* byte pointer: all ring fields are offsets */
        struct tcmu_cmd_entry *ent = (void *)(base + mb->cmdr_off + mb->cmd_tail);
        int did_some_work = 0;

        /*
         * Process events from cmd ring until we catch up with cmd_head.
         * A production handler should also use memory barriers when reading
         * cmd_head and writing cmd_tail, since the kernel accesses them too.
         */
        while (ent != (void *)(base + mb->cmdr_off + mb->cmd_head)) {

          if (tcmu_hdr_get_op(ent->hdr.len_op) == TCMU_OP_CMD) {
            uint8_t *cdb = base + ent->req.cdb_off;
            bool success = true;

            /* Handle command here. */
            printf("SCSI opcode: 0x%x\n", cdb[0]);

            /* Set response fields */
            if (success)
              ent->rsp.scsi_status = SAM_STAT_GOOD;
            else {
              /* Also fill in rsp.sense_buffer here */
              ent->rsp.scsi_status = SAM_STAT_CHECK_CONDITION;
            }
          }
          else if (tcmu_hdr_get_op(ent->hdr.len_op) != TCMU_OP_PAD) {
            /* Tell the kernel we didn't handle unknown opcodes */
            ent->hdr.uflags |= TCMU_UFLAG_UNKNOWN_OP;
          }
          else {
            /* Do nothing for PAD entries except update cmd_tail */
          }

          /* update cmd_tail */
          mb->cmd_tail = (mb->cmd_tail + tcmu_hdr_get_len(ent->hdr.len_op)) % mb->cmdr_size;
          ent = (void *)(base + mb->cmdr_off + mb->cmd_tail);
          did_some_work = 1;
        }

        /* Notify the kernel that work has been finished */
        if (did_some_work) {
          uint32_t buf = 0;

          write(fd, &buf, 4);
        }

        return 0;
      }


A final note
============

Please be careful to return status codes as defined by the SCSI
specifications. These are different from some values defined in the
scsi/scsi.h include file. For example, CHECK CONDITION's status code
is 2, not 1.
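The mismatch exists because the legacy status macros in scsi/scsi.h are the standard status values shifted right by one bit. Below is a minimal sketch of the standard values a handler would store in rsp.scsi_status, plus a conversion helper for legacy values. The enum and function names are hypothetical, chosen so they do not clash with scsi/scsi.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * SCSI status byte values as defined by the SCSI standards (SAM).
 * These are the values a TCMU handler stores in rsp.scsi_status.
 * Names are hypothetical, chosen not to clash with <scsi/scsi.h>.
 */
enum sam_status_code {
	SAM_GOOD                 = 0x00,
	SAM_CHECK_CONDITION      = 0x02,
	SAM_CONDITION_MET        = 0x04,
	SAM_BUSY                 = 0x08,
	SAM_RESERVATION_CONFLICT = 0x18,
	SAM_TASK_SET_FULL        = 0x28,
};

/* CHECK CONDITION is 2 in the standard encoding, not 1. */
static_assert(SAM_CHECK_CONDITION == 0x02, "CHECK CONDITION must be 0x02");

/*
 * Hypothetical helper: legacy <scsi/scsi.h> status macros (for example,
 * CHECK_CONDITION == 0x01) are the standard values shifted right by one,
 * so shifting left by one recovers the value to report.
 */
static inline uint8_t legacy_to_sam_status(uint8_t legacy)
{
	return (uint8_t)(legacy << 1);
}
```

Defining the standard values explicitly, rather than reusing the legacy macros, keeps a handler from silently reporting CHECK CONDITION as 1.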