TOMOYO Linux Cross Reference
Linux/Documentation/target/tcmu-design.rst

Differences between /Documentation/target/tcmu-design.rst (Version linux-6.12-rc7) and /Documentation/target/tcmu-design.rst (Version linux-4.9.337)


  1 ====================                              
  2 TCM Userspace Design                              
  3 ====================                              
  4                                                   
  5                                                   
  6 .. Contents:                                      
  7                                                   
  8    1) Design                                      
  9      a) Background                                
 10      b) Benefits                                  
 11      c) Design constraints                        
 12      d) Implementation overview                   
 13         i. Mailbox                                
 14         ii. Command ring                          
 15         iii. Data Area                            
 16      e) Device discovery                          
 17      f) Device events                             
 18      g) Other contingencies                       
 19    2) Writing a user pass-through handler         
 20      a) Discovering and configuring TCMU uio d    
 21      b) Waiting for events on the device(s)       
 22      c) Managing the command ring                 
 23    3) A final note                                
 24                                                   
 25                                                   
 26 Design                                            
 27 ======                                            
 28                                                   
 29 TCM is another name for LIO, an in-kernel iSCS    
 30 Existing TCM targets run in the kernel.  TCMU     
 31 allows userspace programs to be written which     
 32 This document describes the design.               
 33                                                   
 34 The existing kernel provides modules for diffe    
 35 protocols.  TCM also modularizes the data stor    
 36 modules for file, block device, RAM or using a    
 37 storage.  These are called "backstores" or "st    
 38 built-in modules are implemented entirely as k    
 39                                                   
 40 Background                                        
 41 ----------                                        
 42                                                   
 43 In addition to modularizing the transport prot    
 44 SCSI commands ("fabrics"), the Linux kernel ta    
 45 the actual data storage as well. These are ref    
 46 or "storage engines". The target comes with ba    
 47 file, a block device, RAM, or another SCSI dev    
 48 local storage needed for the exported SCSI LUN    
 49 these are implemented entirely as kernel code.    
 50                                                   
 51 These backstores cover the most common use cas    
 52 use case that other non-kernel target solution    
 53 to support is using Gluster's GLFS or Ceph's R    
 54 target then serves as a translator, allowing i    
 55 in these non-traditional networked storage sys    
 56 using standard protocols themselves.              
 57                                                   
 58 If the target is a userspace process, supporti    
 59 for example, needs only a small adapter module    
 60 modules just use the available userspace libra    
 61                                                   
 62 Adding support for these backstores in LIO is     
 63 difficult, because LIO is entirely kernel code    
 64 the significant work to port the GLFS or RBD A    
 65 kernel, another approach is to create a usersp    
 66 backstore for LIO, "TCMU".                        
 67                                                   
 68                                                   
 69 Benefits                                          
 70 --------                                          
 71                                                   
 72 In addition to allowing relatively easy suppor    
 73 will also allow easier development of new back    
 74 with the LIO loopback fabric to become somethi    
 75 (Filesystem in Userspace), but at the SCSI lay    
 76 filesystem layer. A SUSE, if you will.            
 77                                                   
 78 The disadvantage is there are more distinct co    
 79 potentially to malfunction. This is unavoidabl    
 80 fatal if we're careful to keep things as simpl    
 81                                                   
 82 Design constraints                                
 83 ------------------                                
 84                                                   
 85 - Good performance: high throughput, low laten    
 86 - Cleanly handle if userspace:                    
 87                                                   
 88    1) never attaches                              
 89    2) hangs                                       
 90    3) dies                                        
 91    4) misbehaves                                  
 92                                                   
 93 - Allow future flexibility in user & kernel im    
 94 - Be reasonably memory-efficient                  
 95 - Simple to configure & run                       
 96 - Simple to write a userspace backend             
 97                                                   
 98                                                   
 99 Implementation overview                           
100 -----------------------                           
101                                                   
102 The core of the TCMU interface is a memory reg    
103 between kernel and userspace. Within this regi    
104 (mailbox); a lockless producer/consumer circul    
105 to be passed up, and status returned; and an i    
106                                                   
107 TCMU uses the pre-existing UIO subsystem. UIO     
108 development in userspace, and this is conceptu    
109 TCMU use case, except instead of a physical de    
110 memory-mapped layout designed for SCSI command    
111 benefits TCMU by handling device introspection    
112 userspace to determine how large the shared re    
113 mechanisms in both directions.                    
114                                                   
115 There are no embedded pointers in the memory r    
116 expressed as an offset from the region's start    
117 the ring to still work if the user process die    
118 the region mapped at a different virtual addre    
119                                                   
120 See target_core_user.h for the struct definiti    
121                                                   
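The offset convention described above can be sketched as a small helper that turns a stored offset into a pointer valid in the current mapping. This is only an illustration: region_ptr() and its bounds check are hypothetical and are not part of the TCMU interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of target_core_user.h.
 * Converts an offset stored in the shared region into a pointer that is
 * valid for this process's mapping. Returns NULL when the offset lies
 * outside the mapped region.
 */
static void *region_ptr(void *region_base, size_t region_size, uint64_t off)
{
	assert(region_base != NULL);

	if (off >= region_size)
		return NULL;

	return (uint8_t *)region_base + off;
}
```

Because only offsets are stored, the same helper keeps working if a restarted handler maps the region at a different virtual address.
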
122 The Mailbox                                       
123 -----------                                       
124                                                   
125 The mailbox is always at the start of the shar    
126 contains a version, details about the starting    
127 command ring, and head and tail pointers to be    
128 userspace (respectively) to put commands on th    
129 when the commands are completed.                  
130                                                   
131 version - 1 (userspace should abort if otherwi    
132                                                   
133 flags:                                            
134     - TCMU_MAILBOX_FLAG_CAP_OOOC:                 
135         indicates out-of-order completion is s    
136         See "The Command Ring" for details.       
137                                                   
138 cmdr_off                                          
139         The offset of the start of the command    
140         of the memory region, to account for t    
141 cmdr_size                                         
142         The size of the command ring. This doe    
143         power of two.                             
144 cmd_head                                          
145         Modified by the kernel to indicate whe    
146         placed on the ring.                       
147 cmd_tail                                          
148         Modified by userspace to indicate when    
149         processing of a command.                  
150                                                   
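A handler may want to sanity-check the mailbox fields listed above before touching the ring. The following is a minimal sketch under stated assumptions: struct mb_snapshot is a hypothetical copy of the named fields, not the real struct tcmu_mailbox, and the accepted version follows the value given in the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical snapshot of the mailbox fields named above. This is not
 * the real struct tcmu_mailbox; see target_core_user.h for the layout.
 */
struct mb_snapshot {
	uint16_t version;
	uint32_t cmdr_off;
	uint32_t cmdr_size;
	uint32_t cmd_head;
	uint32_t cmd_tail;
};

/* Returns 0 if the snapshot looks usable, -1 otherwise. */
static int mb_check(const struct mb_snapshot *mb, size_t region_size)
{
	assert(mb != NULL);

	if (mb->version != 1)
		return -1;	/* unknown version: give up */

	if (mb->cmdr_size == 0 || mb->cmdr_off > region_size ||
	    mb->cmdr_size > region_size - mb->cmdr_off)
		return -1;	/* ring must lie inside the mapped region */

	if (mb->cmd_head >= mb->cmdr_size || mb->cmd_tail >= mb->cmdr_size)
		return -1;	/* head and tail are offsets within the ring */

	return 0;
}
```
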
151 The Command Ring                                  
152 ----------------                                  
153                                                   
154 Commands are placed on the ring by the kernel     
155 mailbox.cmd_head by the size of the command, m    
156 then signaling userspace via uio_event_notify(    
157 completed, userspace updates mailbox.cmd_tail     
158 signals the kernel via a 4-byte write(). When     
159 cmd_tail, the ring is empty -- no commands are    
160 processed by userspace.                           
161                                                   
162 TCMU commands are 8-byte aligned. They start w    
163 containing "len_op", a 32-bit value that store    
164 the opcode in the lowest unused bits. It also     
165 flags fields for setting by the kernel (kflags    
166 (uflags).                                         
167                                                   
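Since entries are 8-byte aligned, the low bits of an entry's length are always zero, which is what leaves room for the opcode. A minimal sketch of this packing is shown below. The 3-bit mask and the sketch_* names are assumptions made for illustration; real handlers should use the helpers from target_core_user.h, such as tcmu_hdr_get_op(), and treat that header as authoritative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 3 low bits are free because lengths are multiples of 8. */
#define SKETCH_OP_MASK 0x7u

/* Combine an 8-byte-aligned length and a small opcode into one value. */
static uint32_t sketch_pack(uint32_t len, uint32_t op)
{
	assert((len & SKETCH_OP_MASK) == 0);
	assert(op <= SKETCH_OP_MASK);

	return len | op;
}

/* Extract the opcode from the low bits. */
static uint32_t sketch_get_op(uint32_t len_op)
{
	return len_op & SKETCH_OP_MASK;
}

/* Extract the length by clearing the opcode bits. */
static uint32_t sketch_get_len(uint32_t len_op)
{
	return len_op & ~SKETCH_OP_MASK;
}
```
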
168 Currently only two opcodes are defined, TCMU_O    
169                                                   
170 When the opcode is CMD, the entry in the comma    
171 tcmu_cmd_entry. Userspace finds the SCSI CDB (    
172 tcmu_cmd_entry.req.cdb_off. This is an offset     
173 overall shared memory region, not the entry. T    
174 are accessible via the req.iov[] array. iov_cn    
175 entries in iov[] needed to describe either the    
176 buffers. For bidirectional commands, iov_cnt s    
177 entries cover the Data-Out area, and iov_bidi_    
178 iovec entries immediately after that in iov[]     
179 area. Just like other fields, iov.iov_base is     
180 of the region.                                    
181                                                   
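As an illustration of the iov[] convention above, the sketch below gathers the bytes described by a set of iovec entries into one flat buffer. It assumes entries shaped like struct iovec whose iov_base holds an offset from the start of the region rather than a pointer. gather_iov() is a hypothetical helper, not part of the TCMU interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: copy the data described by iov[] into dst.
 * Each iov_base is treated as an offset from region, not as a pointer.
 * Copying stops at the first entry that falls outside the region or
 * once dst is full. Returns the number of bytes copied.
 */
static size_t gather_iov(const void *region, size_t region_size,
			 const struct iovec *iov, size_t iov_cnt,
			 uint8_t *dst, size_t dst_len)
{
	size_t copied = 0;
	size_t i;

	assert(region != NULL && dst != NULL);

	for (i = 0; i < iov_cnt && copied < dst_len; i++) {
		size_t off = (size_t)(uintptr_t)iov[i].iov_base;
		size_t len = iov[i].iov_len;

		if (off > region_size || len > region_size - off)
			break;	/* entry points outside the mapping */

		if (len > dst_len - copied)
			len = dst_len - copied;

		memcpy(dst + copied, (const uint8_t *)region + off, len);
		copied += len;
	}

	return copied;
}
```
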
182 When completing a command, userspace sets rsp.    
183 rsp.sense_buffer if necessary. Userspace then     
184 mailbox.cmd_tail by entry.hdr.length (mod cmdr    
185 kernel via the UIO method, a 4-byte write to t    
186                                                   
187 If TCMU_MAILBOX_FLAG_CAP_OOOC is set for mailb    
188 capable of handling out-of-order completions.     
 189 handle commands in an order other than o
 190 still process the commands in the same order i
 191 ring, userspace needs to update the cmd->id whe
 192 command (a.k.a. steal the original command's ent
193                                                   
194 When the opcode is PAD, userspace only updates    
195 it's a no-op. (The kernel inserts PAD entries     
196 is contiguous within the command ring.)           
197                                                   
198 More opcodes may be added in the future. If us    
199 opcode it does not handle, it must set UNKNOWN    
200 hdr.uflags, update cmd_tail, and proceed with     
201 commands, if any.                                 
202                                                   
203 The Data Area                                     
204 -------------                                     
205                                                   
206 This is shared-memory space after the command     
207 of this area is not defined in the TCMU interf    
208 should access only the parts referenced by pen    
209                                                   
210                                                   
211 Device Discovery                                  
212 ----------------                                  
213                                                   
214 Other devices may be using UIO besides TCMU. U    
215 may also be handling different sets of TCMU de    
216 processes must find their devices by scanning     
217 class/uio/uio*/name. For TCMU devices, these n    
218 format::                                          
219                                                   
220         tcm-user/<hba_num>/<device_name>/<subt    
221                                                   
222 where "tcm-user" is common for all TCMU-backed    
223 and <device_name> allow userspace to find the     
224 kernel target's configfs tree. Assuming the us    
225 found at::                                        
226                                                   
227         /sys/kernel/config/target/core/user_<h    
228                                                   
229 This location contains attributes such as "hw_    
230 userspace needs to know for correct operation.    
231                                                   
232 <subtype> will be a userspace-process-unique s    
233 TCMU device as expecting to be backed by a cer    
234 will be an additional handler-specific string     
235 configure the device, if needed. The name cann    
236 LIO limitations.                                  
237                                                   
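The name format above lends itself to simple parsing. The following sketch extracts the fields a handler needs to decide whether a UIO device is its own. The struct, the function name and the 63-character field limits are assumptions made for this example. Anything after the subtype is ignored here and left to the handler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed fields of a TCMU UIO name. */
struct tcmu_name {
	unsigned int hba_num;
	char device_name[64];
	char subtype[64];
};

/*
 * Parse "tcm-user/<hba_num>/<device_name>/<subtype>...".
 * Returns 0 on success, -1 if the name is not a complete TCMU name.
 */
static int parse_tcmu_name(const char *name, struct tcmu_name *out)
{
	int n;

	assert(name != NULL && out != NULL);

	n = sscanf(name, "tcm-user/%u/%63[^/]/%63[^/\n]",
		   &out->hba_num, out->device_name, out->subtype);

	return n == 3 ? 0 : -1;
}
```

A handler would typically compare the parsed subtype against its own known string and skip any device that does not match.
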
238 For all devices so discovered, the user handle    
239 calls mmap()::                                    
240                                                   
241         mmap(NULL, size, PROT_READ|PROT_WRITE,    
242                                                   
243 where size must be equal to the value read fro    
244 /sys/class/uio/uioX/maps/map0/size.               
245                                                   
246                                                   
247 Device Events                                     
248 -------------                                     
249                                                   
250 If a new device is added or removed, a notific    
251 over netlink, using a generic netlink family n    
252 multicast group named "config". This will incl    
253 described in the previous section, as well as     
254 number. This should allow userspace to identif    
255 the LIO device, so that after determining the     
256 (based on subtype) it can take the appropriate    
257                                                   
258                                                   
259 Other contingencies                               
260 -------------------                               
261                                                   
262 Userspace handler process never attaches:         
263                                                   
264 - TCMU will post commands, and then abort them    
265   (30 seconds.)                                   
266                                                   
267 Userspace handler process is killed:              
268                                                   
269 - It is still possible to restart and re-conne    
270   devices. Command ring is preserved. However,    
271   the kernel will abort pending tasks.            
272                                                   
273 Userspace handler process hangs:                  
274                                                   
275 - The kernel will abort pending tasks after a     
276                                                   
277 Userspace handler process is malicious:           
278                                                   
279 - The process can trivially break the handling    
280   but should not be able to access kernel memo    
281   memory areas.                                   
282                                                   
283                                                   
284 Writing a user pass-through handler (with exam    
285 ==============================================    
286                                                   
 287 A user process handling a TCMU device must supp
288                                                   
289 a) Discovering and configuring TCMU uio device    
290 b) Waiting for events on the device(s)            
291 c) Managing the command ring: Parsing operatio    
292    performing work as needed, setting response    
293    possibly sense_buffer), updating cmd_tail,     
294    that work has been finished                    
295                                                   
296 First, consider instead writing a plugin for t    
297 implements all of this, and provides a higher-    
298 authors.                                          
299                                                   
300 TCMU is designed so that multiple unrelated pr    
301 devices separately. All handlers should make s    
 302 devices, based upon a known subtype string.
303                                                   
304 a) Discovering and configuring TCMU UIO device    
305                                                   
306       /* error checking omitted for brevity */    
307                                                   
 308       int fd, dev_fd, ret;
309       char buf[256];                              
310       unsigned long long map_len;                 
311       void *map;                                  
312                                                   
313       fd = open("/sys/class/uio/uio0/name", O_    
314       ret = read(fd, buf, sizeof(buf));           
315       close(fd);                                  
316       buf[ret-1] = '\0'; /* null-terminate and    
317                                                   
318       /* we only want uio devices whose name i    
319       if (strncmp(buf, "tcm-user", 8))            
320         exit(-1);                                 
321                                                   
322       /* Further checking for subtype also nee    
323                                                   
 324       fd = open("/sys/class/uio/%s/maps/map0/si
325       ret = read(fd, buf, sizeof(buf));           
326       close(fd);                                  
 327       buf[ret-1] = '\0'; /* null-terminate
328                                                   
329       map_len = strtoull(buf, NULL, 0);           
330                                                   
331       dev_fd = open("/dev/uio0", O_RDWR);         
332       map = mmap(NULL, map_len, PROT_READ|PROT    
333                                                   
334                                                   
 335 b) Waiting for events on the device(s)::
336                                                   
337       while (1) {                                 
338         char buf[4];                              
339                                                   
340         int ret = read(dev_fd, buf, 4); /* wil    
341                                                   
342         handle_device_events(dev_fd, map);        
343       }                                           
344                                                   
345                                                   
346 c) Managing the command ring::                    
347                                                   
348       #include <linux/target_core_user.h>         
349                                                   
350       int handle_device_events(int fd, void *m    
351       {                                           
352         struct tcmu_mailbox *mb = map;            
353         struct tcmu_cmd_entry *ent = (void *)     
354         int did_some_work = 0;                    
355                                                   
356         /* Process events from cmd ring until     
357         while (ent != (void *)mb + mb->cmdr_of    
358                                                   
359           if (tcmu_hdr_get_op(ent->hdr.len_op)    
360             uint8_t *cdb = (void *)mb + ent->r    
361             bool success = true;                  
362                                                   
363             /* Handle command here. */            
364             printf("SCSI opcode: 0x%x\n", cdb[    
365                                                   
366             /* Set response fields */             
367             if (success)                          
368               ent->rsp.scsi_status = SCSI_NO_S    
369             else {                                
370               /* Also fill in rsp->sense_buffe    
371               ent->rsp.scsi_status = SCSI_CHEC    
372             }                                     
373           }                                       
374           else if (tcmu_hdr_get_op(ent->hdr.le    
375             /* Tell the kernel we didn't handl    
376             ent->hdr.uflags |= TCMU_UFLAG_UNKN    
377           }                                       
378           else {                                  
379             /* Do nothing for PAD entries exce    
380           }                                       
381                                                   
382           /* update cmd_tail */                   
383           mb->cmd_tail = (mb->cmd_tail + tcmu_    
384           ent = (void *) mb + mb->cmdr_off + m    
385           did_some_work = 1;                      
386         }                                         
387                                                   
388         /* Notify the kernel that work has bee    
389         if (did_some_work) {                      
390           uint32_t buf = 0;                       
391                                                   
392           write(fd, &buf, 4);                     
393         }                                         
394                                                   
395         return 0;                                 
396       }                                           
397                                                   
398                                                   
399 A final note                                      
400 ============                                      
401                                                   
402 Please be careful to return codes as defined b    
403 specifications. These are different than some     
404 scsi/scsi.h include file. For example, CHECK C    
405 is 2, not 1.                                      
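
To avoid depending on values from other headers, a handler can define the status codes it returns itself. The sketch below uses status values from the SCSI Architecture Model. The enum and function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status codes as defined by the SCSI specifications (SAM). */
enum sketch_sam_status {
	SKETCH_SAM_GOOD            = 0x00,
	SKETCH_SAM_CHECK_CONDITION = 0x02,
	SKETCH_SAM_BUSY            = 0x08,
};

/* Hypothetical helper: pick the status byte for a finished command. */
static uint8_t sketch_status_for(bool success)
{
	assert(SKETCH_SAM_CHECK_CONDITION == 2);	/* 2, not 1 */

	return success ? SKETCH_SAM_GOOD : SKETCH_SAM_CHECK_CONDITION;
}
```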
                                                      
