.. SPDX-License-Identifier: GPL-2.0

==================================
relay interface (formerly relayfs)
==================================

The relay interface provides a means for kerne
efficiently log and transfer large quantities
to userspace via user-defined 'relay channels'

A 'relay channel' is a kernel->user data relay
as a set of per-cpu kernel buffers ('channel b
represented as a regular file ('relay file') i
clients write into the channel buffers using e
functions; these automatically log into the cu
buffer. User space applications mmap() or rea
and retrieve the data as it becomes available.
themselves are files created in a host filesys
are associated with the channel buffers using

The format of the data logged into the channel
up to the kernel client; the relay interface d
hooks which allow kernel clients to impose som
buffer data. The relay interface doesn't impl
filtering - this also is left to the kernel cl
keep things as simple as possible.

This document provides an overview of the rela
details of the function parameters are documen
functions in the relay interface code - please

Semantics
=========

Each relay channel has one buffer per CPU, eac
sub-buffers. Messages are written to the firs
too full to contain a new message, in which ca
the next (if available). Messages are never s
At this point, userspace can be notified so it
sub-buffer, while the kernel continues writing

When notified that a sub-buffer is full, the k
bytes of it are padding i.e. unused space occu
message couldn't fit into a sub-buffer. Users
knowledge to copy only valid data.

After copying it, userspace can notify the ker
has been consumed.
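
The padding rule described above can be illustrated with a small userspace
sketch. It assumes the consumer already knows the sub-buffer size and the
padding count for a full sub-buffer; how that count reaches userspace is left
to the application. The helper name ``copy_valid_bytes`` is hypothetical and
is not part of the relay interface.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the relay API): copy only the valid
 * bytes of one full sub-buffer into 'out', skipping the trailing padding.
 * 'out' must have room for at least subbuf_size bytes.
 * Returns the number of bytes copied, or 0 if padding exceeds the
 * sub-buffer size (inconsistent arguments).
 */
static size_t copy_valid_bytes(const unsigned char *subbuf,
                               size_t subbuf_size,
                               size_t padding,
                               unsigned char *out)
{
    size_t valid;

    if (padding > subbuf_size)
        return 0;

    /* Valid data occupies the start of the sub-buffer. */
    valid = subbuf_size - padding;
    memcpy(out, subbuf, valid);

    return valid;
}
```

For an 8-byte sub-buffer whose last 3 bytes are padding, the helper copies the
first 5 bytes and reports 5.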

A relay channel can operate in a mode where it
yet collected by userspace, and not wait for i

The relay channel itself does not provide for
data between userspace and kernel, allowing th
simple and not impose a single interface on us
provide a set of examples and a separate helpe
below.

The read() interface both removes padding and
read sub-buffers; thus in cases where read(2)
the channel buffers, special-purpose communica
user isn't necessary for basic operation.

One of the major goals of the relay interface
overhead mechanism for conveying kernel data t
read() interface is easy to use, it's not as e
approach; the example code attempts to make th
two approaches as small as possible.

klog and relay-apps example code
================================

The relay interface itself is ready to use, bu
a couple simple utility functions and a set of

The relay-apps example tarball, available on t
site, contains a set of self-contained example
pair of .c files containing boilerplate code f
kernel sides of a relay application. When com
boilerplate code provide glue to easily stream
having to bother with mundane housekeeping cho

The 'klog debugging functions' patch (klog.pat
tarball) provides a couple of high-level loggi
kernel which allow writing formatted text or r
regardless of whether a channel to write into
whether the relay interface is compiled into t
functions allow you to put unconditional 'trac
in the kernel or kernel modules; only when the
registered will data actually be logged (see t
examples for details).

It is of course possible to use the relay inte
i.e. without using any of the relay-apps examp
you'll have to implement communication between
allowing both to convey the state of buffers (
padding).
The read() interface both removes p
consumes the read sub-buffers; thus in cases w
used to drain the channel buffers, special-pur
between kernel and user isn't necessary for ba
such as buffer-full conditions would still nee
some channel though.

klog and the relay-apps examples can be found
tarball on http://relayfs.sourceforge.net

The relay interface user space API
==================================

The relay interface implements basic file oper
access to relay channel buffer data. Here are
that are available and some comments regarding

=========== ==================================
open()      enables user to open an _existing_

mmap()      results in channel buffer being ma
            memory space. Note that you can't
            must map the entire file, which is

read()      read the contents of a channel buf
            'consumed' by the reader, i.e. the
            again to subsequent reads. If the
            in no-overwrite mode (the default)
            time even if there's an active ker
            channel is being used in overwrite
            active channel writers, results ma
            users should make sure that all lo
            ended before using read() with ove
            padding is automatically removed a
            the reader.

sendfile()  transfer data from a channel buffe
            descriptor. Sub-buffer padding is
            and will not be seen by the reader

poll()      POLLIN/POLLRDNORM/POLLERR supporte
            notified when sub-buffer boundarie

close()     decrements the channel buffer's re
            reaches 0, i.e. when no process or
            buffer open, the channel buffer is
=========== ==================================

In order for a user application to make use of
host filesystem must be mounted. For example:

        mount -t debugfs debugfs /sys/kernel/d

.. Note::

        the host filesystem doesn't need to be
        clients to create or use channels - it
        mounted when user space applications n
        data.
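
As a rough illustration of the read() entry in the table above, the following
userspace sketch drains whatever is currently readable from an already opened
descriptor. It is a minimal sketch, not code from the relay-apps examples: the
helper name ``drain_fd`` is hypothetical, the caller is assumed to have opened
a relay file in the mounted host filesystem, and a return of 0 from read() is
simply treated as "nothing more to read right now".

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from 'fd' into 'out' until read() returns 0
 * or 'cap' bytes have been stored. Interrupted reads are retried.
 * Returns the number of bytes stored, or -1 on a read error.
 */
static ssize_t drain_fd(int fd, char *out, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        ssize_t n = read(fd, out + total, cap - total);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            break;              /* nothing more to read for now */

        total += (size_t)n;
    }

    return (ssize_t)total;
}
```

Because the read() interface removes padding and consumes the data it returns,
a loop of this shape needs no extra bookkeeping for basic operation; an
application would typically combine it with poll() to wait for new data.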


The relay interface kernel API
==============================

Here's a summary of the API the relay interfac

TBD(curr. line MT:/API/)
  channel management functions::

    relay_open(base_filename, parent, subbuf_s
               callbacks, private_data)
    relay_close(chan)
    relay_flush(chan)
    relay_reset(chan)

  channel management typically called on insti

    relay_subbufs_consumed(chan, cpu, subbufs_

  write functions::

    relay_write(chan, data, length)
    __relay_write(chan, data, length)
    relay_reserve(chan, length)

  callbacks::

    subbuf_start(buf, subbuf, prev_subbuf, pre
    buf_mapped(buf, filp)
    buf_unmapped(buf, filp)
    create_buf_file(filename, parent, mode, bu
    remove_buf_file(dentry)

  helper functions::

    relay_buf_full(buf)
    subbuf_start_reserve(buf, length)


Creating a channel
------------------

relay_open() is used to create a channel, alon
channel buffers. Each channel buffer will hav
created for it in the host filesystem, which c
read from in user space. The files are named
where N is the number of online cpus, and by d
in the root of the filesystem (if the parent p
want a directory structure to contain your rel
create it using the host filesystem's director
e.g. debugfs_create_dir(), and pass the parent
relay_open(). Users are responsible for clean
structure they create, when the channel is clo
filesystem's directory removal functions shoul
e.g. debugfs_remove().

In order for a channel to be created and the h
associated with its channel buffers, the user
for two callback functions, create_buf_file()
create_buf_file() is called once for each per-
relay_open() and allows the user to create the
to represent the corresponding channel buffer.
return the dentry of the file created to repre
remove_buf_file() must also be defined; it's r
the file(s) created in create_buf_file() and i
relay_close().

Here are some typical definitions for these ca
using debugfs::

    /*
     * create_buf_file() callback. Creates rel
     */
    static struct dentry *create_buf_file_hand




    {
            return debugfs_create_file(filenam
                                       &relay_fil
    }

    /*
     * remove_buf_file() callback. Removes rel
     */
    static int remove_buf_file_handler(struct
    {
            debugfs_remove(dentry);

            return 0;
    }

    /*
     * relay interface callbacks
     */
    static struct rchan_callbacks relay_callba
    {
            .create_buf_file = create_buf_file
            .remove_buf_file = remove_buf_file
    };

And an example relay_open() invocation using t

    chan = relay_open("cpu", NULL, SUBBUF_SIZE,

If the create_buf_file() callback fails, or is
creation and thus relay_open() will fail.

The total size of each per-cpu buffer is calcu
number of sub-buffers by the sub-buffer size p
The idea behind sub-buffers is that they're ba
double-buffering to N buffers, and they also a
easily implement random-access-on-buffer-bound
be important for some high-volume applications
of sub-buffers is completely dependent on the
the same application, different conditions wil
values for these parameters at different times
values to use are best decided after some expe
though, it's safe to assume that having only 1
idea - you're guaranteed to either overwrite d
depending on the channel mode being used.

The create_buf_file() implementation can also
as to allow the creation of a single 'global'
default per-cpu set.
This can be useful for a
mainly in seeing the relative ordering of syst
the need to bother with saving explicit timest
merging/sorting per-cpu files in a postprocess

To have relay_open() create a global buffer, t
implementation should set the value of the is_
non-zero value in addition to creating the fil
represent the single buffer. In the case of a
create_buf_file() and remove_buf_file() will b
normal channel-writing functions, e.g. relay_w
used - writes from any cpu will transparently
buffer - but since it is a global buffer, call
they use the proper locking for such a buffer,
writes in a spinlock, or by copying a write fu
creating a local version that internally does

The private_data passed into relay_open() allo
user-defined data with a channel, and is immed
(including in create_buf_file()) via chan->pri
buf->chan->private_data.

Buffer-only channels
--------------------

These channels have no files associated and ca
relay_open(NULL, NULL, ...). Such channels are
as when doing early tracing in the kernel, bef
cases, one may open a buffer-only channel and
relay_late_setup_files() when the kernel is re
to expose the buffered data to the userspace.

Channel 'modes'
---------------

relay channels can be used in either of two mo
'no-overwrite'. The mode is entirely determin
of the subbuf_start() callback, as described b
subbuf_start() callback is defined is 'no-over
default mode suits your needs, and you plan to
interface to retrieve channel data, you can ig
section, as it pertains mainly to mmap() imple

In 'overwrite' mode, also known as 'flight rec
continuously cycle around the buffer and will
unconditionally overwrite old data regardless
been consumed. In no-overwrite mode, writes w
be lost, if the number of unconsumed sub-buffe
number of sub-buffers in the channel.
It shou
there is no consumer or if the consumer can't
enough, data will be lost in either case; the
whether data is lost from the beginning or the

As explained above, a relay channel is made of
per-cpu channel buffers, each implemented as a
subdivided into one or more sub-buffers. Mess
the current sub-buffer of the channel's curren
write functions described below. Whenever a m
the current sub-buffer, because there's no roo
client is notified via the subbuf_start() call
new sub-buffer is about to occur. The client
initialize the next sub-buffer if appropriate
sub-buffer if appropriate and 3) return a bool
whether or not to actually move on to the next

To implement 'no-overwrite' mode, the userspac
an implementation of the subbuf_start() callba
following::

    static int subbuf_start(struct rchan_buf *
                            void *subbuf,
                            void *prev_subbuf,
                            unsigned int prev_
    {
            if (prev_subbuf)
                    *((unsigned *)prev_subbuf)

            if (relay_buf_full(buf))
                    return 0;

            subbuf_start_reserve(buf, sizeof(u

            return 1;
    }

If the current buffer is full, i.e. all sub-bu
the callback returns 0 to indicate that the bu
occur yet, i.e. until the consumer has had a c
current set of ready sub-buffers. For the rel
to make sense, the consumer is responsible for
interface when sub-buffers have been consumed
relay_subbufs_consumed(). Any subsequent atte
buffer will again invoke the subbuf_start() ca
parameters; only when the consumer has consume
ready sub-buffers will relay_buf_full() return
buffer switch can continue.
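
The interplay between the buffer-full check and consumer notification in
no-overwrite mode can be modelled in plain userspace C. This is only a
bookkeeping sketch under stated assumptions: the structure, its field names
and the ``model_*`` functions are hypothetical and are not the kernel's
implementation. It assumes a buffer counts as full when the number of filled
but unconsumed sub-buffers reaches the number of sub-buffers in the channel.

```c
#include <stddef.h>

/* Hypothetical userspace model of one channel buffer's bookkeeping. */
struct model_buf {
    size_t n_subbufs;   /* sub-buffers in this buffer */
    size_t produced;    /* sub-buffers filled by the writer so far */
    size_t consumed;    /* sub-buffers the consumer has reported done */
};

/* Full when every sub-buffer holds data not yet consumed. */
static int model_buf_full(const struct model_buf *b)
{
    return (b->produced - b->consumed) >= b->n_subbufs;
}

/* The writer has just filled the current sub-buffer. */
static void model_subbuf_filled(struct model_buf *b)
{
    b->produced++;
}

/*
 * No-overwrite decision at a sub-buffer boundary: 1 means the writer may
 * move on to the next sub-buffer, 0 means the switch is refused for now.
 */
static int model_can_switch(const struct model_buf *b)
{
    return !model_buf_full(b);
}

/* The consumer reports 'n' sub-buffers consumed (clamped to what is ready). */
static void model_subbufs_consumed(struct model_buf *b, size_t n)
{
    size_t ready = b->produced - b->consumed;

    if (n > ready)
        n = ready;
    b->consumed += n;
}
```

With two sub-buffers, the switch is allowed after the first one fills and
refused after the second; it becomes possible again only once the consumer
reports at least one sub-buffer consumed.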

The implementation of the subbuf_start() callb
would be very similar::

    static int subbuf_start(struct rchan_buf *
                            void *subbuf,
                            void *prev_subbuf,
                            size_t prev_paddin
    {
            if (prev_subbuf)
                    *((unsigned *)prev_subbuf)

            subbuf_start_reserve(buf, sizeof(u

            return 1;
    }

In this case, the relay_buf_full() check is me
callback always returns 1, causing the buffer
unconditionally. It's also meaningless for th
relay_subbufs_consumed() function in this mode
consulted.

The default subbuf_start() implementation, use
define any callbacks, or doesn't define the su
implements the simplest possible 'no-overwrite
nothing but return 0.

Header information can be reserved at the begi
by calling the subbuf_start_reserve() helper f
subbuf_start() callback. This reserved area c
whatever information the client wants. In the
reserved in each sub-buffer to store the paddi
sub-buffer. This is filled in for the previou
subbuf_start() implementation; the padding val
sub-buffer is passed into the subbuf_start() c
pointer to the previous sub-buffer, since the
known until a sub-buffer is filled. The subbu
also called for the first sub-buffer when the
give the client a chance to reserve space in i
previous sub-buffer pointer passed into the ca
the client should check the value of the prev_
writing into the previous sub-buffer.

Writing to a channel
--------------------

Kernel clients write data into the current cpu
relay_write() or __relay_write().
relay_write
function - it uses local_irqsave() to protect
used if you might be logging from interrupt co
you'll never be logging from interrupt context
__relay_write(), which only disables preemptio
don't return a value, so you can't determine w
failed - the assumption is that you wouldn't w
value in the fast logging path anyway, and tha
unless the buffer is full and no-overwrite mod
which case you can detect a failed write in th
callback by calling the relay_buf_full() helpe

relay_reserve() is used to reserve a slot in a
can be written to later. This would typically
that need to write directly into a channel buf
stage data in a temporary buffer beforehand.
may not happen immediately after the slot is r
using relay_reserve() can keep a count of the
written, either in space reserved in the sub-b
a separate array. See the 'reserve' example i
at http://relayfs.sourceforge.net for an examp
done. Because the write is under control of t
separated from the reserve, relay_reserve() do
at all - it's up to the client to provide the
synchronization when using relay_reserve().

Closing a channel
-----------------

The client calls relay_close() when it's finis
The channel and its associated buffers are des
longer any references to any of the channel bu
forces a sub-buffer switch on all the channel
to finalize and process the last sub-buffers b
closed.

Misc
----

Some applications may want to keep a channel a
rather than open and close a new channel for e
can be used for this purpose - it resets a cha
state without reallocating channel buffer memo
existing mappings. It should however only be
do so, i.e. when the channel isn't currently b

Finally, there are a couple of utility callbac
different purposes. buf_mapped() is called wh
is mmapped from user space and buf_unmapped()
unmapped.
The client can use this notificatio
within the kernel application, such as enablin
the channel.


Resources
=========

For news, example code, mailing list, etc. see

        http://relayfs.sourceforge.net


Credits
=======

The ideas and specs for the relay interface ca
discussions on tracing involving the following

        Michel Dagenais <michel.dagenais@polymt
        Richard Moore <richardj_moore@uk.ibm.
        Bob Wisniewski <bob@watson.ibm.com>
        Karim Yaghmour <karim@opersys.com>
        Tom Zanussi <zanussi@us.ibm.com>

Also thanks to Hubertus Franke for a lot of us
reports.