=============
Event Tracing
=============

:Author: Theodore Ts'o
:Updated: Li Zefan and Tom Zanussi

1. Introduction
===============

Tracepoints (see Documentation/trace/tracepoin
without creating custom kernel modules to regi
using the event tracing infrastructure.

Not all tracepoints can be traced using the ev
the kernel developer must provide code snippet
tracing information is saved into the tracing
tracing information should be printed.

2. Using Event Tracing
======================

2.1 Via the 'set_event' interface
---------------------------------

The events which are available for tracing can
/sys/kernel/tracing/available_events.

To enable a particular event, such as 'sched_w
to /sys/kernel/tracing/set_event. For example:

        # echo sched_wakeup >> /sys/kernel/tra

.. Note:: '>>' is necessary, otherwise it will

To disable an event, echo the event name to th
with an exclamation point::

        # echo '!sched_wakeup' >> /sys/kernel/

To disable all events, echo an empty line to t

        # echo > /sys/kernel/tracing/set_event

To enable all events, echo ``*:*`` or ``*:`` t

        # echo *:* > /sys/kernel/tracing/set_e

The events are organized into subsystems, such
etc., and a full event name looks like this: <
subsystem name is optional, but it is displaye
file. All of the events in a subsystem can be
``<subsystem>:*``; for example, to enable all
command::

        # echo 'irq:*' > /sys/kernel/tracing/s

2.2 Via the 'enable' toggle
---------------------------

The events available are also listed in /sys/k
of directories.

To enable event 'sched_wakeup'::

        # echo 1 > /sys/kernel/tracing/events/

To disable it::

        # echo 0 > /sys/kernel/tracing/events/

To enable all events in sched subsystem::

        # echo 1 > /sys/kernel/tracing/events/

To enable all events::

        # echo 1 > /sys/kernel/tracing/events/

When reading one of these enable files, there

 - 0 - all events this file affects are disabl
 - 1 - all events this file affects are enable
 - X - there is a mixture of events enabled an
 - ? - this file does not affect any event

2.3 Boot option
---------------

In order to facilitate early boot debugging, u

        trace_event=[event-list]

event-list is a comma-separated list of events
format.

3. Defining an event-enabled tracepoint
=======================================

See the example provided in samples/trace_even

4. Event formats
================

Each trace event has a 'format' file associate
a description of each field in a logged event.
be used to parse the binary trace stream, and
find the field names that can be used in event

It also displays the format string that will b
event in text mode, along with the event name
profiling.

Every event has a set of ``common`` fields ass
the fields prefixed with ``common_``. The oth
events and correspond to the fields defined in
definition for that event.

Each field in the format has the form::

     field:field-type field-name; offset:N; si

where offset is the offset of the field in the
is the size of the data item, in bytes.
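
As a rough illustration of how a user-space tool might consume one field
line of a 'format' file, here is a minimal sketch in C. It assumes the line
has the shape ``field:<type> <name>; offset:N; size:N;`` and it ignores any
further attributes. The ``struct format_field`` type and the
``parse_format_field()`` helper are hypothetical names made up for this
example and are not part of any kernel or library API; the offset and size
values used when calling it are illustrative, not taken from a real event.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed field description. */
struct format_field {
	char type[64];
	char name[64];
	long offset;	/* offset of the field in the trace record */
	long size;	/* size of the data item, in bytes */
};

/* Find 'key' in 'line' and parse the decimal number that follows it. */
static int parse_long_after(const char *line, const char *key, long *out)
{
	const char *p = strstr(line, key);
	char *end;

	if (!p)
		return -1;
	p += strlen(key);
	*out = strtol(p, &end, 10);
	return end == p ? -1 : 0;
}

/* Parse one "field:..." line; returns 0 on success, -1 on malformed input. */
int parse_format_field(const char *line, struct format_field *out)
{
	const char *start, *semi, *split;
	size_t type_len, name_len;

	assert(line && out);

	start = strstr(line, "field:");
	if (!start)
		return -1;
	start += strlen("field:");

	semi = strchr(start, ';');
	if (!semi)
		return -1;

	/* The last space before ';' separates the type from the name. */
	split = semi;
	while (split > start && split[-1] != ' ')
		split--;
	if (split == start)
		return -1;

	type_len = (size_t)(split - 1 - start);
	name_len = (size_t)(semi - split);
	if (!type_len || !name_len ||
	    type_len >= sizeof(out->type) || name_len >= sizeof(out->name))
		return -1;

	memcpy(out->type, start, type_len);
	out->type[type_len] = '\0';
	memcpy(out->name, split, name_len);
	out->name[name_len] = '\0';

	if (parse_long_after(semi, "offset:", &out->offset))
		return -1;
	return parse_long_after(semi, "size:", &out->size);
}
```

Splitting at the last space keeps multi-word types such as
``unsigned short`` intact and leaves array suffixes such as
``comm[TASK_COMM_LEN]`` attached to the field name.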

For example, here's the information displayed
event::

        # cat /sys/kernel/tracing/events/sched

        name: sched_wakeup
        ID: 60
        format:
                field:unsigned short common_ty
                field:unsigned char common_fla
                field:unsigned char common_pre
                field:int common_pid; offset
                field:int common_tgid; offset

                field:char comm[TASK_COMM_LEN]
                field:pid_t pid; offset
                field:int prio; offset:32;
                field:int success; offset
                field:int cpu; offset:40;

        print fmt: "task %s:%d [%d] success=%d
                REC->prio, REC->success, RE

This event contains 10 fields, the first 5 com
event-specific. All the fields for this event
'comm' which is a string, a distinction import

5. Event filtering
==================

Trace events can be filtered in the kernel by
'filter expressions' with them. As soon as an
the trace buffer, its fields are checked again
associated with that event type. An event wit
'match' the filter will appear in the trace ou
values don't match will be discarded. An even
associated with it matches everything, and is
filter has been set for an event.

5.1 Expression syntax
---------------------

A filter expression consists of one or more 'p
combined using the logical operators '&&' and
simply a clause that compares the value of a f
logged event with a constant value and returns
on whether the field value matched (1) or didn

  field-name relational-operator value

Parentheses can be used to provide arbitrary l
double-quotes can be used to prevent the shell
operators as shell metacharacters.
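
To make the predicate semantics concrete, the following user-space sketch
in C models how a single numeric predicate yields 1 (match) or 0 (no match)
and how predicates combine with '&&' and '||'. It is only an illustration
and not the kernel's filter implementation. ``eval_numeric_predicate()``
and ``sig_filter_matches()`` are hypothetical names, the ``sig`` expression
is an example chosen for this sketch, and treating '&' as "any common bit
set" is an assumption about the numeric bitwise operator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Evaluate "field op value" for a numeric field.
 * Returns 1 on match, 0 on no match, -1 for an unsupported operator.
 */
int eval_numeric_predicate(int64_t field, const char *op, int64_t value)
{
	assert(op);

	if (!strcmp(op, "=="))
		return field == value;
	if (!strcmp(op, "!="))
		return field != value;
	if (!strcmp(op, "<"))
		return field < value;
	if (!strcmp(op, "<="))
		return field <= value;
	if (!strcmp(op, ">"))
		return field > value;
	if (!strcmp(op, ">="))
		return field >= value;
	if (!strcmp(op, "&"))
		return (field & value) != 0;	/* assumed: any common bit */
	return -1;
}

/*
 * Illustrative compound filter: ((sig >= 10 && sig < 15) || sig == 17).
 * The parentheses group the two range checks before the '||'.
 */
int sig_filter_matches(int64_t sig)
{
	int in_range = eval_numeric_predicate(sig, ">=", 10) == 1 &&
		       eval_numeric_predicate(sig, "<", 15) == 1;

	return in_range || eval_numeric_predicate(sig, "==", 17) == 1;
}
```

An event whose field values make the whole expression evaluate to 1 would
be kept; all other events would be discarded.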

The field-names available for use in filters c
'format' files for trace events (see section 4

The relational-operators depend on the type of

The operators available for numeric fields are

==, !=, <, <=, >, >=, &

And for string fields they are:

==, !=, ~

The glob (~) accepts a wild card character (\*
([). For example::

  prev_comm ~ "*sh"
  prev_comm ~ "sh*"
  prev_comm ~ "*sh*"
  prev_comm ~ "ba*sh"

If the field is a pointer that points into use
"filename" from sys_enter_openat), then you ha
field name::

  filename.ustring ~ "password"

As the kernel will have to know how to retriev
is at from user space.

You can convert any long type to a function ad

  call_site.function == security_prepare_creds

The above will filter when the field "call_sit
"security_prepare_creds". That is, it will com
the filter will return true if it is greater t
the function "security_prepare_creds" and less

The ".function" postfix can only be attached t
be compared with "==" or "!=".

Cpumask fields or scalar fields that encode a
a user-provided cpumask in cpulist format. The

  CPUS{$cpulist}

Operators available to cpumask filtering are:

& (intersection), ==, !=

For example, this will filter events that have
in the given cpumask::

  target_cpu & CPUS{17-42}

5.2 Setting filters
-------------------

A filter for an individual event is set by wri
to the 'filter' file for the given event.

For example::

        # cd /sys/kernel/tracing/events/sched/
        # echo "common_preempt_count > 4" > fi

A slightly more involved example::

        # cd /sys/kernel/tracing/events/signal
        # echo "((sig >= 10 && sig < 15) || si

If there is an error in the expression, you'll
argument' error when setting it, and the erron
an error message can be seen by looking at the

        # cd /sys/kernel/tracing/events/signal
        # echo "((sig >= 10 && sig < 15) || ds
        -bash: echo: write error: Invalid argu
        # cat filter
        ((sig >= 10 && sig < 15) || dsig == 17
        ^
        parse_error: Field not found

Currently the caret ('^') for an error always
the filter string; the error message should st
even without more accurate position info.

5.2.1 Filter limitations
------------------------

If a filter is placed on a string pointer ``(c
to a string on the ring buffer, but instead po
memory, then, for safety reasons, at most 1024
copied onto a temporary buffer to do the compa
faults (the pointer points to memory that shou
string compare will be treated as not matching

5.3 Clearing filters
--------------------

To clear the filter for an event, write a '0'
file.

To clear the filters for all events in a subsy
subsystem's filter file.

5.4 Subsystem filters
---------------------

For convenience, filters for every event in a
cleared as a group by writing a filter express
at the root of the subsystem. Note however, t
event within the subsystem lacks a field speci
filter, or if the filter can't be applied for
filter for that event will retain its previous
result in an unintended mixture of filters whi
confusing (to the user who might think differe
effect) trace output.
Only filters that refer
fields can be guaranteed to propagate successf

Here are a few subsystem filter examples that
above points:

Clear the filters on all events in the sched s

        # cd /sys/kernel/tracing/events/sched
        # echo 0 > filter
        # cat sched_switch/filter
        none
        # cat sched_wakeup/filter
        none

Set a filter using only common fields for all
subsystem (all events end up with the same fil

        # cd /sys/kernel/tracing/events/sched
        # echo common_pid == 0 > filter
        # cat sched_switch/filter
        common_pid == 0
        # cat sched_wakeup/filter
        common_pid == 0

Attempt to set a filter using a non-common fie
sched subsystem (all events but those that hav
their old filters)::

        # cd /sys/kernel/tracing/events/sched
        # echo prev_pid == 0 > filter
        # cat sched_switch/filter
        prev_pid == 0
        # cat sched_wakeup/filter
        common_pid == 0

5.5 PID filtering
-----------------

The set_event_pid file in the same directory a
exists, will filter all events from tracing an
PID listed in the set_event_pid file.
::

        # cd /sys/kernel/tracing
        # echo $$ > set_event_pid
        # echo 1 > events/enable

Will only trace events for the current task.

To add more PIDs without losing the PIDs alrea
::

        # echo 123 244 1 >> set_event_pid


6. Event triggers
=================

Trace events can be made to conditionally invo
which can take various forms and are described
examples would be enabling or disabling other
a stack trace whenever the trace event is hit.
with attached triggers is invoked, the set of
associated with that event is invoked.
Any gi
additionally have an event filter of the same
section 5 (Event filtering) associated with it
be invoked if the event being invoked passes t
If no filter is associated with the trigger, i

Triggers are added to and removed from a parti
trigger expressions to the 'trigger' file for

A given event can have any number of triggers
subject to any restrictions that individual co
regard.

Event triggers are implemented on top of "soft
whenever a trace event has one or more trigger
the event is activated even if it isn't actual
disabled in a "soft" mode. That is, the trace
but just will not be traced, unless of course
This scheme allows triggers to be invoked even
enabled, and also allows the current event fil
used for conditionally invoking triggers.

The syntax for event triggers is roughly based
set_ftrace_filter 'ftrace filter commands' (se
section of Documentation/trace/ftrace.rst), bu
differences and the implementation isn't curre
way, so beware about making generalizations be

.. Note::
     Writing into trace_marker (See Documentat
     can also enable triggers that are written
     /sys/kernel/tracing/events/ftrace/print/t

6.1 Expression syntax
---------------------

Triggers are added by echoing the command to t

  # echo 'command[:count] [if filter]' > trigg

Triggers are removed by echoing the same comma
to the 'trigger' file::

  # echo '!command[:count] [if filter]' > trig

The [if filter] part isn't used in matching co
leaving that off in a '!' command will accompl
having it in.

The filter syntax is the same as that describe
filtering' section above.

For ease of use, writing to the trigger file u
adds or removes a single trigger and there's n
('>' actually behaves like '>>') or truncation
triggers (you have to use '!'
for each one add

6.2 Supported trigger commands
------------------------------

The following commands are supported:

- enable_event/disable_event

  These commands can enable or disable another
  the triggering event is hit. When these com
  the other trace event is activated, but disa
  That is, the tracepoint will be called, but
  The event tracepoint stays in this mode as l
  in effect that can trigger it.

  For example, the following trigger causes km
  traced when a read system call is entered, a
  specifies that this enablement happens only

          # echo 'enable_event:kmem:kmalloc:1'
              /sys/kernel/tracing/events/sysca

  The following trigger causes kmalloc events
  when a read system call exits. This disable
  read system call exit::

          # echo 'disable_event:kmem:kmalloc'
              /sys/kernel/tracing/events/sysca

  The format is::

      enable_event:<system>:<event>[:count]
      disable_event:<system>:<event>[:count]

  To remove the above commands::

          # echo '!enable_event:kmem:kmalloc:1
              /sys/kernel/tracing/events/sysca

          # echo '!disable_event:kmem:kmalloc'
              /sys/kernel/tracing/events/sysca

  Note that there can be any number of enable/
  per triggering event, but there can only be
  triggered event. e.g. sys_enter_read can hav
  kmem:kmalloc and sched:sched_switch, but can
  versions such as kmem:kmalloc and kmem:kmall
  bytes_req == 256' and 'kmem:kmalloc if bytes
  could be combined into a single filter on km

- stacktrace

  This command dumps a stacktrace in the trace
  triggering event occurs.

  For example, the following trigger dumps a s
  kmalloc tracepoint is hit::

          # echo 'stacktrace' > \
                /sys/kernel/tracing/events/kme

  The following trigger dumps a stacktrace the
  request happens with a size >= 64K::

          # echo 'stacktrace:5 if bytes_req >=
                /sys/kernel/tracing/events/kme

  The format is::

      stacktrace[:count]

  To remove the above commands::

          # echo '!stacktrace' > \
                /sys/kernel/tracing/events/kme

          # echo '!stacktrace:5 if bytes_req >
                /sys/kernel/tracing/events/kme

  The latter can also be removed more simply b
  the filter)::

          # echo '!stacktrace:5' > \
                /sys/kernel/tracing/events/kme

  Note that there can be only one stacktrace t
  event.

- snapshot

  This command causes a snapshot to be trigger
  triggering event occurs.

  The following command creates a snapshot eve
  queue is unplugged with a depth > 1. If you
  events or functions at the time, the snapsho
  capture those events when the trigger event

          # echo 'snapshot if nr_rq > 1' > \
                /sys/kernel/tracing/events/blo

  To only snapshot once::

          # echo 'snapshot:1 if nr_rq > 1' > \
                /sys/kernel/tracing/events/blo

  To remove the above commands::

          # echo '!snapshot if nr_rq > 1' > \
                /sys/kernel/tracing/events/blo

          # echo '!snapshot:1 if nr_rq > 1' >
                /sys/kernel/tracing/events/blo

  Note that there can be only one snapshot tri
  event.

- traceon/traceoff

  These commands turn tracing on and off when
  hit. The parameter determines how many times
  turned on and off. If unspecified, there is

  The following command turns tracing off the
  request queue is unplugged with a depth > 1.
  set of events or functions at the time, you
  trace buffer to see the sequence of events t
  trigger event::

          # echo 'traceoff:1 if nr_rq > 1' > \
                /sys/kernel/tracing/events/blo

  To always disable tracing when nr_rq > 1::

          # echo 'traceoff if nr_rq > 1' > \
                /sys/kernel/tracing/events/blo

  To remove the above commands::

          # echo '!traceoff:1 if nr_rq > 1' >
                /sys/kernel/tracing/events/blo

          # echo '!traceoff if nr_rq > 1' > \
                /sys/kernel/tracing/events/blo

  Note that there can be only one traceon or t
  triggering event.

- hist

  This command aggregates event hits into a ha
  more trace event format fields (or stacktrac
  totals derived from one or more trace event
  event counts (hitcount).

  See Documentation/trace/histogram.rst for de

7. In-kernel trace event API
============================

In most cases, the command-line interface to t
sufficient. Sometimes, however, applications
more complex relationships than can be express
series of linked command-line expressions, or
commands may be simply too cumbersome. An exa
application that needs to 'listen' to the trac
maintain an in-kernel state machine detecting,
illegal kernel state occurs in the scheduler.

The trace event subsystem provides an in-kerne
or other kernel code to generate user-defined
will, which can be used to either augment the
and/or signal that a particular important stat

A similar in-kernel API is also available for
kretprobe events.

Both the synthetic event and k/ret/probe event
of a lower-level "dynevent_cmd" event command
available for more specialized applications, o
higher-level trace event APIs.

The API provided for these purposes is describ
following:

  - dynamically creating synthetic event defin
  - dynamically creating kprobe and kretprobe
  - tracing synthetic events from in-kernel co
  - the low-level "dynevent_cmd" API

7.1 Dynamically creating synthetic event defini
-----------------------------------------------

There are a couple ways to create a new synthe
module or other kernel code.

The first creates the event in one step, using
In this method, the name of the event to creat
the fields is supplied to synth_event_create()
synthetic event with that name and fields will
call. For example, to create a new "schedtest

  ret = synth_event_create("schedtest", sched_
                           ARRAY_SIZE(sched_fi

The sched_fields param in this example points
synth_field_desc, each of which describes an e
name::

  static struct synth_field_desc sched_fields[
        { .type = "pid_t",              .name
        { .type = "char[16]",           .name
        { .type = "u64",                .name
        { .type = "u64",                .name
        { .type = "unsigned int",       .name
        { .type = "char[64]",           .name
        { .type = "int",                .name
  };

See synth_field_size() for available types.

If field_name contains [n], the field is consi

If field_name contains [] (no subscript), the
be a dynamic array, which will only take as mu
is required to hold the array.

Because space for an event is reserved before
to the event, using dynamic arrays implies tha
in-kernel API described below can't be used wi
other non-piecewise in-kernel APIs can, howeve
arrays.

If the event is created from within a module,
must be passed to synth_event_create(). This
trace buffer won't contain unreadable events w
removed.

At this point, the event object is ready to be
events.

In the second method, the event is created in
allows events to be created dynamically and wi
and populate an array of fields beforehand.

To use this method, an empty or partially empt
first be created using synth_event_gen_cmd_sta
synth_event_gen_cmd_array_start(). For synth_
the name of the event along with one or more p
representing a 'type field_name;' field specif
supplied. For synth_event_gen_cmd_array_start
event along with an array of struct synth_fiel
supplied. Before calling synth_event_gen_cmd_s
synth_event_gen_cmd_array_start(), the user sh
initialize a dynevent_cmd object using synth_e

For example, to create a new "schedtest" synth
fields::

  struct dynevent_cmd cmd;
  char *buf;

  /* Create a buffer to hold the generated com
  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERN

  /* Before generating the command, initialize
  synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT

  ret = synth_event_gen_cmd_start(&cmd, "sched
                                  "pid_t", "ne
                                  "u64", "ts_n

Alternatively, using an array of struct synth_
containing the same information::

  ret = synth_event_gen_cmd_array_start(&cmd,
                                        fields

Once the synthetic event object has been creat
populated with more fields. Fields are added
synth_event_add_field(), supplying the dyneven
type, and a field name. For example, to add a
"intfield", the following call should be made:

  ret = synth_event_add_field(&cmd, "int", "in

See synth_field_size() for available types. If
the field is considered to be an array.

A group of fields can also be added all at onc
synth_field_desc with synth_event_add_fields(). For
just the first four sched_fields::

  ret = synth_event_add_fields(&cmd, sched_fie

If you already have a string of the form 'type
synth_event_add_field_str() can be used to add
also automatically append a ';' to the string.

Once all the fields have been added, the event
registered by calling the synth_event_gen_cmd_

  ret = synth_event_gen_cmd_end(&cmd);

At this point, the event object is ready to be
events.

7.2 Tracing synthetic events from in-kernel co
----------------------------------------------

To trace a synthetic event, there are several
option is to trace the event in one call, usin
with a variable number of values, or synth_eve
array of values to be set. A second option ca
need for a pre-formed array of values or list
synth_event_trace_start() and synth_event_trac
synth_event_add_next_val() or synth_event_add_
piecewise.

7.2.1 Tracing a synthetic event all at once
-------------------------------------------

To trace a synthetic event all at once, the sy
synth_event_trace_array() functions can be use

The synth_event_trace() function is passed the
representing the synthetic event (which can be
trace_get_event_file() using the synthetic eve
the system name, and the trace instance name (
trace array)), along with a variable number o
synthetic event field, and the number of value

So, to trace an event corresponding to the syn
above, code like the following could be used::

  ret = synth_event_trace(create_synth_test, 7
                          444,               /*
                          (u64)"clackers",   /*
                          1000000,           /*
                          1000,              /*
                          smp_processor_id(),/
                          (u64)"Thneed",     /*
                          999);              /*

All vals should be cast to u64, and string val
strings, cast to u64. Strings will be copied
the event for the string, using these pointers

Alternatively, the synth_event_trace_array() f
accomplish the same thing. It is passed the t
representing the synthetic event (which can be
trace_get_event_file() using the synthetic eve
the system name, and the trace instance name (
trace array)), along with an array of u64, one
event field.

To trace an event corresponding to the synthet
above, code like the following could be used::

  u64 vals[7];

  vals[0] = 777;                  /* next_pid_
  vals[1] = (u64)"tiddlywinks";   /* next_comm
  vals[2] = 1000000;              /* ts_ns */
  vals[3] = 1000;                 /* ts_ms */
  vals[4] = smp_processor_id();   /* cpu */
  vals[5] = (u64)"thneed";        /* my_string
  vals[6] = 398;                  /* my_int_fi

The 'vals' array is just an array of u64, the
match the number of fields in the synthetic eve
the same order as the synthetic event fields.

All vals should be cast to u64, and string val
strings, cast to u64. Strings will be copied
the event for the string, using these pointers

In order to trace a synthetic event, a pointer
is needed. The trace_get_event_file() functio
it - it will find the file in the given trace
NULL since the top trace array is being used)
preventing the instance containing it from goi

       schedtest_event_file = trace_get_event_


Before tracing the event, it should be enabled
the synthetic event won't actually show up in

To enable a synthetic event from the kernel, t
can be used (which is not specific to syntheti
the "synthetic" system name to be specified ex

To enable the event, pass 'true' to it::

       trace_array_set_clr_event(schedtest_eve
                                 "synthetic",

To disable it pass false::

       trace_array_set_clr_event(schedtest_eve
                                 "synthetic",

Finally, synth_event_trace_array() can be used
event, which should be visible in the trace bu

       ret = synth_event_trace_array(schedtest
                                     ARRAY_SIZ

To remove the synthetic event, the event shoul
trace instance should be 'put' back using trac

       trace_array_set_clr_event(schedtest_eve
                                 "synthetic",
       trace_put_event_file(schedtest_event_fi

If those have been successful, synth_event_del
remove the event::

       ret = synth_event_delete("schedtest");

7.2.2 Tracing a synthetic event piecewise
-----------------------------------------

To trace a synthetic event using the piecewise metho
synth_event_trace_start() function is used to
event trace::

       struct synth_event_trace_state trace_st

       ret = synth_event_trace_start(schedtest

It's passed the trace_event_file representing
using the same methods as described above, alo
struct synth_event_trace_state object, which w
used to maintain state between this and follow

Once the event has been opened, which means sp
reserved in the trace buffer, the individual f
are two ways to do that, either one after anot
the event, which requires no lookups, or by na
tradeoff is flexibility in doing the assignmen
lookup per field.

To assign the values one after the other witho
synth_event_add_next_val() should be used. Ea
same synth_event_trace_state object used in th
along with the value to set the next field in
field is set, the 'cursor' points to the next
by the subsequent call, continuing until all t
in order. The same sequence of calls as in th
this method would be (without error-handling c

       /* next_pid_field */
       ret = synth_event_add_next_val(777, &tr

       /* next_comm_field */
       ret = synth_event_add_next_val((u64)"sl

       /* ts_ns */
       ret = synth_event_add_next_val(1000000,

       /* ts_ms */
       ret = synth_event_add_next_val(1000, &t

       /* cpu */
       ret = synth_event_add_next_val(smp_proc

       /* my_string_field */
       ret = synth_event_add_next_val((u64)"th

       /* my_int_field */
       ret = synth_event_add_next_val(395, &tr

To assign the values in any order, synth_event
used. Each call is passed the same synth_even
the synth_event_trace_start(), along with the
to set and the value to set it to.
The same s
the above examples using this method would be
code)::

       ret = synth_event_add_val("next_pid_fie
       ret = synth_event_add_val("next_comm_fi
                                 &trace_state)
       ret = synth_event_add_val("ts_ns", 1000
       ret = synth_event_add_val("ts_ms", 1000
       ret = synth_event_add_val("cpu", smp_pr
       ret = synth_event_add_val("my_string_fi
                                 &trace_state)
       ret = synth_event_add_val("my_int_field

Note that synth_event_add_next_val() and synth
incompatible if used within the same trace of
can be used but not both at the same time.

Finally, the event won't be actually traced un
which is done using synth_event_trace_end(), w
struct synth_event_trace_state object used in

       ret = synth_event_trace_end(&trace_stat

Note that synth_event_trace_end() must be call
of whether any of the add calls failed (say du
being passed in).

7.3 Dynamically creating kprobe and kretprobe e
-----------------------------------------------

To create a kprobe or kretprobe trace event fr
kprobe_event_gen_cmd_start() or kretprobe_even
functions can be used.

To create a kprobe event, an empty or partiall
should first be created using kprobe_event_gen
of the event and the probe location should be
or args each representing a probe field should
function. Before calling kprobe_event_gen_cmd
should create and initialize a dynevent_cmd ob
kprobe_event_cmd_init().

For example, to create a new "schedtest" kprob

  struct dynevent_cmd cmd;
  char *buf;

  /* Create a buffer to hold the generated com
  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERN

  /* Before generating the command, initialize
  kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVEN

  /*
   * Define the gen_kprobe_test event with the
   * fields.
   */
  ret = kprobe_event_gen_cmd_start(&cmd, "gen_
                                   "dfd=%ax",

Once the kprobe event object has been created,
populated with more fields. Fields can be add
kprobe_event_add_fields(), supplying the dynev
with a variable arg list of probe fields. For
couple additional fields, the following call c

  ret = kprobe_event_add_fields(&cmd, "flags=%

Once all the fields have been added, the event
registered by calling the kprobe_event_gen_cmd
kretprobe_event_gen_cmd_end() functions, depen
or kretprobe command was started::

  ret = kprobe_event_gen_cmd_end(&cmd);

or::

  ret = kretprobe_event_gen_cmd_end(&cmd);

At this point, the event object is ready to be
events.

Similarly, a kretprobe event can be created us
kretprobe_event_gen_cmd_start() with a probe n
additional params such as $retval::

  ret = kretprobe_event_gen_cmd_start(&cmd, "g
                                      "do_sys_

Similar to the synthetic event case, code like
used to enable the newly created kprobe event:

  gen_kprobe_test = trace_get_event_file(NULL,

  ret = trace_array_set_clr_event(gen_kprobe_t
                                  "kprobes", "

Finally, also similar to synthetic events, the
used to give the kprobe event file back and de

  trace_put_event_file(gen_kprobe_test);

  ret = kprobe_event_delete("gen_kprobe_test")

7.4 The "dynevent_cmd" low-level API
------------------------------------

Both the in-kernel synthetic event and kprobe
top of a lower-level "dynevent_cmd" interface
meant to provide the basis for higher-level i
synthetic and kprobe interfaces, which can be

The basic idea is simple and amounts to provi
layer that can be used to generate trace even
generated command strings can then be passed
and event creation code that already exists i
subsystem for creating the corresponding trac

In a nutshell, the way it works is
that the h
code creates a struct dynevent_cmd object, th
functions, dynevent_arg_add() and dynevent_ar
a command string, which finally causes the co
using the dynevent_create() function. The de
are described below.

The first step in building a new command stri
initialize an instance of a dynevent_cmd. He
create a dynevent_cmd on the stack and initia

  struct dynevent_cmd cmd;
  char *buf;
  int ret;

  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KER

  dynevent_cmd_init(&cmd, buf, maxlen, DYNEVEN
                    foo_event_run_command);

The dynevent_cmd initialization needs to be g
buffer and the length of the buffer (MAX_DYNE
for this purpose - at 2k it's generally too b
on the stack, so is dynamically allocated), a
is meant to be used to check that further API
correct command type, and a pointer to an eve
callback that will be called to actually exec
command function.

Once that's done, the command string can be b
calls to argument-adding functions.

To add a single argument, define and initiali
or struct dynevent_arg_pair object. Here's a
possible arg addition, which is simply to app
a whitespace-separated argument to the comman

  struct dynevent_arg arg;

  dynevent_arg_init(&arg, NULL, 0);

  arg.str = name;

  ret = dynevent_arg_add(&cmd, &arg);

The arg object is first initialized using dyn
this case the parameters are NULL or 0, which
optional sanity-checking function or separato
the arg.

Here's another more complicated example using
used to create an argument that consists of a
together as a unit, for example, a 'type fiel
expression arg e.g.
'flags=%cx'::

  struct dynevent_arg_pair arg_pair;

  dynevent_arg_pair_init(&arg_pair, dynevent_

  arg_pair.lhs = type;
  arg_pair.rhs = name;

  ret = dynevent_arg_pair_add(&cmd, &arg_pair)

Again, the arg_pair is first initialized, in
function used to check the sanity of the args
neither part of the pair is NULL), along with
to add an operator between the pair (here non
appended onto the end of the arg pair (here '

There's also a dynevent_str_add() function th
add a string as-is, with no spaces, delimiter

Any number of dynevent_*_add() calls can be m
(until its length surpasses cmd->maxlen). Wh
been added and the command string is complete
do is run the command, which happens by simpl
dynevent_create()::

  ret = dynevent_create(&cmd);

At that point, if the return value is 0, the
created and is ready to use.

See the dynevent_cmd function definitions the
of the API.
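
The command-building idea can be tried outside the kernel with a small
user-space analogue in C. This is a sketch of the concept only and not the
kernel implementation: ``struct cmd_builder``, ``cmd_builder_init()``,
``cmd_arg_add()`` and ``cmd_arg_pair_add()`` are hypothetical names invented
for this example, and passing the operator and separator as strings is a
simplification chosen here. Each call appends to a caller-supplied buffer
and fails once the bounded length would be exceeded.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-in for a command under construction. */
struct cmd_builder {
	char *buf;	/* caller-supplied buffer holding the command */
	size_t maxlen;	/* size of buf, including the terminating NUL */
	size_t len;	/* current length of the command string */
};

void cmd_builder_init(struct cmd_builder *cmd, char *buf, size_t maxlen)
{
	assert(cmd && buf && maxlen > 0);
	cmd->buf = buf;
	cmd->maxlen = maxlen;
	cmd->len = 0;
	buf[0] = '\0';
}

/* Append formatted text; on overflow, restore the old string and fail. */
static int cmd_appendf(struct cmd_builder *cmd, const char *fmt, ...)
{
	size_t room = cmd->maxlen - cmd->len;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(cmd->buf + cmd->len, room, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= room) {
		cmd->buf[cmd->len] = '\0';
		return -1;
	}
	cmd->len += (size_t)n;
	return 0;
}

/* Append one whitespace-separated argument. */
int cmd_arg_add(struct cmd_builder *cmd, const char *str)
{
	return cmd_appendf(cmd, "%s%s", cmd->len ? " " : "", str);
}

/* Append "lhs<op>rhs<sep>" as a single unit, e.g. "pid_t name;". */
int cmd_arg_pair_add(struct cmd_builder *cmd, const char *lhs,
		     const char *rhs, const char *op, const char *sep)
{
	return cmd_appendf(cmd, "%s%s%s%s%s", cmd->len ? " " : "",
			   lhs, op, rhs, sep);
}
```

For instance, adding the argument ``schedtest`` followed by the pair
(``pid_t``, ``next_pid_field``) with operator ``" "`` and separator ``";"``
builds the string ``schedtest pid_t next_pid_field;``. In the kernel API the
finished string would then be handed to the command's run callback through
dynevent_create().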