=============================
Per-task statistics interface
=============================


Taskstats is a netlink-based interface for sending per-task and
per-process statistics from the kernel to userspace.

Taskstats was designed for the following benefits:

- efficiently provide statistics during the lifetime of a task and on its exit
- unified interface for multiple accounting subsystems
- extensibility for use by future accounting patches

Terminology
-----------

"pid", "tid" and "task" are used interchangeably and refer to the standard
Linux task defined by struct task_struct. Per-pid stats are the same as
per-task stats.

"tgid", "process" and "thread group" are used interchangeably and refer to the
tasks that share an mm_struct, i.e., the traditional Unix process. Despite the
use of tgid, there is no special treatment for the task that is the thread
group leader: a process is deemed alive as long as it has any task belonging
to it.

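For illustration, the two kinds of id map onto ordinary system calls: getpid() returns the caller's tgid, while the gettid system call returns the id of the calling task. These are the values a caller would place in TASKSTATS_CMD_ATTR_TGID and TASKSTATS_CMD_ATTR_PID respectively. The helper names current_tgid() and current_tid() are hypothetical, and the sketch assumes glibc's syscall() wrapper with SYS_gettid.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: id of the caller's thread group ("process"). */
pid_t current_tgid(void)
{
    return getpid();
}

/* Hypothetical helper: id of the calling task (thread). The raw syscall is
 * used so this also works with C libraries that lack a gettid() wrapper. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In a single-threaded process the two values are equal; in any thread other than the thread group leader they differ.
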
Usage
-----

To get statistics during a task's lifetime, userspace opens a unicast netlink
socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
The response contains statistics for a task (if a pid is specified) or the sum
of statistics for all tasks of the process (if a tgid is specified).

To obtain statistics for tasks which are exiting, the userspace listener
sends a register command and specifies a cpumask. Whenever a task exits on
one of the cpus in the cpumask, its per-pid statistics are sent to the
registered listener. Using cpumasks limits the data received by any one
listener and assists in flow control over the netlink interface, as explained
in more detail below.

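The cpumask sent with a register or deregister command is an ASCII string of comma-separated cpu ranges, such as "1-3,5,7-8" for cpus 1, 2, 3, 5, 7 and 8. A listener that keeps its cpus in a sorted array could build that string along these lines. format_cpulist() is a hypothetical helper, and it assumes the input array is sorted and free of duplicates.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes a sorted, duplicate-free list of cpu numbers
 * as a cpulist string such as "1-3,5,7-8". Returns the string length, or
 * -1 if out is too small to hold the whole string. */
int format_cpulist(const int *cpus, size_t n, char *out, size_t outlen)
{
    size_t pos = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; ) {
        size_t j = i;
        int w;

        /* Extend the run while cpu numbers are consecutive. */
        while (j + 1 < n && cpus[j + 1] == cpus[j] + 1)
            j++;

        if (j == i)
            w = snprintf(out + pos, outlen - pos, "%s%d",
                         pos ? "," : "", cpus[i]);
        else
            w = snprintf(out + pos, outlen - pos, "%s%d-%d",
                         pos ? "," : "", cpus[i], cpus[j]);

        if (w < 0 || (size_t)w >= outlen - pos)
            return -1;   /* output would have been truncated */
        pos += (size_t)w;
        i = j + 1;
    }
    return (int)pos;
}
```

The same string, sent in a TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK attribute, withdraws interest in those cpus.
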
If the exiting task is the last thread exiting its thread group,
an additional record containing the per-tgid stats is also sent to userspace.
The latter contains the sum of per-pid stats for all threads in the thread
group, both past and present.

getdelays.c is a simple utility demonstrating usage of the taskstats interface
for reporting delay accounting statistics. Users can register cpumasks,
send commands and process responses, listen for per-tid/tgid exit data,
write the data received to a file and do basic flow control by increasing
receive buffer sizes.

Interface
---------

The user-kernel interface is encapsulated in include/linux/taskstats.h.

To avoid this documentation becoming obsolete as the interface evolves, only
an outline of the current version is given. Where the two disagree,
taskstats.h takes precedence over the description here.

struct taskstats is the common accounting structure for both per-pid and
per-tgid data. It is versioned and can be extended by each accounting subsystem
that is added to the kernel. The fields and their semantics are defined in the
taskstats.h file.

The data exchanged between user and kernel space is a netlink message belonging
to the NETLINK_GENERIC family and using the netlink attributes interface.
The messages are in the format::

    +----------+- - -+-------------+-------------------+
    | nlmsghdr | Pad |  genlmsghdr | taskstats payload |
    +----------+- - -+-------------+-------------------+

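As a sketch of the layout above, the function below fills a buffer with a TASKSTATS_CMD_GET request carrying a single TASKSTATS_CMD_ATTR_PID or TASKSTATS_CMD_ATTR_TGID attribute, using the uapi headers <linux/netlink.h>, <linux/genetlink.h> and <linux/taskstats.h>. The name build_get_request() is hypothetical, and it assumes the numeric id of the "TASKSTATS" generic netlink family has already been looked up (getdelays.c does this with a CTRL_CMD_GETFAMILY query to the generic netlink controller).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <linux/taskstats.h>

/* Hypothetical helper: writes nlmsghdr | pad | genlmsghdr | one attribute
 * into buf, which must be 4-byte aligned. attr_type is
 * TASKSTATS_CMD_ATTR_PID or TASKSTATS_CMD_ATTR_TGID and id is the pid/tgid.
 * Returns the total message length, or 0 if buf is too small. */
size_t build_get_request(void *buf, size_t buflen, uint16_t family_id,
                         uint16_t attr_type, uint32_t id, uint32_t seq)
{
    /* NLMSG_HDRLEN already includes the "Pad" before the genetlink header. */
    size_t len = NLMSG_HDRLEN + GENL_HDRLEN + NLA_HDRLEN + NLA_ALIGN(sizeof(id));

    if (buflen < len)
        return 0;
    memset(buf, 0, len);

    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = (uint32_t)len;
    nlh->nlmsg_type = family_id;        /* generic netlink family id */
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;

    struct genlmsghdr *genlh = NLMSG_DATA(nlh);
    genlh->cmd = TASKSTATS_CMD_GET;
    genlh->version = TASKSTATS_GENL_VERSION;

    /* The taskstats payload: one attribute holding a u32 pid or tgid. */
    struct nlattr *nla = (struct nlattr *)((char *)genlh + GENL_HDRLEN);
    nla->nla_type = attr_type;
    nla->nla_len = (uint16_t)(NLA_HDRLEN + sizeof(id));
    memcpy((char *)nla + NLA_HDRLEN, &id, sizeof(id));
    return len;
}
```

The buffer would then be sent with sendto() on a socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC) descriptor, addressed to a struct sockaddr_nl whose nl_pid is 0 (the kernel).
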
The taskstats payload is one of the following three kinds:

1. Commands: sent from user to kernel. Commands to get data on
   a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
   containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
   the task/process for which userspace wants statistics.

Commands to register/deregister interest in exit data from a set of cpus
consist of one attribute, of type
TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK, and contain a cpumask in the
attribute payload. The cpumask is specified as an ASCII string of
comma-separated cpu ranges; e.g., to listen to exit data from cpus 1,2,3,5,7,8
the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
in cpus before closing the listening socket, the kernel cleans up its interest
set over time. However, for the sake of efficiency, an explicit deregistration
is advisable.

2. Response for a command: sent from the kernel in response to a userspace
   command. The payload is a series of three attributes of type:

   a) TASKSTATS_TYPE_AGGR_PID/TGID: attribute indicating that a pid/tgid will
      be followed by some stats; attributes b) and c) are nested inside it.

   b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose
      stats are being returned.

   c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
      same structure is used for both per-pid and per-tgid stats.

3. New message sent by the kernel whenever a task exits. The payload consists
   of a series of attributes of the following types:

   a) TASKSTATS_TYPE_AGGR_PID: indicates the next two attributes will be
      pid+stats
   b) TASKSTATS_TYPE_PID: contains the exiting task's pid
   c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
   d) TASKSTATS_TYPE_AGGR_TGID: indicates the next two attributes will be
      tgid+stats
   e) TASKSTATS_TYPE_TGID: contains the tgid of the process to which the task
      belongs
   f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for the exiting task's
      process

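Replies to commands and exit notifications can both be read with one attribute walker. In the kernel's encoding the pid/tgid and stats attributes travel nested inside the AGGR attribute (this is how getdelays.c parses them), so the sketch below descends into TASKSTATS_TYPE_AGGR_PID/TGID. walk_taskstats_attrs() and struct ts_ids are hypothetical names, and p/len are assumed to describe the bytes that follow the genlmsghdr. Unknown attribute types are skipped rather than treated as errors.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/taskstats.h>

/* Hypothetical summary of one reply or exit notification. */
struct ts_ids {
    uint32_t pid;   /* 0 if no TASKSTATS_TYPE_PID was seen */
    uint32_t tgid;  /* 0 if no TASKSTATS_TYPE_TGID was seen */
    int nstats;     /* number of TASKSTATS_TYPE_STATS attributes seen */
};

/* Walks a run of netlink attributes, recursing into the AGGR containers.
 * Returns 0 on success, -1 on a malformed attribute length. */
int walk_taskstats_attrs(const char *p, size_t len, struct ts_ids *ids)
{
    while (len >= NLA_HDRLEN) {
        struct nlattr nla;

        memcpy(&nla, p, sizeof(nla));   /* avoid unaligned access */
        if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len)
            return -1;

        const char *payload = p + NLA_HDRLEN;
        size_t plen = nla.nla_len - NLA_HDRLEN;

        switch (nla.nla_type & NLA_TYPE_MASK) {
        case TASKSTATS_TYPE_AGGR_PID:
        case TASKSTATS_TYPE_AGGR_TGID:
            if (walk_taskstats_attrs(payload, plen, ids) < 0)
                return -1;
            break;
        case TASKSTATS_TYPE_PID:
            if (plen >= sizeof(uint32_t))
                memcpy(&ids->pid, payload, sizeof(uint32_t));
            break;
        case TASKSTATS_TYPE_TGID:
            if (plen >= sizeof(uint32_t))
                memcpy(&ids->tgid, payload, sizeof(uint32_t));
            break;
        case TASKSTATS_TYPE_STATS:
            ids->nstats++;   /* payload is a struct taskstats */
            break;
        default:
            break;           /* unknown or newer attribute type: ignore */
        }

        size_t step = NLA_ALIGN(nla.nla_len);
        if (step >= len)
            break;           /* last attribute in this run */
        p += step;
        len -= step;
    }
    return 0;
}
```
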
per-tgid stats
--------------

Taskstats provides per-process stats, in addition to per-task stats, since
resource management is often done at a process granularity and aggregating task
stats in userspace alone is inefficient and potentially inaccurate (due to lack
of atomicity).

However, maintaining per-process stats in addition to per-task stats within
the kernel has space and time overheads. To address this, the taskstats code
accumulates each exiting task's statistics into a process-wide data structure.
When the last task of a process exits, the accumulated process-level data is
also sent to userspace (along with the per-task data).

When a user queries per-tgid data, the stats of all live threads in the group
are summed and added to the accumulated total for previously exited threads of
the same thread group.

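The arithmetic can be modelled in a few lines. This is a toy model, not the kernel's implementation: struct mini_stats is a hypothetical two-counter stand-in for struct taskstats, account_exit() folds an exiting thread into the process-wide total, and tgid_stats() answers a per-tgid query by adding every live thread to that total.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field stand-in for struct taskstats. */
struct mini_stats {
    uint64_t cpu_count;
    uint64_t cpu_delay_total;
};

static void stats_add(struct mini_stats *dst, const struct mini_stats *src)
{
    dst->cpu_count += src->cpu_count;
    dst->cpu_delay_total += src->cpu_delay_total;
}

/* Once per exiting thread: fold its stats into the process-wide total. */
void account_exit(struct mini_stats *exited_total,
                  const struct mini_stats *task)
{
    stats_add(exited_total, task);
}

/* Per-tgid query: exited-thread total plus the sum over live threads.
 * The accumulated total itself is left unchanged. */
struct mini_stats tgid_stats(const struct mini_stats *exited_total,
                             const struct mini_stats *live, size_t nlive)
{
    struct mini_stats sum = *exited_total;

    for (size_t i = 0; i < nlive; i++)
        stats_add(&sum, &live[i]);
    return sum;
}
```

For example, after threads with cpu_count 2 and 3 exit, a query made while threads with counts 1 and 4 are still running reports 2 + 3 + 1 + 4 = 10.
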
Extending taskstats
-------------------

There are two ways to extend the taskstats interface to export more
per-task/process stats as patches to collect them are added to the kernel
in the future:

1. Adding more fields to the end of the existing struct taskstats. Backward
   compatibility is ensured by the version number within the
   structure. Userspace will use only the fields of the struct that correspond
   to the version it is using.

2. Defining separate statistic structs and using the netlink attributes
   interface to return them. Since userspace processes each netlink attribute
   independently, it can always ignore attributes whose type it does not
   understand (because it is using an older version of the interface).

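With option 1., a tool built against one version of taskstats.h may talk to a kernel that sends a shorter or longer struct taskstats. One defensive pattern, shown here as a sketch rather than a documented requirement, is to copy only the common prefix of the TASKSTATS_TYPE_STATS payload, leave fields the kernel did not send zeroed, and return the kernel's version so the caller can decide which fields to trust. copy_taskstats() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>
#include <linux/taskstats.h>

/* Hypothetical helper: copies a TASKSTATS_TYPE_STATS payload of plen bytes
 * into *dst. A shorter payload (older kernel) leaves the missing trailing
 * fields zeroed; a longer one (newer kernel) is truncated to the fields this
 * program was compiled with. Returns the version reported by the kernel, or
 * -1 if the payload cannot even hold the version field. */
int copy_taskstats(struct taskstats *dst, const void *payload, size_t plen)
{
    memset(dst, 0, sizeof(*dst));
    if (plen < offsetof(struct taskstats, version) + sizeof(dst->version))
        return -1;
    memcpy(dst, payload, plen < sizeof(*dst) ? plen : sizeof(*dst));
    return dst->version;
}
```

A caller would compare the returned value with the TASKSTATS_VERSION it was built against before relying on fields added in later versions.
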
Choosing between 1. and 2. is a matter of trading off flexibility and
overhead. If only a few fields need to be added, then 1. is the preferable
path since the kernel and userspace don't need to incur the overhead of
processing new netlink attributes. But if the new fields expand the existing
struct too much, requiring disparate userspace accounting utilities to
unnecessarily receive large structures whose fields are of no interest, then
defining new attributes (option 2.) would be worthwhile.

Flow control for taskstats
--------------------------

When the rate of task exits becomes large, a listener may not be able to keep
up with the kernel's rate of sending per-tid/tgid exit data, leading to data
loss. The risk grows as the taskstats structure is extended and as the number
of cpus grows.

To avoid losing statistics, userspace should do one or more of the following:

- increase the receive buffer sizes for the netlink sockets opened by
  listeners to receive exit data.

- create more listeners and reduce the number of cpus being listened to by
  each listener. In the extreme case, there could be one listener for each cpu.
  Users may also consider setting the cpu affinity of the listener to the subset
  of cpus to which it listens, especially if they are listening to just one cpu.

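For the first measure, the receive buffer is enlarged with the standard SO_RCVBUF socket option. The sketch reads the value back because Linux caps the request at net.core.rmem_max and stores double the (capped) value to allow for bookkeeping overhead, so the granted size differs from the one asked for. grow_rcvbuf() is a hypothetical helper; it works on any socket, including a NETLINK_GENERIC listener.

```c
#include <sys/socket.h>

/* Hypothetical helper: asks for a receive buffer of `bytes` and returns the
 * size the kernel actually granted, or -1 on error. */
int grow_rcvbuf(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;

    int granted = 0;
    socklen_t optlen = sizeof(granted);

    /* On Linux the value read back is normally double the capped request. */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &optlen) < 0)
        return -1;
    return granted;
}
```

For the second measure, a listener can be pinned to the cpus it listens to with sched_setaffinity(2).
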
Despite these measures, if userspace receives ENOBUFS errors indicating
overflow of the receive buffers, it should take measures to handle the loss of
data.

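A listener can treat ENOBUFS as a loss notification rather than a fatal error: on a netlink socket it reports that messages were dropped, and later reads return new data as usual. The sketch below drains a datagram listener socket without blocking and counts overruns. drain_listener() and its parameters are hypothetical, and what to do after a loss (for example, marking the affected interval as incomplete) is left to the application.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical listener step: reads every queued datagram without blocking.
 * ENOBUFS means the kernel dropped messages because the receive buffer
 * overflowed; the loss is counted and reading continues. Returns the number
 * of messages read, or -1 on any other error. */
int drain_listener(int fd, char *buf, size_t buflen, int *overruns)
{
    int nmsgs = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, buflen, MSG_DONTWAIT);

        if (n >= 0) {
            nmsgs++;            /* a real listener would parse buf here */
            continue;
        }
        if (errno == ENOBUFS) {
            (*overruns)++;      /* some exit records were lost */
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return nmsgs;       /* queue is empty */
        return -1;
    }
}
```
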