===================
Block IO Controller
===================

Overview
========
cgroup subsys "blkio" implements the block io
a need of various kinds of IO control policies
both at leaf nodes as well as at intermediate
Plan is to use the same cgroup based managemen
and based on user options switch IO policies i

One IO control policy is throttling policy whi
specify upper IO rate limits on devices. This
generic block layer and can be used on leaf no
level logical devices like device mapper.

HOWTO
=====

Throttling/Upper Limit policy
-----------------------------
Enable Block IO controller::

        CONFIG_BLK_CGROUP=y

Enable throttling in block layer::

        CONFIG_BLK_DEV_THROTTLING=y

Mount blkio controller (see cgroups.txt, Why a

        mount -t cgroup -o blkio none /sys/fs/

Specify a bandwidth rate on particular device
for policy is "<major>:<minor> <bytes_per_sec

        echo "8:16 1048576" > /sys/fs/cgroup/

This will put a limit of 1MB/second on reads h
on device having major/minor number 8:16.

Run dd to read a file and see if rate is throt

        # dd iflag=direct if=/mnt/common/zerof
        1024+0 records in
        1024+0 records out
        4194304 bytes (4.2 MB) copied, 4.0001

Limits for writes can be put using blkio.throt

Hierarchical Cgroups
====================

Throttling implements hierarchy support; howev
throttling's hierarchy support is enabled iff
enabled from cgroup side, which currently is a
not publicly available.

If somebody created a hierarchy like as follow

                        root
                        /  \
                     test1 test2
                        |
                     test3

Throttling with "sane_behavior" will handle th
hierarchy correctly. For throttling, all limit
to the whole subtree while all statistics are
directly generated by tasks in that cgroup.
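
As a rough illustration of limits applying to a whole subtree, the sketch
below walks from a cgroup directory up to the hierarchy root and reports the
smallest read limit found for one device along the way. It assumes that
reading ``blkio.throttle.read_bps_device`` yields lines of the form
``<major>:<minor> <bytes_per_sec>``, mirroring the rule format used in the
HOWTO. The function name ``effective_read_bps`` and its arguments are
hypothetical, and this is only a userspace approximation, not a description
of how the kernel evaluates limits internally.

```shell
# Hypothetical helper (not part of any kernel tooling).
# Usage: effective_read_bps <hierarchy_root> <cgroup_dir> <major:minor>
# Prints the tightest read limit in bytes/sec found between <cgroup_dir>
# and <hierarchy_root>, or "none" if no level sets a limit for the device.
# The body runs in a subshell so its variables do not leak to the caller.
effective_read_bps() (
    root=$1
    dir=$2
    dev=$3
    min=
    while :; do
        f="$dir/blkio.throttle.read_bps_device"
        if [ -r "$f" ]; then
            # Assumed line format: "<major>:<minor> <bytes_per_sec>"
            v=$(awk -v d="$dev" '$1 == d { print $2; exit }' "$f")
            if [ -n "$v" ] && { [ -z "$min" ] || [ "$v" -lt "$min" ]; }; then
                min=$v
            fi
        fi
        # Stop at the hierarchy root.
        [ "$dir" = "$root" ] && break
        parent=$(dirname "$dir")
        # Also stop if we cannot go any higher ("/" or ".").
        [ "$parent" = "$dir" ] && break
        dir=$parent
    done
    echo "${min:-none}"
)
```

For example, with a 2097152 bytes/sec limit on ``test1`` and a 1048576
bytes/sec limit on ``test1/test3`` for device 8:16, calling the helper on
``test3`` prints 1048576, while calling it on ``test1`` prints 2097152.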

Throttling without "sane_behavior" enabled fro
practically treat all groups at same level as
following::

                                pivot
                             /  /   \  \
                        root  test1 test2  tes

Various user visible config options
===================================

CONFIG_BLK_CGROUP
        Block IO controller.

CONFIG_BFQ_CGROUP_DEBUG
        Debug help. Right now some additiona
        if this option is enabled.

CONFIG_BLK_DEV_THROTTLING
        Enable block device throttling suppo

Details of cgroup files
=======================

Proportional weight policy files
--------------------------------

blkio.bfq.weight
        Specifies per cgroup weight. This is
        on all the devices until and unless
        (see `blkio.bfq.weight_device` below

        Currently allowed range of weights i
        see Documentation/block/bfq-iosched.

blkio.bfq.weight_device
        Specifies per cgroup per device weig
        weight. For more details, see Docume

        Following is the format::

          # echo dev_maj:dev_minor weight >

        Configure weight=300 on /dev/sdb (8:

          # echo 8:16 300 > blkio.bfq.weight
          # cat blkio.bfq.weight_device
          dev     weight
          8:16    300

        Configure weight=500 on /dev/sda (8:

          # echo 8:0 500 > blkio.bfq.weight_
          # cat blkio.bfq.weight_device
          dev     weight
          8:0     500
          8:16    300

        Remove specific weight for /dev/sda

          # echo 8:0 0 > blkio.bfq.weight_de
          # cat blkio.bfq.weight_device
          dev     weight
          8:16    300

blkio.time
        Disk time allocated to cgroup per de
        two fields specify the major and min
        third field specifies the disk time
        milliseconds.

blkio.sectors
        Number of sectors transferred to/fro
        two fields specify the major and min
        third field specifies the number of
        group to/from the device.

blkio.io_service_bytes
        Number of bytes transferred to/from
        are further divided by the type of o
        or async.
        First two fields specify t
        device, third field specifies the op
        specifies the number of bytes.

blkio.io_serviced
        Number of IOs (bio) issued to the di
        are further divided by the type of o
        or async. First two fields specify t
        device, third field specifies the op
        specifies the number of IOs.

blkio.io_service_time
        Total amount of time between request
        for the IOs done by this cgroup. Thi
        meaningful for flash devices too. Fo
        this time represents the actual serv
        that is no longer true as requests m
        may cause the service time for a giv
        of multiple IOs when served out of o
        io_service_time > actual time elapse
        the type of operation - read or writ
        specify the major and minor number o
        specifies the operation type and the
        io_service_time in ns.

blkio.io_wait_time
        Total amount of time the IOs for thi
        scheduler queues for service. This c
        elapsed since it is cumulative io_wa
        measure of total time the cgroup spe
        the wait_time for its individual IOs
        this metric does not include the tim
        the IO is dispatched to the device b
        (there might be a time lag here due
        device). This is in nanoseconds to m
        devices too. This time is further di
        read or write, sync or async. First
        minor number of the device, third fi
        and the fourth field specifies the i

blkio.io_merged
        Total number of bios/requests merged
        cgroup. This is further divided by t
        write, sync or async.

blkio.io_queued
        Total number of requests queued up a
        cgroup. This is further divided by t
        write, sync or async.

blkio.avg_queue_size
        Debugging aid only enabled if CONFIG
        The average queue size for this cgro
        cgroup's existence.
        Queue size sampl
        queues of this cgroup gets a timesli

blkio.group_wait_time
        Debugging aid only enabled if CONFIG
        This is the amount of time the cgrou
        (i.e., went from 0 to 1 request queu
        its queues. This is different from t
        cumulative total of the amount of ti
        waiting in the scheduler queue. This
        read when the cgroup is in a waiting
        will only report the group_wait_time
        got a timeslice and will not include

blkio.empty_time
        Debugging aid only enabled if CONFIG
        This is the amount of time a cgroup
        requests when not being served, i.e.
        spent idling for one of the queues o
        nanoseconds. If this is read when th
        the stat will only report the empty_
        time it had a pending request and wi

blkio.idle_time
        Debugging aid only enabled if CONFIG
        This is the amount of time spent by
        given cgroup in anticipation of a be
        from other queues/cgroups. This is i
        when the cgroup is in an idling stat
        idle_time accumulated till the last
        the current delta.

blkio.dequeue
        Debugging aid only enabled if CONFIG
        gives the statistics about how many
        from service tree of the device. Fir
        and minor number of the device and t
        of times a group was dequeued from a

blkio.*_recursive
        Recursive version of various stats.
        same information as their non-recurs
        include stats from all the descendan

Throttling/Upper limit policy files
-----------------------------------
blkio.throttle.read_bps_device
        Specifies upper limit on READ rate f
        specified in bytes per second. Rules
        the format::

          echo "<major>:<minor> <rate_bytes

blkio.throttle.write_bps_device
        Specifies upper limit on WRITE rate
        specified in bytes per second. Rules
        the format::

          echo "<major>:<minor> <rate_bytes

blkio.throttle.read_iops_device
        Specifies upper limit on READ rate f
        specified in IO per second.
        Rules ar
        the format::

          echo "<major>:<minor> <rate_io_per

blkio.throttle.write_iops_device
        Specifies upper limit on WRITE rate
        specified in io per second. Rules ar
        the format::

          echo "<major>:<minor> <rate_io_pe

Note: If both BW and IOPS rules are
      subjected to both the constraints.

blkio.throttle.io_serviced
        Number of IOs (bio) issued to the di
        are further divided by the type of o
        or async. First two fields specify t
        device, third field specifies the op
        specifies the number of IOs.

blkio.throttle.io_service_bytes
        Number of bytes transferred to/from
        are further divided by the type of o
        or async. First two fields specify t
        device, third field specifies the op
        specifies the number of bytes.

Common files among various policies
-----------------------------------
blkio.reset_stats
        Writing an int to this file will res
        for that cgroup.
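
The stat layout described for ``blkio.throttle.io_service_bytes`` (device,
operation type, value) can be consumed with a small filter. The sketch below
sums the Read and Write values per device. It assumes each line has the form
``<major>:<minor> <operation> <value>`` and skips anything else, such as
Sync/Async rows or summary lines. ``sum_rw_bytes`` is a hypothetical name,
not an existing tool.

```shell
# Hypothetical helper: read stat lines on stdin and print, for each device,
# the sum of its Read and Write values as "<major>:<minor> <bytes>".
# Assumed input line format: "<major>:<minor> <operation> <value>".
sum_rw_bytes() {
    awk '
        # Count only Read and Write rows; other rows are ignored so the
        # same bytes are not added twice.
        $2 == "Read" || $2 == "Write" { total[$1] += $3 }
        END { for (d in total) printf "%s %.0f\n", d, total[d] }
    ' | sort
}
```

It could be fed from a stat file inside a cgroup directory, for instance
``sum_rw_bytes < "$CGROUP_DIR/blkio.throttle.io_service_bytes"``, where
``$CGROUP_DIR`` is a placeholder for wherever the cgroup is mounted.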