.. SPDX-License-Identifier: GPL-2.0

=======
SCSI EH
=======

This document describes SCSI midlayer error ha
Please refer to Documentation/scsi/scsi_mid_lo
information regarding SCSI midlayer.

.. TABLE OF CONTENTS

   [1] How SCSI commands travel through the mi
       [1-1] struct scsi_cmnd
       [1-2] How do scmd's get completed?
           [1-2-1] Completing a scmd w/ scsi_done
           [1-2-2] Completing a scmd w/ timeout
       [1-3] Asynchronous command aborts
       [1-4] How EH takes over
   [2] How SCSI EH works
       [2-1] EH through fine-grained callbacks
           [2-1-1] Overview
           [2-1-2] Flow of scmds through EH
           [2-1-3] Flow of control
       [2-2] EH through transportt->eh_strateg
           [2-2-1] Pre transportt->eh_strategy_ha
           [2-2-2] Post transportt->eh_strategy_h
           [2-2-3] Things to consider


1. How SCSI commands travel through the midlay
==============================================

1.1 struct scsi_cmnd
--------------------

Each SCSI command is represented with struct s
scmd has two list_head's to link itself into l
scmd->list and scmd->eh_entry. The former is
per-device allocated scmd list and not of much
discussion. The latter is used for completion
otherwise stated scmds are always linked using
discussion.


1.2 How do scmd's get completed?
--------------------------------

Once LLDD gets hold of a scmd, either the LLDD
command by calling scsi_done callback passed f
invoking hostt->queuecommand() or the block la


1.2.1 Completing a scmd w/ scsi_done
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For all non-EH commands, scsi_done() is the co
just calls blk_complete_request() to delete th
raise SCSI_SOFTIRQ

SCSI_SOFTIRQ handler scsi_softirq calls scsi_d
determine what to do with the command. scsi_d
looks at the scmd->result value and sense data
with the command.

 - SUCCESS

        scsi_finish_command() is invoked for t
        function does some maintenance chores
        scsi_io_completion() to finish the I/O
        scsi_io_completion() then notifies the
        the completed request by calling blk_e
        friends or figures out what to do with
        of the data in case of an error.

 - NEEDS_RETRY

 - ADD_TO_MLQUEUE

        scmd is requeued to blk queue.

 - otherwise

        scsi_eh_scmd_add(scmd) is invoked for
        [1-4] for details of this function.


1.2.2 Completing a scmd w/ timeout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The timeout handler is scsi_timeout(). When a

 1. invokes optional hostt->eh_timed_out() cal
    be one of

    - SCSI_EH_RESET_TIMER
        This indicates that more time is requi
        command. Timer is restarted.

    - SCSI_EH_NOT_HANDLED
        eh_timed_out() callback did not handle
        Step #2 is taken.

    - SCSI_EH_DONE
        eh_timed_out() completed the command.

 2. scsi_abort_command() is invoked to schedul
    issue a retry scmd->allowed + 1 times. As
    for commands for which the SCSI_EH_ABORT_S
    indicates that the command already had bee
    retry which failed), when retries are exce
    expired. In these cases Step #3 is taken.

 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)
    command. See [1-4] for more information.

1.3 Asynchronous command aborts
-------------------------------

After a timeout occurs a command abort is sch
scsi_abort_command(). If the abort is success
will either be retried (if the number of retr
or terminated with DID_TIME_OUT.

Otherwise scsi_eh_scmd_add() is invoked for t
See [1-4] for more information.

1.4 How EH takes over
---------------------

scmds enter EH via scsi_eh_scmd_add(), which d

 1. Links scmd->eh_entry to shost->eh_cmd_q

 2. Sets SHOST_RECOVERY bit in shost->shost_st

 3. Increments shost->host_failed

 4. Wakes up SCSI EH thread if shost->host_bus

As can be seen above, once any scmd is added t
SHOST_RECOVERY shost_state bit is turned on.
scmd to be issued from blk queue to the host;
the host either complete normally, fail and ge
time out and get added to shost->eh_cmd_q.

If all scmds either complete or fail, the numb
becomes equal to the number of failed scmds -
shost->host_failed. This wakes up SCSI EH thr
SCSI EH thread can expect that all in-flight c
are linked on shost->eh_cmd_q.

Note that this does not mean lower layers are
completed a scmd with error status, the LLDD a
assumed to forget about the scmd at that point
has timed out, unless hostt->eh_timed_out() ma
about the scmd, which currently no LLDD does,
active as long as lower layers are concerned a
occur at any time. Of course, all such comple
timer has already expired.

We'll talk about how SCSI EH takes actions to
forget about - timed out scmds later.


2. How SCSI EH works
====================

LLDD's can implement SCSI EH actions in one of
ways.

 - Fine-grained EH callbacks
        LLDD can implement fine-grained EH cal
        midlayer drive error handling and call
        This will be discussed further in [2-1

 - eh_strategy_handler() callback
        This is one big callback which should
        handling. As such, it should do all c
        performs during recovery. This will b

Once recovery is complete, SCSI EH resumes nor
calling scsi_restart_operations(), which

 1. Checks if door locking is needed and locks

 2. Clears SHOST_RECOVERY shost_state bit

 3. Wakes up waiters on shost->host_wait. Thi
    calls scsi_block_when_processing_errors()
    (*QUESTION* why is it needed? All operati
    anyway after it reaches blk queue.)

 4. Kicks queues in all devices on the host in


2.1 EH through fine-grained callbacks
-------------------------------------

2.1.1 Overview
^^^^^^^^^^^^^^

If eh_strategy_handler() is not present, SCSI
of driving error handling. EH's goals are two
device forget about timed out scmds and make t
commands. A scmd is said to be recovered if t
lower layers and lower layers are ready to pro
again.

To achieve these goals, EH performs recovery a
severity. Some actions are performed by issui
others are performed by invoking one of the fo
hostt EH callbacks. Callbacks may be omitted
considered to fail always.

::

    int (* eh_abort_handler)(struct scsi_cmnd *);
    int (* eh_device_reset_handler)(struct scsi_cmnd *);
    int (* eh_bus_reset_handler)(struct scsi_cmnd *);
    int (* eh_host_reset_handler)(struct scsi_cmnd *);

Higher-severity actions are taken only when lo
cannot recover some of failed scmds. Also, no
highest-severity action means EH failure and r
all unrecovered devices.

During recovery, the following rules are follo

 - Recovery actions are performed on failed sc
   eh_work_q. If a recovery action succeeds f
   scmds are removed from eh_work_q.

   Note that single recovery action on a scmd
   scmds. e.g. resetting a device recovers al
   device.

 - Higher severity actions are taken iff eh_wo
   lower severity actions are complete.

 - EH reuses failed scmds to issue commands fo
   timed-out scmds, SCSI EH ensures that LLDD
   before reusing it for EH commands.

When a scmd is recovered, the scmd is moved fr
local eh_done_q using scsi_eh_finish_cmd(). A
recovered (eh_work_q is empty), scsi_eh_flush_
either retry or error-finish (notify upper lay
scmds.

A scmd is retried iff its sdev is still online
EH), REQ_FAILFAST is not set and ++scmd->retri
scmd->allowed.
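
The escalation rule described in this overview can be modeled outside the
kernel. The following is a minimal userspace sketch under stated assumptions,
not midlayer code: every ``toy_`` name is hypothetical, commands are reduced
to integer ids, and a ``recovered`` flag stands in for moving a scmd from
eh_work_q to eh_done_q. It only illustrates that actions run in order of
increasing severity, that an omitted (NULL) callback counts as a failure, and
that escalation stops once nothing is left to recover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_result { TOY_FAILED, TOY_SUCCESS };

/* A recovery action reports whether it recovered the given command. */
typedef enum toy_result (*toy_eh_action)(int cmd_id);

struct toy_cmd {
	int id;
	bool recovered;	/* stands in for "moved to eh_done_q" */
};

/*
 * Run actions[] in order (lowest severity first) over the commands that
 * are still unrecovered. A NULL entry models an omitted callback and is
 * treated as failing for every command. Returns the number of commands
 * that no action could recover.
 */
static size_t toy_escalate(struct toy_cmd *cmds, size_t ncmds,
			   const toy_eh_action *actions, size_t nactions)
{
	size_t pending = 0;

	assert(cmds != NULL && actions != NULL);

	for (size_t i = 0; i < ncmds; i++)
		if (!cmds[i].recovered)
			pending++;

	/* Escalate only while some command is still unrecovered. */
	for (size_t a = 0; a < nactions && pending > 0; a++) {
		if (actions[a] == NULL)
			continue;	/* omitted callback: always fails */

		for (size_t i = 0; i < ncmds; i++) {
			if (cmds[i].recovered)
				continue;
			if (actions[a](cmds[i].id) == TOY_SUCCESS) {
				cmds[i].recovered = true;
				pending--;
			}
		}
	}
	return pending;
}

/* Hypothetical low-severity action: only even ids can be recovered. */
static enum toy_result toy_abort(int cmd_id)
{
	return (cmd_id % 2 == 0) ? TOY_SUCCESS : TOY_FAILED;
}

/* Hypothetical highest-severity action: recovers everything. */
static enum toy_result toy_host_reset(int cmd_id)
{
	(void)cmd_id;
	return TOY_SUCCESS;
}
```

A caller would pass the actions ordered from abort to host reset. A non-zero
return value corresponds to the case where even the highest-severity action
left commands unrecovered; the sketch only reports that count and does not
model what happens to the affected devices.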


2.1.2 Flow of scmds through EH
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 1. Error completion / time out

    :ACTION: scsi_eh_scmd_add() is invoked for

        - add scmd to shost->eh_cmd_q
        - set SHOST_RECOVERY
        - shost->host_failed++

    :LOCKING: shost->host_lock

 2. EH starts

    :ACTION: move all scmds to EH's local eh_w
             is cleared.

    :LOCKING: shost->host_lock (not strictly n
              consistency)

 3. scmd recovered

    :ACTION: scsi_eh_finish_cmd() is invoked t

        - scsi_setup_cmd_retry()
        - move from local eh_work_q to local e

    :LOCKING: none

    :CONCURRENCY: at most one thread per separ
                  keep queue manipulation lock

 4. EH completes

    :ACTION: scsi_eh_flush_done_q() retries sc
             layer of failure. May be called c
             a no more than one thread per sep
             manipulate the queue locklessly

             - scmd is removed from eh_done_q
             - if retry is necessary, scmd is
               scsi_queue_insert()
             - otherwise, scsi_finish_command(
             - zero shost->host_failed

    :LOCKING: queue or finish function perform


2.1.3 Flow of control
^^^^^^^^^^^^^^^^^^^^^^

EH through fine-grained callbacks starts from

``scsi_unjam_host``

    1. Lock shost->host_lock, splice_init shos
       eh_work_q and unlock host_lock. Note t
       cleared by this action.

    2. Invoke scsi_eh_get_sense.

    ``scsi_eh_get_sense``

        This action is taken for each error-co
        (!SCSI_EH_CANCEL_CMD) commands without
        SCSI transports/LLDDs automatically ac
        command failures (autosense). Autosen
        performance reasons and as sense infor
        sync between occurrence of CHECK CONDI

        Note that if autosense is not supporte
        contains invalid sense data when error
        with scsi_done(). scsi_decide_disposi
        FAILED in such cases thus invoking SCS
        reaches here, sense data is acquired a
        scsi_decide_disposition() is called ag

        1. Invoke scsi_request_sense() which i
           command. If fails, no action. Not
           causes higher-severity recovery to

        2. Invoke scsi_decide_disposition() on

           - SUCCESS
                scmd->retries is set to scmd->
                scsi_eh_flush_done_q() from re
                scsi_eh_finish_cmd() is invoke

           - NEEDS_RETRY
                scsi_eh_finish_cmd() invoked

           - otherwise
                No action.

    3. If !list_empty(&eh_work_q), invoke scsi

    ``scsi_eh_abort_cmds``

        This action is taken for each timed ou
        no_async_abort is enabled in the host
        hostt->eh_abort_handler() is invoked f
        handler returns SUCCESS if it has succ
        all related hardware forget about the

        If a timed-out scmd is successfully abo
        either offline or ready, scsi_eh_finis
        the scmd. Otherwise, the scmd is left
        higher-severity actions.

        Note that both offline and ready statu
        ready to process new scmds, where proc
        immediate failing; thus, if a sdev is
        states, no further recovery action is

        Device readiness is tested using scsi_
        TEST_UNIT_READY command. Note that th
        aborted successfully before reusing it

    4. If !list_empty(&eh_work_q), invoke scsi

    ``scsi_eh_ready_devs``

        This function takes four increasingly
        make failed sdevs ready for new comman

        1. Invoke scsi_eh_stu()

        ``scsi_eh_stu``

            For each sdev which has failed scm
            of which scsi_check_sense()'s verd
            START_STOP_UNIT command is issued
            as we explicitly choose error-comp
            that lower layers have forgotten a
            reuse it for STU.

            If STU succeeds and the sdev is ei
            all failed scmds on the sdev are E
            scsi_eh_finish_cmd().

            *NOTE* If hostt->eh_abort_handler(
            failed, we may still have timed ou
            and STU doesn't make lower layers
            scmds. Yet, this function EH-fini
            if STU succeeds leaving lower laye
            state. It seems that STU action s
            a sdev has no timed out scmd.

        2. If !list_empty(&eh_work_q), invoke

        ``scsi_eh_bus_device_reset``

            This action is very similar to scs
            instead of issuing STU, hostt->eh_
            is used. Also, as we're not issui
            resetting clears all scmds on the
            to choose error-completed scmds.

        3. If !list_empty(&eh_work_q), invoke

        ``scsi_eh_bus_reset``

            hostt->eh_bus_reset_handler() is i
            with failed scmds. If bus reset s
            scmds on all ready or offline sdev
            EH-finished.

        4. If !list_empty(&eh_work_q), invoke

        ``scsi_eh_host_reset``

            This is the last resort. hostt->e
            is invoked. If host reset succeed
            all ready or offline sdevs on the

        5. If !list_empty(&eh_work_q), invoke

        ``scsi_eh_offline_sdevs``

            Take all sdevs which still have un
            and EH-finish the scmds.

    5. Invoke scsi_eh_flush_done_q().

        ``scsi_eh_flush_done_q``

            At this point all scmds are recove
            put on eh_done_q by scsi_eh_finish
            flushes eh_done_q by either retryi
            layer of failure of the scmds.


2.2 EH through transportt->eh_strategy_handler
----------------------------------------------

transportt->eh_strategy_handler() is invoked i
scsi_unjam_host() and it is responsible for wh
On completion, the handler should have made lo
all failed scmds and either ready for new comm
it should perform SCSI EH maintenance chores t
SCSI midlayer. IOW, of the steps described in
except for #1 must be implemented by eh_strate


2.2.1 Pre transportt->eh_strategy_handler() SC
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following conditions are true on entry to

 - Each failed scmd's eh_flags field is set ap

 - Each failed scmd is linked on shost->eh_cmd_

 - SHOST_RECOVERY is set.

 - shost->host_failed == shost->host_busy


2.2.2 Post transportt->eh_strategy_handler() S
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following conditions must be true on exit

 - shost->host_failed is zero.

 - Each scmd is in such a state that scsi_setu
   scmd doesn't make any difference.

 - shost->eh_cmd_q is cleared.

 - Each scmd->eh_entry is cleared.

 - Either scsi_queue_insert() or scsi_finish_c
   each scmd. Note that the handler is free t
   ->allowed to limit the number of retries.


2.2.3 Things to consider
^^^^^^^^^^^^^^^^^^^^^^^^

 - Know that timed out scmds are still active
   lower layers forget about them before doing
   those scmds.

 - For consistency, when accessing/modifying s
   grab shost->host_lock.

 - On completion, each failed sdev must have f
   active scmds.

 - On completion, each failed sdev must be rea
   offline.


Tejun Heo
htejun@gmail.com

11th September 2005