NVMe Fault Injection
====================
Linux's fault injection framework provides a systematic way to support
error injection via debugfs in the /sys/kernel/debug directory. When
enabled, the default NVME_SC_INVALID_OPCODE with no retry will be
injected into nvme_try_complete_req. Users can change the default status
code and the no-retry flag via debugfs. The list of Generic Command
Status values can be found in include/linux/nvme.h.

The following examples show how to inject an error into an NVMe device.

First, enable the CONFIG_FAULT_INJECTION_DEBUG_FS kernel config option and
recompile the kernel. After booting the new kernel, do the
following.

Example 1: Inject default status code with no retry
---------------------------------------------------

::

  mount /dev/nvme0n1 /mnt
  echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times
  echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability
  cp a.file /mnt

Expected Result::

  cp: cannot stat ‘/mnt/a.file’: Input/output error

Message from dmesg::

  FAULT_INJECTION: forcing a failure.
  name fault_inject, interval 1, probability 100, space 0, times 1
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc8+ #2
  Hardware name: innotek GmbH VirtualBox/VirtualBox,
  BIOS VirtualBox 12/01/2006
  Call Trace:
    <IRQ>
    dump_stack+0x5c/0x7d
    should_fail+0x148/0x170
    nvme_should_fail+0x2f/0x50 [nvme_core]
    nvme_process_cq+0xe7/0x1d0 [nvme]
    nvme_irq+0x1e/0x40 [nvme]
    __handle_irq_event_percpu+0x3a/0x190
    handle_irq_event_percpu+0x30/0x70
    handle_irq_event+0x36/0x60
    handle_fasteoi_irq+0x78/0x120
    handle_irq+0xa7/0x130
    ? tick_irq_enter+0xa8/0xc0
    do_IRQ+0x43/0xc0
    common_interrupt+0xa2/0xa2
    </IRQ>
  RIP: 0010:native_safe_halt+0x2/0x10
  RSP: 0018:ffffffff82003e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
  RAX: ffffffff817a10c0 RBX: ffffffff82012480 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 000000008e38ce64 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82012480
  R13: ffffffff82012480 R14: 0000000000000000 R15: 0000000000000000
    ? __sched_text_end+0x4/0x4
    default_idle+0x18/0xf0
    do_idle+0x150/0x1d0
    cpu_startup_entry+0x6f/0x80
    start_kernel+0x4c4/0x4e4
    ? set_init_arg+0x55/0x55
    secondary_startup_64+0xa5/0xb0
  print_req_error: I/O error, dev nvme0n1, sector 9240
  EXT4-fs error (device nvme0n1): ext4_find_entry:1436:
  inode #2: comm cp: reading directory lblock 0

Example 2: Inject default status code with retry
------------------------------------------------

::

  mount /dev/nvme0n1 /mnt
  echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times
  echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability
  echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/status
  echo 0 > /sys/kernel/debug/nvme0n1/fault_inject/dont_retry

  cp a.file /mnt

Expected Result::

  command success without error

Message from dmesg::

  FAULT_INJECTION: forcing a failure.
  name fault_inject, interval 1, probability 100, space 0, times 1
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc8+ #4
  Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
  Call Trace:
    <IRQ>
    dump_stack+0x5c/0x7d
    should_fail+0x148/0x170
    nvme_should_fail+0x30/0x60 [nvme_core]
    nvme_loop_queue_response+0x84/0x110 [nvme_loop]
    nvmet_req_complete+0x11/0x40 [nvmet]
    nvmet_bio_done+0x28/0x40 [nvmet]
    blk_update_request+0xb0/0x310
    blk_mq_end_request+0x18/0x60
    flush_smp_call_function_queue+0x3d/0xf0
    smp_call_function_single_interrupt+0x2c/0xc0
    call_function_single_interrupt+0xa2/0xb0
    </IRQ>
  RIP: 0010:native_safe_halt+0x2/0x10
  RSP: 0018:ffffc9000068bec0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
  RAX: ffffffff817a10c0 RBX: ffff88011a3c9680 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000001 R08: 000000008e38c131 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011a3c9680
  R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000
    ? __sched_text_end+0x4/0x4
    default_idle+0x18/0xf0
    do_idle+0x150/0x1d0
    cpu_startup_entry+0x6f/0x80
    start_secondary+0x187/0x1e0
    secondary_startup_64+0xa5/0xb0

Example 3: Inject an error into the 10th admin command
------------------------------------------------------

::

  echo 100 > /sys/kernel/debug/nvme0/fault_inject/probability
  echo 10 > /sys/kernel/debug/nvme0/fault_inject/space
  echo 1 > /sys/kernel/debug/nvme0/fault_inject/times
  nvme reset /dev/nvme0

Expected Result::

  After NVMe controller reset, the reinitialization may or may not succeed.
  It depends on which admin command is actually forced to fail.

Message from dmesg::

  nvme nvme0: resetting controller
  FAULT_INJECTION: forcing a failure.
  name fault_inject, interval 1, probability 100, space 1, times 1
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2
  Hardware name: MSI MS-7A45/B150M MORTAR ARCTIC (MS-7A45), BIOS 1.50 04/25/2017
  Call Trace:
    <IRQ>
    dump_stack+0x63/0x85
    should_fail+0x14a/0x170
    nvme_should_fail+0x38/0x80 [nvme_core]
    nvme_irq+0x129/0x280 [nvme]
    ? blk_mq_end_request+0xb3/0x120
    __handle_irq_event_percpu+0x84/0x1a0
    handle_irq_event_percpu+0x32/0x80
    handle_irq_event+0x3b/0x60
    handle_edge_irq+0x7f/0x1a0
    handle_irq+0x20/0x30
    do_IRQ+0x4e/0xe0
    common_interrupt+0xf/0xf
    </IRQ>
  RIP: 0010:cpuidle_enter_state+0xc5/0x460
  Code: ff e8 8f 5f 86 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 69 03 00 00 31 ff e8 62 aa 8c ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba cf f7 53
  RSP: 0018:ffffffff88c03dd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
  RAX: ffff9dac25a2ac80 RBX: ffffffff88d53760 RCX: 000000000000001f
  RDX: 0000000000000000 RSI: 000000002d958403 RDI: 0000000000000000
  RBP: ffffffff88c03e18 R08: fffffff75e35ffb7 R09: 00000a49a56c0b48
  R10: ffffffff88c03da0 R11: 0000000000001b0c R12: ffff9dac25a34d00
  R13: 0000000000000006 R14: 0000000000000006 R15: ffffffff88d53760
    cpuidle_enter+0x2e/0x40
    call_cpuidle+0x23/0x40
    do_idle+0x201/0x280
    cpu_startup_entry+0x1d/0x20
    rest_init+0xaa/0xb0
    arch_call_rest_init+0xe/0x1b
    start_kernel+0x51c/0x53b
    x86_64_start_reservations+0x24/0x26
    x86_64_start_kernel+0x74/0x77
    secondary_startup_64+0xa4/0xb0
  nvme nvme0: Could not set queue count (16385)
  nvme nvme0: IO queues not created
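
The echo sequences in the examples can be wrapped in a small shell helper.
The following is only a sketch: the function names ``set_fault_attr`` and
``inject_once`` and the ``FAULT_DIR`` variable are hypothetical and not part
of the kernel or nvme-cli. It assumes debugfs is mounted at /sys/kernel/debug
and that the script runs as root.

```shell
#!/bin/sh
# Hypothetical helper for the fault_inject attributes used in the examples.
# FAULT_DIR is an assumption; point it at another device's fault_inject
# directory (for example /sys/kernel/debug/nvme0/fault_inject) as needed.
FAULT_DIR="${FAULT_DIR:-/sys/kernel/debug/nvme0n1/fault_inject}"

# set_fault_attr NAME VALUE: write VALUE to $FAULT_DIR/NAME.
set_fault_attr() {
    if [ ! -d "$FAULT_DIR" ]; then
        echo "fault_inject directory not found: $FAULT_DIR" >&2
        return 1
    fi
    echo "$2" > "$FAULT_DIR/$1"
}

# inject_once STATUS DONT_RETRY: arm one injection with 100% probability,
# using the same four attributes that Example 2 writes by hand.
inject_once() {
    set_fault_attr times 1 &&
    set_fault_attr probability 100 &&
    set_fault_attr status "$1" &&
    set_fault_attr dont_retry "$2"
}
```

After sourcing the helper, ``inject_once 1 0`` would write the same values as
Example 2. The directory check only reports a missing fault_inject directory;
it does not verify that CONFIG_FAULT_INJECTION_DEBUG_FS is enabled.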