===============================================================
Softlockup detector and hardlockup detector (aka nmi_watchdog)
===============================================================

The Linux kernel can act as a watchdog to detect both soft and hard
lockups.

A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than 20 seconds (see "Implementation" below for
details), without giving other tasks a chance to run. The current
stack trace is displayed upon detection and, by default, the system
will stay locked up. Alternatively, the kernel can be configured to
panic; a sysctl, "kernel.softlockup_panic", a kernel parameter,
"softlockup_panic" (see "Documentation/admin-g..." for
details), and a compile option, "BOOTPARAM_SOF...", are
provided for this.

A 'hardlockup' is defined as a bug that causes the CPU to loop in
kernel mode for more than 10 seconds (see "Implementation" below for
details), without letting other interrupts have a chance to run.
Similarly to the softlockup case, the current stack trace is displayed
upon detection and the system will stay locked up unless the default
behavior is changed, which can be done through a sysctl,
'hardlockup_panic', a compile-time knob, "BOOT...",
and a kernel parameter, "nmi_watchdog"
(see "Documentation/admin-guide/kernel-paramet..." for details).

The panic option can be used in combination with a panic timeout (this
timeout is set through the confusingly named "..."),
to cause the system to reboot automatically after a specified amount
of time.

Implementation
==============

The soft and hard lockup detectors are built on top of the hrtimer and
perf subsystems, respectively. A direct consequence of this is that,
in principle, they should work in any architecture where these
subsystems are present.

A periodic hrtimer runs to generate interrupts and kick the watchdog
job. An NMI perf event is generated every "watchdog_thresh"
(compile-time initialized to 10 and configurable through a sysctl of the
same name) seconds to check for hardlockups. If any CPU in the system
does not receive any hrtimer interrupt during that time, the
'hardlockup detector' (the handler for the NMI perf event) will
generate a kernel warning or call panic, depending on the
configuration.

The watchdog job runs in a stop scheduling thread that updates a
timestamp every time it is scheduled. If that timestamp is not updated
for 2*watchdog_thresh seconds (the softlockup threshold), the
'softlockup detector' (coded inside the hrtimer callback function)
will dump useful debug information to the system log, after which it
will call panic if it was instructed to do so or resume execution of
other kernel code.

The period of the hrtimer is 2*watchdog_thresh/5, which means it has
two or three chances to generate an interrupt before the hardlockup
detector kicks in.

As explained above, a kernel knob is provided that allows
administrators to configure the period of the hrtimer and the perf
event. The right value for a particular environment is a trade-off
between fast response to lockups and detection overhead.

By default, the watchdog runs on all online cores. However, on a
kernel configured with NO_HZ_FULL, by default the watchdog runs only
on the housekeeping cores, not the cores specified in the "nohz_full"
boot argument. If we allowed the watchdog to run by default on
the "nohz_full" cores, we would have to run timer ticks to activate
the scheduler, which would prevent the "nohz_full" functionality
from protecting the user code on those cores from the kernel.
Of course, disabling it by default on the nohz_full cores means that
when those cores do enter the kernel, by default we will not be
able to detect if they lock up. However, allowing the watchdog
to continue to run on the housekeeping (non-tickless) cores means
that we will continue to detect lockups properly on those cores.

In either case, the set of cores excluded from running the watchdog
may be adjusted via the kernel.watchdog_cpumas... sysctl. For
nohz_full cores, this may be useful for debugging a case where the
kernel seems to be hanging on the nohz_full cores.
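
The timing relationships described under "Implementation" can be
sketched as plain arithmetic. The following is a minimal illustration
in Python, not kernel code: the function name ``lockup_timings`` and
the dictionary keys are hypothetical, and the hrtimer period is taken
to be 2*watchdog_thresh/5, an assumption chosen to be consistent with
the "two or three chances" remark.

```python
def lockup_timings(watchdog_thresh: int = 10) -> dict:
    """Derive detector timings (in seconds) from watchdog_thresh.

    Hypothetical helper for illustration only; the kernel computes
    these values internally.
    """
    if watchdog_thresh <= 0:
        # This sketch does not model a disabled watchdog.
        raise ValueError("watchdog_thresh must be positive")

    # The NMI perf event fires every watchdog_thresh seconds.
    hardlockup_threshold = watchdog_thresh
    # The softlockup detector triggers after 2*watchdog_thresh seconds.
    softlockup_threshold = 2 * watchdog_thresh
    # Assumed hrtimer period: 2*watchdog_thresh/5.
    hrtimer_period = 2 * watchdog_thresh / 5

    return {
        "hardlockup_threshold": hardlockup_threshold,
        "softlockup_threshold": softlockup_threshold,
        "hrtimer_period": hrtimer_period,
        # Average number of hrtimer interrupts per NMI check window.
        "hrtimer_ticks_per_nmi_window": hardlockup_threshold / hrtimer_period,
    }


if __name__ == "__main__":
    for name, value in lockup_timings(10).items():
        print(f"{name}: {value}")
```

With the default of 10, this gives a hardlockup threshold of 10
seconds, a softlockup threshold of 20 seconds, and an hrtimer period
of 4 seconds, so 2.5 timer periods fit into each NMI check window on
average, i.e. two or three interrupts depending on phase.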