========
hwpoison
========

What is hwpoison?
=================

Upcoming Intel CPUs have support for recovering from some memory errors
(``MCA recovery``). This requires the OS to declare a page "poisoned",
kill the processes associated with it and avoid using it in the future.

This patchkit implements the necessary infrastructure in the VM.

To quote the overview comment::

	High level machine check handler. Handles pages reported by the
	hardware as being corrupted usually due to a 2bit ECC memory or cache
	failure.

	This focusses on pages detected as corrupted in the background.
	When the current CPU tries to consume corruption the currently
	running process can just be killed directly instead. This implies
	that if the error cannot be handled for some reason it's safe to
	just ignore it because no corruption has been consumed yet. Instead
	when that happens another machine check will happen.

	Handles page cache pages in various states. The tricky part
	here is that we can access any page asynchronous to other VM
	users, because memory failures could happen anytime and anywhere,
	possibly violating some of their assumptions. This is why this code
	has to be extremely careful. Generally it tries to use normal locking
	rules, as in get the standard locks, even if that means the
	error handling takes potentially a long time.

	Some of the operations here are somewhat inefficient and have non
	linear algorithmic complexity, because the data structures have not
	been optimized for this case. This is in particular the case
	for the mapping from a vma to a process. Since this case is expected
	to be rare we hope we can get away with this.

The code consists of the high level handler, a new page poison bit and
various checks in the VM to handle poisoned pages.

The main target right now is KVM guests, but it works for all kinds
of applications. KVM support requires a recent version of qemu.

For the KVM use case a new signal type was needed, so that
KVM can inject the machine check into the guest with the proper
address. This in theory allows other applications to handle
memory failures too. The expectation is that most applications
won't do that, but some very specialized ones might.

Failure recovery modes
======================

There are two (actually three) modes memory failure recovery can be in:

vm.memory_failure_recovery sysctl set to zero:
	All memory failures cause a panic.
	Do not attempt recovery.

early kill
	(can be controlled globally and per process)
	Send SIGBUS to the application as soon as the error is detected.
	This allows applications that can process memory errors to handle
	them gracefully (e.g. by dropping the affected object).
	This is the mode used by KVM qemu.

late kill
	Send SIGBUS when the application runs into the corrupted page.
	This is best for applications that are unaware of memory errors,
	and is the default.
	Note some pages are always handled as late kill.

User control
============

vm.memory_failure_recovery
	See sysctl.txt

vm.memory_failure_early_kill
	Enable early kill mode globally

PR_MCE_KILL
	Set early/late kill mode, or revert to the system default

	arg1: PR_MCE_KILL_CLEAR:
		Revert to system default
	arg1: PR_MCE_KILL_SET:
		arg2 defines the thread-specific mode

		PR_MCE_KILL_EARLY:
			Early kill
		PR_MCE_KILL_LATE:
			Late kill
		PR_MCE_KILL_DEFAULT:
			Use system global default

	Note that if you want to have a dedicated thread which handles
	the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
	call prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0)
	on the designated thread. Otherwise, the SIGBUS is sent to the
	main thread.

PR_MCE_KILL_GET
	Return the current mode

Testing
=======

* madvise(MADV_HWPOISON, ....) (as root) - Poison a page in the
  process for testing

* hwpoison-inject module through debugfs ``/sy

  corrupt-pfn
	Inject hwpoison fault at PFN echoed into this file. This does
	some early filtering to avoid corrupting unintended pages in
	test suites.

  unpoison-pfn
	Software-unpoison page at PFN echoed into this file. This way
	a page can be reused. This only works for injected failures,
	not for real memory failures. Once any hardware memory failure
	happens, this feature is disabled.

  Note these injection interfaces are not stable and may change between
  kernel versions.

  corrupt-filter-dev-major, corrupt-filter-dev
	Only handle memory failures to pages associated with the file
	system defined by block device major/minor (a wildcard value is
	also accepted). This should only be used for testing with
	artificial injection.

  corrupt-filter-memcg
	Limit injection to pages owned by a memory cgroup, specified by
	the inode number of the memcg.

	Example::

		mkdir /sys/fs/cgroup/mem/hwpoi

		usemem -m 100 -s 1000 &
		echo `jobs -p` > /sys/fs/cgrou

		memcg_ino=$(ls -id /sys/fs/cgr
		echo $memcg_ino > /debug/hwpoi

		page-types -p `pidof init` -
		page-types -p `pidof usemem` -

  corrupt-filter-flags-mask, corrupt-filter-fl
	When specified, only poison pages if ((page_flags & mask) ==
	value). This allows stress testing of many kinds of
	pages. The page_flags are the same as
	flag bits are defined in include/linux
	documented in Documentation/admin-guid

* Architecture specific MCE injector

  x86 has mce-inject, mce-test

  Some portable hwpoison test programs are in mce-test, see below.

References
==========

http://halobates.de/mce-lc09-2.pdf
	Overview presentation from LinuxCon 09

git://git.kernel.org/pub/scm/utils/cpu/mce/mce
	Test suite (hwpoison-specific portable tests)

git://git.kernel.org/pub/scm/utils/cpu/mce/mce
	x86 specific injector


Limitations
===========

- Not all page types are supported and never will be. Most kernel internal
  objects cannot be recovered, only LRU pages for now.

---
Andi Kleen, Oct 2009
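The PR_MCE_KILL and PR_MCE_KILL_GET options described under User control map onto prctl() calls roughly as follows. This is a minimal sketch, assuming Linux with glibc, where <sys/prctl.h> provides the constants (the fallback values mirror <linux/prctl.h>); set_mce_kill_mode(), clear_mce_kill_mode() and get_mce_kill_mode() are hypothetical helper names, not a kernel or libc API.

```c
/* Sketch: select the calling thread's memory failure kill mode.
 * Helper names are hypothetical. */
#include <sys/prctl.h>

/* Fallbacks for old headers; values match <linux/prctl.h>. */
#ifndef PR_MCE_KILL
#define PR_MCE_KILL         33
#define PR_MCE_KILL_CLEAR   0
#define PR_MCE_KILL_SET     1
#define PR_MCE_KILL_LATE    0
#define PR_MCE_KILL_EARLY   1
#define PR_MCE_KILL_DEFAULT 2
#endif
#ifndef PR_MCE_KILL_GET
#define PR_MCE_KILL_GET     34
#endif

/* mode is PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or PR_MCE_KILL_DEFAULT.
 * Returns 0 on success, -1 with errno set on failure. */
int set_mce_kill_mode(unsigned long mode)
{
	/* prctl() is variadic: pass unused trailing arguments as
	 * unsigned long zeros, which the kernel requires. */
	return prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET, mode,
		     0UL, 0UL);
}

/* Drop the per-thread setting and fall back to the system default. */
int clear_mce_kill_mode(void)
{
	return prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_CLEAR, 0UL,
		     0UL, 0UL);
}

/* Returns PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or PR_MCE_KILL_DEFAULT,
 * or -1 on error. */
int get_mce_kill_mode(void)
{
	return prctl(PR_MCE_KILL_GET, 0UL, 0UL, 0UL, 0UL);
}
```

Because the mode is thread specific, a dedicated SIGBUS-handling thread would call set_mce_kill_mode(PR_MCE_KILL_EARLY) from within that thread rather than from the main thread.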
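An application that opts into early kill, or the dedicated handler thread mentioned in the PR_MCE_KILL note, needs a SIGBUS handler that tells memory failure notifications apart from ordinary bus errors. The sketch below assumes glibc, where BUS_MCEERR_AO (action optional: the early kill notification) and BUS_MCEERR_AR (action required: the thread actually touched a poisoned page) are exposed by <signal.h> when _GNU_SOURCE is defined. The enum, classify_sigbus() and install_sigbus_handler() are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE /* exposes BUS_MCEERR_AO / BUS_MCEERR_AR in glibc */
#include <signal.h>
#include <string.h>

/* Hypothetical classification of a SIGBUS si_code. */
enum mce_kind { MCE_NONE, MCE_ACTION_OPTIONAL, MCE_ACTION_REQUIRED };

enum mce_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AO: /* poisoned page detected in the background */
		return MCE_ACTION_OPTIONAL;
	case BUS_MCEERR_AR: /* this thread accessed a poisoned page */
		return MCE_ACTION_REQUIRED;
	default:            /* ordinary SIGBUS, e.g. BUS_ADRERR */
		return MCE_NONE;
	}
}

/* State recorded by the handler; only simple stores happen here. */
static volatile sig_atomic_t last_mce_kind = MCE_NONE;
static void *volatile last_mce_addr;

static void sigbus_handler(int sig, siginfo_t *info, void *uctx)
{
	(void)sig;
	(void)uctx;
	last_mce_kind = classify_sigbus(info->si_code);
	last_mce_addr = info->si_addr; /* address inside the bad page */
	/* For MCE_ACTION_REQUIRED, simply returning re-executes the
	 * faulting access; a real handler would siglongjmp() or _exit(). */
}

/* Returns 0 on success, -1 with errno set on failure. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

With early kill enabled, BUS_MCEERR_AO can arrive before the corrupted page is ever touched, which is what gives an application the chance to discard the affected data instead of being killed later.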
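The corrupt-filter-flags-mask/value test under Testing is a plain bitmask predicate: every bit set in the mask must have the state given by the value. The sketch below assumes that a zero mask means the filter is not set, so every page passes. The EX_FLAG_* bit positions are hypothetical placeholders; the real flag bits are the ones defined in the kernel header the filter description refers to. filter_matches() is an illustrative helper, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define PG_BIT(n)     (UINT64_C(1) << (n))
#define EX_FLAG_DIRTY PG_BIT(4)
#define EX_FLAG_LRU   PG_BIT(5)
#define EX_FLAG_ANON  PG_BIT(12)

/* Mirrors the documented test ((page_flags & mask) == value).
 * Assumption: mask == 0 means "filter not specified". */
bool filter_matches(uint64_t page_flags, uint64_t mask, uint64_t value)
{
	if (mask == 0)
		return true;
	return (page_flags & mask) == value;
}
```

For example, mask = LRU|DIRTY|ANON with value = LRU|DIRTY selects dirty LRU pages that are not anonymous: bits present in the mask but absent from the value must be clear. A value bit outside the mask can never match, so such a filter would poison nothing.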