======================
Firmware-Assisted Dump
======================

July 2011

The goal of firmware-assisted dump is to enabl
a crashed system, and to do so from a fully-re
to minimize the total elapsed time until the s
in production use.

- Firmware-Assisted Dump (FADump) infrastructu
  the existing phyp assisted dump.
- Fadump uses the same firmware interfaces and
  as phyp assisted dump.
- Unlike phyp dump, FADump exports the memory
  in the ELF format in the same way as kdump.
  kdump infrastructure for dump capture and fi
- Unlike phyp dump, userspace tool does not ne
  interface while reading /proc/vmcore.
- Unlike phyp dump, FADump allows user to rele
  for dump, with a single operation of echo 1
- Once enabled through kernel boot parameter,
  started/stopped through /sys/kernel/fadump_r
  sysfs files section below) and can be easily
  service start/stop init scripts.

Comparing with kdump or other strategies, firm
dump offers several strong, practical advantag

-  Unlike kdump, the system has been reset, an
   with a fresh copy of the kernel. In partic
   PCI and I/O devices have been reinitialized
   in a clean, consistent state.
-  Once the dump is copied out, the memory tha
   is immediately available to the running ker
   unlike kdump, FADump doesn't need a 2nd reb
   the system to the production configuration.

The above can only be accomplished by coordina
and assistance from the Power firmware. The pr
as follows:

-  The first kernel registers the sections of
   Power firmware for dump preservation during
   These registered sections of memory are res
   kernel during early boot.

-  When system crashes, the Power firmware wil
   low memory regions (boot memory) from sourc
   It will also save hardware PTE's.

   NOTE:
         The term 'boot memory' means size of
         that is required for a kernel to boot
         booted with restricted memory.
         By def
         size will be the larger of 5% of syst
         Alternatively, user can also specify
         through boot parameter 'crashkernel='
         the default calculated size. Use this
         boot memory size is not sufficient fo
         boot successfully. For syntax of cras
         refer to Documentation/admin-guide/kd
         offset is provided in crashkernel= pa
         ignored as FADump uses a predefined o
         for boot memory dump preservation in

-  After the low memory (boot memory) area has
   firmware will reset PCI and other hardware
   *not* clear the RAM. It will then launch th
   normal.

-  The freshly booted kernel will notice that
   (rtas/ibm,kernel-dump on pSeries or ibm,opa
   on OPAL platform) in the device tree, indic
   there is crash data available from a previo
   the early boot OS will reserve rest of the
   boot memory size effectively booting with r
   size. This will make sure that this kernel
   to as second kernel or capture kernel) will
   of the dump memory area.

-  User-space tools will read /proc/vmcore to
   of memory, which holds the previous crashed
   format. The userspace tools may copy this i
   network, nas, san, iscsi, etc. as desired.

-  Once the userspace tool is done saving dump
   '1' to /sys/kernel/fadump_release_mem to re
   memory back to general use, except the memo
   next firmware-assisted dump registration.

   e.g.::

     # echo 1 > /sys/kernel/fadump_release_mem

Please note that the firmware-assisted dump fe
is only available on POWER6 and above systems
(PowerVM) platform and POWER9 and above system
or later firmware versions on PowerNV (OPAL) p
Note that, OPAL firmware exports ibm,opal/dump
FADump is supported on PowerNV platform.

On OPAL based machines, system first boots int
kernel (referred to as petitboot kernel) befor
capture kernel. This kernel would have minimal
userspace support to process crash data.
Such
preserve previously crash'ed kernel's memory f
capture kernel boot to process this crash data
option CONFIG_PRESERVE_FA_DUMP has to be enabl
to ensure that crash data is preserved to proc

-- On OPAL based machines (PowerNV), if the ke
   CONFIG_OPAL_CORE=y, OPAL memory at the time
   exported as /sys/firmware/opal/mpipl/core f
   helpful in debugging OPAL crashes with GDB.
   used for exporting this procfs file can be
   '1' to /sys/firmware/opal/mpipl/release_cor

   e.g.::

     # echo 1 > /sys/firmware/opal/mpipl/relea

Implementation details:
-----------------------

During boot, a check is made to see if firmwar
this feature on that particular machine. If it
we check to see if an active dump is waiting f
then everything but boot memory size of RAM is
early boot (See Fig. 2). This area is released
collecting the dump from user land scripts (e.
that are run. If there is dump data, then the
/sys/kernel/fadump_release_mem file is created
memory is held.

If there is no waiting dump data, then only th
hold CPU state, HPTE region, boot memory dump,
usually reserved at an offset greater than boo
This area is *not* released: this region will
reserved, so that it can act as a receptacle f
memory content in addition to CPU state and HP
a crash does occur.

Since this reserved memory area is used only a
there is no point in blocking this significant
production kernel. Hence, the implementation u
Contiguous Memory Allocator (CMA) for memory r
configured for kernel. With CMA reservation th
available for applications to use it, while ke
using it.
With this FADump will still be able
kernel memory and most of the user space memor
that were present in CMA region::

  o Memory Reservation during first kernel

  Low memory
  0    boot memory size   |<------ Reserved du
  |           |           |      Permanent Res
  V           V           |
  +-----------+-----/ /---+---+----+----------
  |           |           |///|////|  DUMP
  +-----------+-----/ /---+---+----+----------
        |                   ^    ^     ^
        |                   |    |     |
        \                  CPU  HPTE   /
         --------------------------------
      Boot memory content gets transferred
      to reserved area by firmware at the
      time of crash.
                                           FAD
                                            (m


                      Metadata: This area hold
                      address is registered wi
                      second kernel after cras
                      tags (OPAL). Having such
                      to process the crashdump

                     Fig. 1


  o Memory Reservation during second kernel af

  Low memory
  0      boot memory size
  |           |<------------ Crash preserved a
  V           V           |<--- Reserved dump
  +----+---+--+-----/ /---+---+----+-------+--
  |    |ELF|  |           |///|////|  DUMP | H
  +----+---+--+-----/ /---+---+----+-------+--
       |   |  |                            |
       -----  ------------------------------
         \              |
           \            |
             \          |
               \        |    -----------------
                 \      |   /
                   \    |  /
                     \  | /
                  /proc/vmcore


  +---+
  |///| -> Regions (CPU, HPTE & Metadata
  +---+    figures are not always presen
           does not have CPU & HPTE regi
           not supported on pSeries curr

  +---+
  |ELF| -> elfcorehdr, it is created in
  +---+

        Note: Memory from 0 to the boot memory

                     Fig. 2


Currently the dump will be copied from /proc/v
user intervention. The dump data available thr
in ELF format. Hence the existing kdump infras
to save the dump works fine with minor modific
major Distro releases have already been modifi
user intervention in saving the dump) when FAD
KDump, as dump mechanism.

The tools to examine the dump will be same as
used for kdump.
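The save-then-release flow described in this section can be sketched as a
small shell function. This is only an illustration, not the method used by
any distribution's kdump scripts: the function name ``save_fadump_vmcore``,
its positional arguments, and the default destination ``/var/crash/vmcore``
are assumptions made for the example. Only ``/proc/vmcore`` and
``/sys/kernel/fadump_release_mem`` are taken from this document.

```shell
#!/bin/sh
# Sketch only: save a FADump vmcore, then release the reserved memory.
# The function name, its arguments and the destination path are
# hypothetical; /proc/vmcore and /sys/kernel/fadump_release_mem are the
# files named in this document.

save_fadump_vmcore() {
    vmcore="${1:-/proc/vmcore}"
    dest="${2:-/var/crash/vmcore}"            # hypothetical destination
    release="${3:-/sys/kernel/fadump_release_mem}"

    # Assume no dump is waiting if the vmcore file cannot be read.
    if [ ! -r "$vmcore" ]; then
        echo "no dump found at $vmcore" >&2
        return 1
    fi

    # Copy the dump first; keep the reserved memory if the copy fails.
    if ! cp "$vmcore" "$dest"; then
        echo "failed to copy $vmcore to $dest" >&2
        return 1
    fi

    # Write 1 to the release file, if present, to free the held memory.
    if [ -w "$release" ]; then
        echo 1 > "$release"
    fi
}
```

Called with no arguments in the second kernel, the function falls back to
the default paths. Passing explicit paths makes it possible to try the logic
against ordinary files, and releasing memory only after a successful copy
keeps the dump available if the copy fails.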

How to enable firmware-assisted dump (FADump):
----------------------------------------------

1. Set config option CONFIG_FA_DUMP=y and buil
2. Boot into linux kernel with 'fadump=on' ker
   By default, FADump reserved memory will be
   Alternatively, user can boot linux kernel w
   prevent FADump to use CMA.
3. Optionally, user can also set 'crashkernel=
   to specify size of the memory to reserve fo
   preservation.

NOTE:
     1. 'fadump_reserve_mem=' parameter has be
        use 'crashkernel=' to specify size of
        for boot memory dump preservation.
     2. If firmware-assisted dump fails to res
        will fallback to existing kdump mechan
        option is set at kernel cmdline.
     3. if user wants to capture all of user s
        reserved memory not available to produ
        'fadump=nocma' kernel parameter can be
        old behaviour.

Sysfs/debugfs files:
--------------------

Firmware-assisted dump feature uses sysfs file
the control files and debugfs file to display

Here is the list of files under kernel sysfs:

 /sys/kernel/fadump_enabled
    This is used to display the FADump status.

    - 0 = FADump is disabled
    - 1 = FADump is enabled

    This interface can be used by kdump init s
    FADump is enabled in the kernel and act ac

 /sys/kernel/fadump_registered
    This is used to display the FADump registr
    as to control (start/stop) the FADump regi

    - 0 = FADump is not registered.
    - 1 = FADump is registered and ready to ha

    To register FADump echo 1 > /sys/kernel/fa
    echo 0 > /sys/kernel/fadump_registered for
    FADump. Once the FADump is un-registered,
    be handled and vmcore will not be captured
    easily integrated with kdump service start

 /sys/kernel/fadump/mem_reserved
    This is used to display the memory reserved
    crash dump.

 /sys/kernel/fadump_release_mem
    This file is available only when FADump is
    second kernel.
    This is used to release the
    region that are held for saving crash dump
    reserved memory echo 1 to it::

        echo 1 > /sys/kernel/fadump_release_m

    After echo 1, the content of the /sys/kern
    file will change to reflect the new memory

    The existing userspace tools (kdump infras
    enhanced to use this interface to release
    dump and continue without 2nd reboot.

Note: /sys/kernel/fadump_release_opalcore sysf
      /sys/firmware/opal/mpipl/release_core

 /sys/firmware/opal/mpipl/release_core

    This file is available only on OPAL based
    active during capture kernel. This is used
    used by the kernel to export /sys/firmware
    release this memory, echo '1' to it:

    echo 1 > /sys/firmware/opal/mpipl/release

Note: The following FADump sysfs files are dep

+----------------------------------+----------
| Deprecated                       | Alternati
+----------------------------------+----------
| /sys/kernel/fadump_enabled       | /sys/kern
+----------------------------------+----------
| /sys/kernel/fadump_registered    | /sys/kern
+----------------------------------+----------
| /sys/kernel/fadump_release_mem   | /sys/kern
+----------------------------------+----------

Here is the list of files under powerpc debugf
(Assuming debugfs is mounted on /sys/kernel/de

 /sys/kernel/debug/powerpc/fadump_region
    This file shows the reserved memory region
    enabled otherwise this file is empty. The
    is::

      <region>: [<start>-<end>] <reserved-size

    and for kernel DUMP region is:

    DUMP: Src: <src-addr>, Dest: <dest-addr>,

    e.g.
      Contents when FADump is registered during

        # cat /sys/kernel/debug/powerpc/fadump_r
        CPU : [0x0000006ffb0000-0x0000006fff001f
        HPTE: [0x0000006fff0020-0x0000006fff101f
        DUMP: [0x0000006fff1020-0x0000007fff101f

      Contents when FADump is active during seco

        # cat /sys/kernel/debug/powerpc/fadump_r
        CPU : [0x0000006ffb0000-0x0000006fff001f
        HPTE: [0x0000006fff0020-0x0000006fff101f
        DUMP: [0x0000006fff1020-0x0000007fff101f
            : [0x00000010000000-0x0000006ffaffff


NOTE:
      Please refer to Documentation/filesystem
      how to mount the debugfs filesystem.


TODO:
-----
- Need to come up with the better approach to
  accurate boot memory size that is required
  boot successfully when booted with restrict

Author: Mahesh Salgaonkar <mahesh@linux.vnet.ib

This document is based on the original documen

assisted dump by Linas Vepstas and Manish Ahuj