Kernel Crash Dumps

This article explains how to capture kernel crash dumps (also known as kdumps). A kdump is produced when the kernel panics or locks up. For simplicity, a single kernel is used for both the ordinary system and recovery. The described method is almost distribution-independent.

Note: CONFIG_PHYSICAL_START might need to be set higher than 2 MB (0x200000) on some motherboards so that the kernel's memory space is offset far enough to avoid being clobbered by the BIOS. Try 0x1000000 (16 MB) if the above kernel options are not working as expected.

Note: The kernel image has to be readable. A typical Gentoo configuration leaves /boot unmounted, so either remove noauto from the /boot entry in the fstab file or place a copy of the kernel in a location that is mounted during a crash.
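The readability requirement can be checked before loading the rescue kernel. The following is a minimal sketch: the helper name kernel_image_readable is made up for this example, and the image path in the usage comment is a placeholder, not a path taken from this article.

```shell
#!/bin/sh
# kernel_image_readable (hypothetical helper): succeeds if the given
# kernel image can be read, otherwise prints a hint and fails.
kernel_image_readable() {
    image="$1"
    [ -r "$image" ] && return 0
    echo "cannot read $image (is /boot mounted?)" >&2
    return 1
}

# Example usage (the path is a placeholder for the real kernel image):
#   kernel_image_readable /boot/vmlinuz || mount /boot
```

Running mount /boot only works if /boot has an entry in the fstab file.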

Bootloader

Add the crashkernel=64M nokaslr arguments to the kernel command line via the bootloader (most likely GRUB2). A reservation of 64 MB applies to systems with up to around 12 GB of RAM.
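As an illustration, the sketch below shows where the arguments usually go on a GRUB2 system and how to confirm after a reboot that they reached the kernel. It assumes the common /etc/default/grub and /boot/grub/grub.cfg locations; the helper name has_crashkernel is made up for this example.

```shell
#!/bin/sh
# With GRUB2 the arguments are typically added to /etc/default/grub:
#   GRUB_CMDLINE_LINUX="crashkernel=64M nokaslr"
# and the configuration is then regenerated:
#   grub-mkconfig -o /boot/grub/grub.cfg

# has_crashkernel (hypothetical helper): succeeds if the given command-line
# file (default: /proc/cmdline) contains a crashkernel= argument.
has_crashkernel() {
    cmdline_file="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -q '^crashkernel='
}

# After rebooting:
#   has_crashkernel && echo "crashkernel= is set"
```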

Note: nokaslr disables the KASLR security feature. This option can be omitted, but then symbols from all kernel sections have to be loaded manually in gdb because the kernel location is randomized.

Usage

First, run the above script:

root # /etc/local.d/kdump.start

This loads the rescue kernel image, which is run after a kernel crash.
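Whether the rescue image is actually in place can be checked through sysfs. This sketch assumes the running kernel exposes /sys/kernel/kexec_crash_loaded, which reads 1 when a crash kernel is loaded; the helper name crash_kernel_loaded is made up for this example.

```shell
#!/bin/sh
# crash_kernel_loaded (hypothetical helper): succeeds if the state file
# (default: /sys/kernel/kexec_crash_loaded) is readable and contains 1.
crash_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Example usage:
#   crash_kernel_loaded && echo "crash kernel is loaded"
```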

Whenever a kernel panic or a lockup (hard or soft, if the kernel is set to detect them) occurs, kexec runs the kernel in crash mode, relocated to a reserved area of memory. The rest of RAM is left untouched. Once the system has booted, log in and copy /proc/vmcore to a file; this file is the crash dump. Then reboot the system to get back to a normal configuration: the system might not be stable and should not continue to operate in this state.
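Copying the dump can be scripted. The following is a minimal sketch: the helper name save_vmcore, the /var/crash target directory and the timestamped file name are assumptions chosen for this example. /proc/vmcore only exists while the system is running in crash mode.

```shell
#!/bin/sh
# save_vmcore (hypothetical helper): copies the crash dump (default:
# /proc/vmcore) into a target directory (default: /var/crash, an assumed
# location) and prints the path of the saved file.
save_vmcore() {
    src="${1:-/proc/vmcore}"
    dest_dir="${2:-/var/crash}"
    [ -r "$src" ] || { echo "no crash dump at $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/vmcore-$(date +%Y%m%d-%H%M%S)"
    # Flush the copy to disk before the system is rebooted.
    cp "$src" "$dest" && sync && echo "$dest"
}

# In the crash kernel, as root:
#   save_vmcore && reboot
```

The target directory has to be on a filesystem with enough free space, since the dump can be as large as the memory in use at the time of the crash.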

A kernel panic can be forced on demand by executing the following command (before doing this, do not forget to save all data, log out other users, and leave the filesystems in a clean state by running the sync command):

root # echo c | tee /proc/sysrq-trigger

Troubleshooting

Kernel is not loading

If the kernel is not loading when kexec is called, check whether kernel compression is set to the xz (lzma) format.

If xz compression is used, the sys-apps/kexec-tools package needs to be re-emerged with the lzma USE flag enabled.
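The check and the rebuild could look like the sketch below. It assumes the kernel configuration is available at /usr/src/linux/.config and that /etc/portage/package.use is a directory; the helper name kernel_uses_xz is made up for this example.

```shell
#!/bin/sh
# kernel_uses_xz (hypothetical helper): succeeds if the given kernel
# configuration (default: /usr/src/linux/.config) selects xz compression.
kernel_uses_xz() {
    config="${1:-/usr/src/linux/.config}"
    grep -q '^CONFIG_KERNEL_XZ=y' "$config"
}

# If xz is in use, enable the lzma USE flag and rebuild kexec-tools:
#   echo "sys-apps/kexec-tools lzma" >> /etc/portage/package.use/kexec-tools
#   emerge --ask --oneshot sys-apps/kexec-tools
```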

VGA not resetting

The crash kernel has been loaded with kexec, but after a kernel panic it does not appear to start. The output on the display freezes.

This might be caused by the VGA port not being reset. The solution may be to tell kexec to reset the display output on the VGA port. Something like the following could work (the important options being --reset-vga --console-vga):