Gracefully recovering from kernel panics: boot robustness & more

GRUB has two very important features that you can use to make your system more robust in the event of a kernel panic or other boot error: saved default boot entries and fallbacks.

The fallback command is extremely handy when installing and testing new kernels. Without it, testing a new kernel on a remote server is a hassle: the new kernel may panic on boot, so you would need a KVM over IP attached to the server in order to force the older, working kernel to boot.

GRUB's fallback feature addresses this problem by letting you specify a series of boot entries to be attempted in sequence. On its own, however, a fallback only kicks in when GRUB itself fails to load an entry; GRUB cannot tell that a kernel panicked after it handed over control. Fallbacks become truly useful when combined with the saved default feature, which lets GRUB store the default boot entry: each time a kernel is booted, the stored default can be advanced to the next fallback entry. The workflow is as follows:

1. GRUB reads the stored default boot target.
2. GRUB sets the stored default boot target to the next designated fallback entry, then boots the kernel.
3. If the kernel panics, the machine is rebooted and the process starts again at step 1.

Notice that the stored boot target changes on every iteration. Because at least one kernel version is known to work (the machine is running it right now), GRUB can first try the most recent kernel and, should it fail, work its way back towards the known working version until a boot succeeds.

So, how do we set this up? First, let's create the grub-set-default script that will be used to update the default boot target:
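A minimal sketch of such a script follows. It assumes that GRUB Legacy keeps the saved entry in /boot/grub/default and, following the format used by upstream GRUB 0.97's grub-set-default, pads that file with comment lines so that savedefault can rewrite the number in place at boot. The grub_set_default function name and the optional GRUB directory argument are illustrative additions; compare the result with what your GRUB package expects before relying on it.

```shell
#!/bin/sh
# grub-set-default (sketch): store the entry GRUB Legacy boots when grub.conf
# says "default saved".
# ASSUMPTIONS: the saved entry lives in <grub dir>/default, and the file is
# padded with comment lines (as upstream GRUB 0.97's script does) so that
# "savedefault" can overwrite the number in place at boot time.

grub_set_default() {
    entry="$1"
    grubdir="${2:-/boot/grub}"   # optional second argument (illustrative)

    # Accept only a non-negative entry number.
    case "$entry" in
        ''|*[!0-9]*)
            echo "usage: grub-set-default ENTRY_NUMBER [GRUB_DIR]" >&2
            return 1
            ;;
    esac

    if [ ! -d "$grubdir" ]; then
        echo "$grubdir: not a directory" >&2
        return 1
    fi

    # Write the new file beside the old one, then move it into place.
    tmp="$grubdir/default.new.$$"
    {
        echo "$entry"
        i=0
        while [ "$i" -lt 10 ]; do
            echo "#"
            i=$((i + 1))
        done
        echo "# WARNING: do not remove any line from this file;"
        echo "# use grub-set-default to change the saved entry."
    } > "$tmp" || return 1
    chmod 0600 "$tmp"
    mv -f "$tmp" "$grubdir/default"
}

# When executed with arguments, behave like the grub-set-default command.
if [ "$#" -gt 0 ]; then
    grub_set_default "$@"
fi
```

Saved as, for example, /sbin/grub-set-default (the location is an assumption) and made executable with chmod 755, it is invoked as grub-set-default 0.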

Next, edit /etc/grub.conf and replace the default=N parameter with default saved. This causes GRUB to boot the saved default entry, which can be changed by running the grub-set-default command.

Next, GRUB needs to know which entries to boot in case of a failure. In most cases you will want this to be the second entry (entry 1, since GRUB numbers entries from 0), so add the following on a new line after default saved:

fallback=1

Should you wish for GRUB to attempt a third entry in case the second also fails, you can specify instead:

fallback=1 2

In this case, make sure that your system actually has three boot entries, i.e. three kernels installed side by side. Be warned: GRUB will probably not behave well if you reference a boot entry that does not exist.
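The two grub.conf edits above can also be scripted. The sketch below uses a hypothetical enable_saved_default helper and assumes GNU sed (for the \n in the replacement) and a file that already contains a default line. It writes the result back through cat rather than sed -i, so that when pointed at /etc/grub.conf (a symlink to /boot/grub/grub.conf on RHEL) the link is preserved.

```shell
# Hypothetical helper: switch a GRUB Legacy config to "default saved" and put
# a fallback line right after it. Assumes GNU sed and an existing default line.
enable_saved_default() {
    conf="$1"               # e.g. /etc/grub.conf
    fallback="${2:-1}"      # e.g. "1" or "1 2"
    tmp=$(mktemp) || return 1

    # 1) Drop any existing fallback line, so re-running does not duplicate it.
    # 2) Replace "default=N" (or "default N") with "default saved" followed by
    #    the new fallback line.
    sed -e '/^[[:space:]]*fallback[[:space:]=]/d' \
        -e "s/^[[:space:]]*default[[:space:]=].*/default saved\\nfallback=$fallback/" \
        "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"   # write through any symlink, keeping permissions
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, back the file up with cp -p /boot/grub/grub.conf /boot/grub/grub.conf.bak, then run enable_saved_default /etc/grub.conf "1 2".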

The key step in this setup is to have GRUB save the next fallback entry as the default whenever it boots an entry. This can be done by finding each line that looks like

kernel /path/to/vmlinuz various_boot_options

and adding directly underneath it on a new line:

savedefault fallback

On the final fallback entry, omit the fallback argument and use plain savedefault instead, since there is no further entry to fall back to.
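For a config with several entries, the savedefault lines described above can be added by a script. The sketch below (the add_savedefault_lines name is hypothetical) assumes the entries are meant to be tried in the order they appear in the file, as with fallback=1 2, so that the last kernel line belongs to the final fallback entry. It removes any existing savedefault lines first, so it can be re-run safely.

```shell
# Hypothetical helper: add "savedefault fallback" under every kernel line in a
# GRUB Legacy config, and plain "savedefault" under the last one.
# ASSUMPTION: entries are tried in file order, so the last one is the final
# fallback entry.
add_savedefault_lines() {
    conf="$1"
    tmp=$(mktemp) || return 1

    # The file is read twice: the first pass finds the last kernel line, the
    # second copies the file, dropping old savedefault lines and adding new
    # ones right after each kernel line.
    awk '
        NR == FNR { if ($1 == "kernel") last = FNR; next }
        $1 == "savedefault" { next }
        {
            print
            if ($1 == "kernel") {
                indent = $0
                sub(/[^ \t].*$/, "", indent)   # reuse the kernel line indentation
                if (FNR == last) print indent "savedefault"
                else             print indent "savedefault fallback"
            }
        }
    ' "$conf" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

If you reorder the fallback list (see the note on fallback=2 1 further down), adjust the resulting lines by hand.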

The last thing is to inform GRUB of the default boot target:

grub-set-default 0
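Keep in mind that savedefault fallback changes the saved entry every time the first entry boots, even when that boot succeeds, so after a good boot the saved default points at entry 1. One way to handle this, sketched below, is to reset the saved entry late in the boot process from /etc/rc.local, which RHEL runs after the other init scripts. The /sbin/grub-set-default location is an assumption; use wherever you installed the script.

```shell
# Appended to /etc/rc.local. Reaching this point means the kernel booted
# successfully, so point GRUB back at entry 0 (the newest kernel).
# ASSUMPTION: the grub-set-default script is installed as /sbin/grub-set-default.
if [ -x /sbin/grub-set-default ]; then
    /sbin/grub-set-default 0
fi
```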

That's it! For your reference, here is a sample configuration using the saved default + fallback method with 3 kernels:
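A grub.conf along these lines might look like the following sketch. The (hd0,0) boot partition, the root= device and the timeout are placeholders (assumptions), and an installer-generated RHEL 5 grub.conf will differ in detail; the default, fallback and savedefault lines are the point here.

```
default saved
fallback=1 2
timeout=5
# (hd0,0) and root=/dev/VolGroup00/LogVol00 below are placeholders.
title Red Hat Enterprise Linux Server (2.6.18-164.10.1.el5)
	root (hd0,0)
	kernel /vmlinuz-2.6.18-164.10.1.el5 ro root=/dev/VolGroup00/LogVol00
	savedefault fallback
	initrd /initrd-2.6.18-164.10.1.el5.img
title Red Hat Enterprise Linux Server (2.6.18-164.9.1.el5)
	root (hd0,0)
	kernel /vmlinuz-2.6.18-164.9.1.el5 ro root=/dev/VolGroup00/LogVol00
	savedefault fallback
	initrd /initrd-2.6.18-164.9.1.el5.img
title Red Hat Enterprise Linux Server (2.6.18-164.el5)
	root (hd0,0)
	kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/VolGroup00/LogVol00
	savedefault
	initrd /initrd-2.6.18-164.el5.img
```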

In this sample, the system is currently running kernel 2.6.18-164.9.1.el5, and the new kernel 2.6.18-164.10.1.el5 needs testing. Before rebooting, the system administrator runs grub-set-default 0 so that GRUB boots the first entry (2.6.18-164.10.1.el5). When that entry boots, savedefault fallback stores the next fallback entry (entry 1, kernel 2.6.18-164.9.1.el5) as the default, so if the new kernel fails, that kernel is booted on the next attempt. If it also fails, the third entry (entry 2, kernel 2.6.18-164.el5) is booted. Because it is the last fallback entry, there is nothing further to fall back to, which is why the third entry uses savedefault rather than savedefault fallback.

Note that in this sample the fallback entries are tried in the order in which they are defined, but this is not required. If you would prefer the last entry (2.6.18-164.el5) to be the first fallback kernel, change fallback=1 2 to fallback=2 1. When savedefault fallback is executed, it consults the fallback parameter and sets the next boot entry to entry 2 first, then entry 1. Remember to swap the savedefault lines as well: entry 2 then needs savedefault fallback, and entry 1, now the final fallback entry, needs plain savedefault.

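Now that the system can gracefully handle kernel panics while booting, an obvious question is: what happens if a kernel panic occurs once the system is running? Fortunately, there is a sysctl parameter, kernel.panic, that can automatically reboot the system after a kernel panic. It is also what completes the boot-time setup above: without it, a kernel that panics during boot simply hangs instead of rebooting into the next fallback entry.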
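kernel.panic takes the number of seconds the kernel waits after a panic before rebooting; 0 means it never reboots. A minimal sketch for setting it persistently follows. The set_sysctl_conf helper is hypothetical: it replaces an existing "key = value" line in a sysctl.conf-style file, or appends one if there is none.

```shell
# Hypothetical helper: set "KEY = VALUE" in a sysctl.conf-style file, replacing
# an existing line for KEY (and dropping duplicates) or appending one.
set_sysctl_conf() {
    file="$1" key="$2" value="$3"
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        # Extract the key part of "key = value" or "key=value" lines.
        { name = $0; sub(/[ \t]*=.*/, "", name); sub(/^[ \t]+/, "", name) }
        name == k { if (!found) print k " = " v; found = 1; next }
        { print }
        END { if (!found) print k " = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

As root, run set_sysctl_conf /etc/sysctl.conf kernel.panic 10 and then sysctl -p, which loads /etc/sysctl.conf into the running kernel. The 10-second delay is an arbitrary example value.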

Preventing repeated remote login attempts

SSH: Denyhosts

On any server with SSH exposed, it is a good idea to install Denyhosts. Denyhosts periodically audits the /var/log/secure file and bans any host that exceeds a given number of authentication failures within a set amount of time. It is extremely useful in combating the automated brute-force attacks that any system connected to the Internet will inevitably face.

yum install denyhosts
chkconfig denyhosts on
service denyhosts start

Although the defaults should 'just work' for most systems, you can change the number of failed authentication attempts allowed before a ban (or the ban duration, for example) in /etc/denyhosts.conf.
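For reference, the thresholds are controlled by settings such as the ones below. The values shown are examples, not necessarily the shipped defaults; the comments in /etc/denyhosts.conf describe each option, including when purging of old bans actually runs. Restart Denyhosts with service denyhosts restart after editing.

```
# /etc/denyhosts.conf (example values)
# Failed attempts allowed for user names that do not exist on the system:
DENY_THRESHOLD_INVALID = 5
# Failed attempts allowed for valid users other than root:
DENY_THRESHOLD_VALID = 10
# Failed attempts allowed for root:
DENY_THRESHOLD_ROOT = 1
# Age after which a ban may be removed when Denyhosts purges old entries:
PURGE_DENY = 4w
```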

Denyhosts has a whitelist file located at /var/lib/denyhosts/allowed-hosts, to which you should consider adding your home IP address or hostname to prevent yourself from accidentally getting locked out. Denyhosts resolves domain names, so if you have a dynamic DNS account, you can add its hostname to the file and rest easy knowing that your home IP is always whitelisted.
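A small sketch for adding an entry without creating duplicates; the allow_host helper name is hypothetical, and the default path is the one given above.

```shell
# Hypothetical helper: add a host name or IP address to the Denyhosts
# whitelist, unless it is already listed.
allow_host() {
    host="$1"
    file="${2:-/var/lib/denyhosts/allowed-hosts}"
    # -q: quiet, -x: match the whole line, -F: treat the host as a fixed string.
    grep -qxF "$host" "$file" 2>/dev/null || echo "$host" >> "$file"
}
```

For example, allow_host home.example.net, where home.example.net is a placeholder for your own dynamic DNS name, followed by service denyhosts restart so that the running daemon is sure to pick up the change.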

POP3/IMAP: Fail2ban

Like Denyhosts, Fail2ban locks out malicious users once they have tried (and failed) to authenticate too many times on your system within a configurable period of time. However, unlike Denyhosts, which only analyses the logs for SSH authentication failures, Fail2ban supports multiple "jail" definitions, each of which can be configured with its own log files and regular expression patterns to match against. A Fail2ban jail can be set up to lock out users with too many SASL authentication failures over POP3/IMAP, and you are free to implement additional jails for other services if the need arises.

You should consider editing /etc/fail2ban/jail.conf and adding your IP address to the ignoreip parameter so you do not accidentally lock yourself out while testing. As with Denyhosts, DNS hostnames are resolved, so feel free to use your dynamic DNS hostname.
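The ignoreip line sits in the [DEFAULT] section of jail.conf; a sketch of the edited line is below, where 192.0.2.10 and home.example.net are placeholders for your own address and hostname. Restart Fail2ban afterwards with service fail2ban restart.

```
# /etc/fail2ban/jail.conf
[DEFAULT]
# Space-separated list of IP addresses, CIDR ranges or host names that are
# never banned.
ignoreip = 127.0.0.1 192.0.2.10 home.example.net
```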

Hardening the system

Below is a list of snippets you can run to help harden the server with little or no effect from a user's perspective. Many of these tips come from the NSA's guide to securing RHEL 5.

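Set /home to mount with the noexec, nosuid and nodev options, which prevent binaries from being run from home directories, setuid/setgid bits from taking effect there, and device files on that filesystem from being interpreted.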
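A sketch of the fstab change, assuming /home is a separate filesystem with its own line in /etc/fstab; the harden_home_fstab helper name is hypothetical. It appends whichever of the three options are missing from the fourth (options) field of the /home line and leaves every other line untouched.

```shell
# Hypothetical helper: add nodev,nosuid,noexec to the mount options of /home
# in an fstab-format file (default /etc/fstab).
harden_home_fstab() {
    fstab="${1:-/etc/fstab}"
    tmp=$(mktemp) || return 1
    awk '
        BEGIN { OFS = "\t" }
        # Field 2 is the mount point, field 4 the comma-separated options.
        $1 !~ /^#/ && $2 == "/home" {
            split("nodev nosuid noexec", extra, " ")
            for (i = 1; i <= 3; i++)
                if (("," $4 ",") !~ ("," extra[i] ","))
                    $4 = $4 "," extra[i]
        }
        { print }
    ' "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
    status=$?
    rm -f "$tmp"
    return $status
}
```

After running harden_home_fstab as root, the options can be applied without a reboot with mount -o remount,nodev,nosuid,noexec /home.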

Update password hashing to sha512 (md5 has some well-known vulnerabilities):

/usr/sbin/authconfig --passalgo=sha512 --update

Note that existing passwords must be reset to take advantage of the new algorithm, even if you choose the same password again. At a minimum, reset the passwords for root and your own user account by running passwd username for each.
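To find the accounts that still need a reset, you can look at the hash prefixes in /etc/shadow: SHA-512 crypt hashes start with $6$, MD5 crypt hashes with $1$. The helper below is a hypothetical sketch; it skips locked or password-less accounts whose second field does not start with $ (such as "*" or "!!").

```shell
# Hypothetical helper: print accounts whose password hash is not SHA-512.
# Field 2 of /etc/shadow holds the hash; "$6$" marks SHA-512 crypt.
non_sha512_accounts() {
    awk -F: '$2 ~ /^\$/ && $2 !~ /^\$6\$/ { print $1 }' "${1:-/etc/shadow}"
}
```

Run it as root (for example, non_sha512_accounts) and then use passwd on each account it lists.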

Nobody should be trying to plug a memory stick into your server when you're not around, so allow administrators to manually insmod usb-storage but disable autoloading:
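A minimal sketch using an install directive in /etc/modprobe.conf. This works because automatic module loading goes through modprobe, which honours the directive, while insmod does not read the file.

```
# /etc/modprobe.conf
# When anything runs "modprobe usb-storage" (including automatic loading when
# a device is plugged in), run /bin/true instead of loading the module.
# insmod ignores this file, so the module can still be loaded by hand.
install usb-storage /bin/true
```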

Disable prelinking by changing PRELINKING=yes to PRELINKING=no in /etc/sysconfig/prelink. To apply the change immediately, run the prelink cron job one last time; with PRELINKING=no it undoes the prelinking already applied to the system's binaries:

sh /etc/cron.daily/prelink
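The PRELINKING edit itself can be done with GNU sed, as in this sketch (the disable_prelinking name is hypothetical); run it before the cron job above.

```shell
# Hypothetical helper: set PRELINKING=no in /etc/sysconfig/prelink
# (or in the file given as the first argument).
disable_prelinking() {
    sed -i 's/^PRELINKING=.*/PRELINKING=no/' "${1:-/etc/sysconfig/prelink}"
}
```

For example: disable_prelinking && sh /etc/cron.daily/prelink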

Firewall: iptables/netfilter

SELinux

TODO

Intrusion detection

Intrusion detection software acts as an early warning system that a server has been compromised. These tools require careful attention, since even a minor change on the server will trigger a warning; for the same reason, however, they can show exactly what was altered during a break-in.

yum install rkhunter aide

Although the defaults for AIDE are fine, rkhunter needs a bit more configuration in /etc/rkhunter.conf. Near line 200, you will find:

ALLOW_SSH_ROOT_USER=yes

This setting needs to match the PermitRootLogin setting in /etc/ssh/sshd_config; with root logins over SSH disabled, the value needs to be changed to no. Near line 540, you will see:

XINETD_CONF_PATH=/etc/xinetd.conf

This line needs to be commented out, as xinetd is not installed at all. Near line 610, you will find:

APP_WHITELIST=""

rkhunter needs to be told that it is OK to run the older versions of Apache, PHP and OpenSSH that the server will be using (Red Hat backports security patches rather than moving to newer upstream versions), so change the line to:

APP_WHITELIST="httpd:2.2.3 php:5.2.9 sshd:5.4p1"
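The three rkhunter.conf changes above can be applied in one go with GNU sed. The configure_rkhunter helper below is a hypothetical sketch; the whitelist is the example one from this guide, so adjust it to the versions your server actually runs.

```shell
# Hypothetical helper: apply the rkhunter.conf changes described above
# (default path /etc/rkhunter.conf).
configure_rkhunter() {
    sed -i \
        -e 's/^ALLOW_SSH_ROOT_USER=.*/ALLOW_SSH_ROOT_USER=no/' \
        -e 's/^XINETD_CONF_PATH=/#&/' \
        -e 's/^APP_WHITELIST=.*/APP_WHITELIST="httpd:2.2.3 php:5.2.9 sshd:5.4p1"/' \
        "${1:-/etc/rkhunter.conf}"
}
```

The second expression comments out the XINETD_CONF_PATH line by prefixing it with #.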

Now that rkhunter is fully configured, have rkhunter analyse the system:
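A minimal sketch, assuming an rkhunter release that supports the --propupd, --check and --skip-keypress options: first record a baseline of the properties of the files rkhunter monitors, then run the checks. The run_rkhunter_scan wrapper name is hypothetical.

```shell
# Hypothetical wrapper: build rkhunter's file-properties baseline, then scan.
run_rkhunter_scan() {
    # Record hashes, permissions, etc. of monitored files as known-good.
    rkhunter --propupd &&
        # Run all checks without pausing for a keypress between sections.
        rkhunter --check --skip-keypress
}
```

Run it as root right after configuration, while the system is still in a known-good state, so that later warnings point at genuine changes.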