Sunday, June 14, 2015

Setting up mcelog to work with systemd

If you regularly watch system logs such as /var/log/messages, dmesg, or journalctl (systemd), you will eventually encounter a Machine Check Event (MCE), which warns you that some kind of hardware error has occurred. For example, a common MCE is caused by a bit flipping in RAM. With server ECC memory this is less of a problem, because single-bit errors can be corrected automatically. On servers in the field with uptime greater than 365 days, it is not hard to find MCEs logged here and there. On the RHEL 5/6 machines I encounter in the field, MCEs are logged to the file /var/log/mcelog and the mcelog service runs by default.

Recently I noticed that every few days the kernel ring buffer (dmesg) on my work laptop shows the following message:

[Jun11 23:49] mce: [Hardware Error]: Machine check events logged

However, when I navigate to /var/log/ I cannot see any file named mcelog. Some old posts floating around the Internet recommend redirecting mcelog to an output file, e.g. /usr/sbin/mcelog > mcelog.out, but this didn't work for me. Make sure you have the mcelog package (as it is called in Arch) installed. To enable the daemon in systemd, run systemctl enable mcelog. When running mcelog on a Linux machine that logs through systemd instead of the old syslog, you need to make some changes to /etc/mcelog/mcelog.conf.

If you are running systemd you do NOT want the syslog settings in that file enabled! The problem is that systemd handles system logging through its own journal (read with journalctl) rather than syslog. You can follow the other suggestions in the ArchWiki to run mcelog as a daemon (daemon = yes), but make sure the syslog lines are commented out. You also need to specify an output log file for MCEs by uncommenting the following in /etc/mcelog/mcelog.conf:

logfile = /var/log/mcelog

Also uncomment the following in /etc/mcelog/mcelog.conf:

run-credentials-user = root
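
If you would rather script these edits than make them by hand, here is a minimal sketch using sed. The function name fix_mcelog_conf is my own invention. The sample config is made up for the demonstration, and your real mcelog.conf may word its commented-out lines differently, so treat this as a starting point rather than a guaranteed fix.

```shell
#!/bin/sh
# Hypothetical helper (not part of mcelog): reads an mcelog.conf on stdin
# and writes the edited version to stdout. It never touches /etc itself.
fix_mcelog_conf() {
    # 1st expression: comment out active lines that start with "syslog"
    #                 (e.g. "syslog = yes", "syslog-error = yes")
    # 2nd expression: uncomment a "#logfile = ..." line
    # 3rd expression: uncomment a "#run-credentials-user = ..." line
    sed \
        -e 's/^[[:space:]]*\(syslog.*\)$/#\1/' \
        -e 's/^[[:space:]]*#[[:space:]]*\(logfile[[:space:]]*=.*\)$/\1/' \
        -e 's/^[[:space:]]*#[[:space:]]*\(run-credentials-user[[:space:]]*=.*\)$/\1/'
}

# Demonstration on a made-up sample (an assumption, not a copy of the
# shipped config file).
sample='syslog = yes
syslog-error = yes
#logfile = /var/log/mcelog
#run-credentials-user = root'

printf '%s\n' "$sample" | fix_mcelog_conf
```

To try it on the real file, feed it in and write the result somewhere else first, e.g. fix_mcelog_conf < /etc/mcelog/mcelog.conf > /tmp/mcelog.conf.new, then diff the two and copy the new file into place as root only once it looks right.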

Restart the mcelog service:

systemctl restart mcelog

Next time a Machine Check Event occurs, it will be written to /var/log/mcelog. Here is some sample output: