Recently I’ve been working on some interesting projects involving the eventually consistent Riak key-value database. Today I encountered a puzzling issue with a fresh cluster I was deploying. I had seemingly done everything identically to in the past except something was causing it to fail to startup when in service mode.

[administrator@riak01 ~]$ sudo service riak start
Starting Riak: Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
[FAILED]

Bizarrely, in console mode it would start up fine. This led to me to believe it was some sort of user or permissions issue but I wasn’t totally sure. Perhaps I had accidentally executed Riak as another user and some sort of locking file was created? Or was it perhaps a performance issue with the “Shared Core” Azure VM’s I was using.

First I followed Riak’s advice by increasing the WAIT_FOR_ERLANG environment variable, I tried first 30 and then 60 seconds. But this made no difference at all. I’m not even sure if Riak was even using my new value as it still kept on printing out “15 seconds” as its reason for failing to start.

I researched some more and many places on the interweb were suggesting to purge the /var/lib/riak/ring/ directory (don’t do this if you have valuable data stored on the Riak instance). I tried this, but it also had no effect.

But it turned out that the solution was incredibly simple. Riak had created some sort of temporary lock directory at /tmp/riak. All I had to do was delete this directory and, hey presto, Riak would now start perfectly fine as a service!

Recently I deployed a CentOS 5.5 x64 guest on a Server 2008 R2 Hyper-V host for running a basic mail server. However I quickly noticed that the guest was losing time sync very rapidly, in the order of several positive minutes per hour. This was a surprise as I had already installed the Hyper-V Linux Integration components from Microsoft which I had assumed would just take care of everything for me. Not so, apparently.

I might also add that Dovecot (an IMAP/POP3 server) kept crashing due to the periodic NTP sync being so far out that it resulted in the guest’s clock actually going BACK in time! This appears to be an acknowledged bug in Dovecot but it is understandable why they don’t feel a pressing need to fix it. Though I understand the 2.0 release has. For the record, I was running the 1.0.7.7.el5 release.

Anyway, it turned out that the solution was very simple.

Modify the /boot/grub/grub.conf as follows:

Editing the /boot/grub/grub.conf file.

﻿﻿﻿Essentially you need to modify the lines that start with the word “kernel” and add two extra options onto the end:

clock=pit

This sets the clock source to use the Programmable Interrupt Timer (PIT). This is a fairly low level way for the kernel to track time and it works best with Hyper-V and Linux.

notsc

This is included more as belt-and-braces than anything. Because setting the PIT clock source (above) should already imply this setting really. But I include it for pure expressiveness 🙂

divider=10

This adjusts the PIT frequency resolution to be accurate to 10 milliseconds (which is perfectly sufficient for most applications). This isn’t strictly required but it will reduce some CPU load caused by the VM. If the VM will be running time sensitive calculations a lot (such as say a VoIP server or gaming server) then you probably shouldn’t include this option.

Once you’ve done this, save the file and reboot the box.

The time should now be synchronized precisely with the Hyper-V host!

PS: I’ve not tested whether this solution will work without the Hyper-V Linux Integration Components installed, but I believe that it will. As it operates independently of Hyper-V’s clock synchronisation mechanism and relies purely on the virtualised Programmable Interrupt Timer that is exposed by Hyper-V. And the PIT clock source will remain unchanged whether you have the Linux Integration Components installed or not.

PPS: The VMware knowledge base has a great article on this subject: http://kb.vmware.com/kb/1006427. Although take it with a slight pinch of salt (because the subject is Hyper-V) but it certainly gives several more options and ideas to try for different Linux distributions and 32-bit vs 64-bit etc environments.