
This site is a sysadmin relative of the Perl Advent Calendar: one article for each day of December, ending on the 25th article. With the goals of sharing, openness, and mentoring, the authors aim to provide great articles about systems administration topics written by fellow sysadmins.

In past sysadvents, I’ve talked about babysitting services and shown how to use supervisord to do it. This year, Ubuntu started shipping its releases with a new init system called Upstart that has babysitting built in, so let’s talk about that. I’ll be running all of these examples on Ubuntu 10.04, but any Upstart-based system should work.

For me, the two most important features of Upstart are babysitting and events. Upstart supports the same simple runner scripts that daemontools, supervisord, and similar tools support. It also lets you configure jobs to respond to arbitrary events.
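To make "babysitting" concrete, here is a minimal Python sketch of what a supervisor does: run a command, start it again whenever it exits, and give up if it respawns too often. It loosely mirrors Upstart’s ‘respawn’ and ‘respawn limit COUNT INTERVAL’ stanzas (init(5) documents a default of 10 respawns within 5 seconds), but it is an illustration of the idea, not how Upstart is implemented; the function name babysit and its parameters are hypothetical.

```python
import subprocess
import sys
import time
from collections import deque


def babysit(argv, limit_count=10, limit_interval=5.0):
    """Run argv, restarting it whenever it exits.

    Gives up (returning the total number of starts) once limit_count
    starts have happened within the last limit_interval seconds,
    roughly like Upstart's 'respawn limit COUNT INTERVAL'.
    """
    starts = deque()  # monotonic timestamps of recent starts
    total = 0
    while True:
        now = time.monotonic()
        # Forget starts that fall outside the sliding window.
        while starts and now - starts[0] > limit_interval:
            starts.popleft()
        if len(starts) >= limit_count:
            print(f"{argv[0]} respawning too fast, giving up", file=sys.stderr)
            return total
        starts.append(now)
        total += 1
        code = subprocess.call(argv)  # blocks until the child exits
        print(f"{argv[0]} exited with status {code}; respawning", file=sys.stderr)
```

Something like `babysit(["/usr/sbin/sshd", "-D"])` only works because the daemon stays in the foreground (for sshd, that is what -D does); a daemon that forks and detaches looks to any babysitter like a process that exited immediately.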

Diving in, let’s take a look at the ssh server job Ubuntu ships for Upstart (I edited it for clarity). This file lives at /etc/init/ssh.conf:

Events

Upstart supports simple events. You can emit one with ‘initctl emit EVENT [KEY=VALUE] …’, and you subscribe to an event in a job’s config with ‘start on …’ (and likewise ‘stop on …’). A very simple example:

You can also accept events conditionally, based on their key/value settings. See the init(5) manpage for more details.

Additionally, you can start jobs and pass parameters to them with ‘start helloworld key1=value1 ...’
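If you drive Upstart from your own scripts, the two command shapes above are easy to wrap. The Python sketch below only builds the argument lists for ‘initctl emit EVENT KEY=VALUE …’ and ‘start JOB KEY=VALUE …’, printing them and running them only when asked; the helper names and the ‘hello’ event are hypothetical, and actually running the commands needs root on an Upstart system.

```python
import shlex
import subprocess


def _env_args(env):
    """Turn {"KEY": "value"} into ["KEY=value", ...] in insertion order."""
    args = []
    for key, value in env.items():
        if not key or "=" in key:
            raise ValueError(f"invalid key: {key!r}")
        args.append(f"{key}={value}")
    return args


def emit_argv(event, env=None):
    """Arguments for 'initctl emit EVENT [KEY=VALUE]...'."""
    return ["initctl", "emit", event] + _env_args(env or {})


def start_argv(job, env=None):
    """Arguments for 'start JOB [KEY=VALUE]...' (passes parameters to the job)."""
    return ["start", job] + _env_args(env or {})


def run(argv, dry_run=True):
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)
```

For example, `run(emit_argv("hello", {"WHO": "world"}))` prints `initctl emit hello WHO=world` without touching Upstart.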

Problems

Upstart has issues.

First: Debugging it sucks. Why is your pre-start script failing? There’s no built-in way to capture its output and log it. Your best bet is something like ‘exec 2> /var/log/upstart.${UPSTART_JOB}.log’ at the top of the script. Otherwise, your only option for capturing output is the ‘console’ setting, which sends output to /dev/console, and that’s not very useful.

Second: The common ‘graceful restart’ idiom (test the config, then restart) is hard to implement directly in Upstart. One approach I tried: in ‘pre-start’, run a config test and, on success, copy the config to a ‘good’ file and have the daemon run from that copy. That doesn’t work well for things like Nagios, which can have many config files:

The above solution kind of sucks. The right way to implement graceful restart with Upstart is to run the test yourself and only call ‘initctl restart nagios’ on success; that is, keep it external to Upstart.
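A minimal Python sketch of that external approach: verify the config, and only ask Upstart to restart the job if verification passes. The job name nagios comes from the text above; the binary and config paths are assumptions based on Ubuntu’s nagios3 package, and the run parameter is a hypothetical hook so the logic can be exercised without Nagios installed.

```python
import subprocess
import sys

# Assumed paths for Ubuntu's nagios3 package; adjust for your system.
NAGIOS_BIN = "/usr/sbin/nagios3"
NAGIOS_CFG = "/etc/nagios3/nagios.cfg"


def graceful_restart(job="nagios", run=subprocess.run):
    """Restart an Upstart job only if the Nagios config test passes.

    Returns True if a restart was requested, False if the test failed.
    """
    # 'nagios -v CONFIG' verifies the configuration and exits non-zero on errors.
    test = run([NAGIOS_BIN, "-v", NAGIOS_CFG], capture_output=True, text=True)
    if test.returncode != 0:
        # Leave the running daemon alone and show why the test failed.
        print(test.stdout or "", test.stderr or "", sep="", file=sys.stderr)
        return False
    run(["initctl", "restart", job], check=True)
    return True
```

Keeping the test outside Upstart means a broken config never stops the running daemon; the worst case is a restart that simply does not happen.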

Third: D-Bus (the message backend for Upstart) has very poor user documentation. The system seems to support access control, but I couldn’t find any docs on the subject, and Upstart’s own documentation doesn’t explain it either. You can see access control in action, though, when you try to ‘start’ ssh as non-root:

So, there’s access control, but I’m not sure anyone knows how to use it.

Fourth: There’s no “died” or “exited” event to indicate that a process exited unexpectedly, so you can’t write event-driven tasks that alert you when a process is flapping or has died.
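Without such an event, one workaround is to poll from outside Upstart. The Python sketch below parses ‘initctl status JOB’ lines of the form ‘ssh start/running, process 1234’ and counts PID changes between samples as restarts. The exact output format, the function names, and the threshold are assumptions for illustration, not an Upstart feature, and polling can miss restarts that happen between samples.

```python
import re
import subprocess
import time

# Matches lines such as "ssh start/running, process 1234" or "ssh stop/waiting".
STATUS_RE = re.compile(r"^(\S+) (\w+)/(\w+)(?:, process (\d+))?")


def parse_status(line):
    """Return (goal, state, pid or None) from one 'initctl status' line."""
    m = STATUS_RE.match(line.strip())
    if not m:
        raise ValueError(f"unrecognised status line: {line!r}")
    pid = int(m.group(4)) if m.group(4) else None
    return m.group(2), m.group(3), pid


def count_restarts(pids):
    """Count how often the PID changed between consecutive running samples."""
    seen = [p for p in pids if p is not None]
    return sum(1 for a, b in zip(seen, seen[1:]) if a != b)


def watch(job, samples=12, interval=5.0, threshold=3):
    """Poll a job and warn if it appears to be flapping."""
    pids = []
    for _ in range(samples):
        out = subprocess.run(["initctl", "status", job],
                             capture_output=True, text=True, check=True).stdout
        pids.append(parse_status(out)[2])
        time.sleep(interval)
    restarts = count_restarts(pids)
    if restarts >= threshold:
        print(f"{job} restarted {restarts} times; it may be flapping")
    return restarts
```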

Fifth: Back to debugging, there’s no way to watch events as they pass through Upstart, and strace doesn’t help very much:

Lastly, the system feels like it was built for desktops: no ‘exited’ event, confusing or missing access control, stopped state likely being lost across reboots, no slow-starts or backoff, little or no output on failures, etc.

Conclusion

Anyway, despite some problems, Upstart seems like a promising solution to the problem of babysitting your daemons. Even if it had no other benefit, it ships by default with Ubuntu 10.04 and later, so if you run Ubuntu infrastructure, it’s worth learning.