There’s nothing better than graphs, and for a web site there are few better things to graph than the response times for your web pages. While there are plenty of external services out there that will probe your web site and graph the results, it’s a good idea to do this on your own too.

Munin is a monitoring tool that can produce graphs aplenty for your servers. Out of the box, or at least out of its Ubuntu box, it monitors a variety of system metrics and applications, but there is no bundled support for response-time monitoring.

Luckily, it’s really easy to extend Munin with new plugins, so I decided to write my own plugin for monitoring response times, which you can download from my bitbucket repository. It will produce graphs like the one below:

Here the plugin has been configured to monitor three URLs in the same graph. Unlike normal Munin probes these URLs are external to the actual server running the plugin, but you could just as well monitor localhost URLs too.

To get the plugin up and running you first need to install Munin, if you haven’t already. For a one-server setup under Ubuntu, with master and client both running on the same machine, you do:

sudo apt-get install munin munin-node

This will start the munin-node service in the background, and also add the Munin master cron jobs to /etc/cron.d/munin. To install the plugin itself, you add the Python script to a folder of your choice, make it executable, and then symlink it from the Munin plugin folder, like so:
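For example, assuming the plugin script is called response_times.py and is kept in /opt/munin-plugins (both names are placeholders, not from the repository — adjust to your own setup):

```shell
# Copy the plugin somewhere permanent and make it executable
# (paths and names are examples)
sudo cp response_times.py /opt/munin-plugins/
sudo chmod +x /opt/munin-plugins/response_times.py
# Symlink it into the Munin plugin folder; the symlink name becomes
# the section title used when configuring the plugin
sudo ln -s /opt/munin-plugins/response_times.py /etc/munin/plugins/response_times
```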

If you want to monitor more than one set of URLs, and thus have more than one graph, you can accomplish that by creating one symlink for each graph that you need. The names of the symlinks are used as section titles when configuring the plugin in /etc/munin/plugin-conf.d/munin-node. For the graph shown above, the configuration would look something like this:
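As a sketch, with made-up URLs, names, and labels (and noting that the exact `env` syntax differs a bit between Munin versions), a three-URL configuration might look like:

```ini
[response_times]
env.graph_title Site response times
env.url1_url http://www.example.com/
env.url1_name frontpage
env.url1_label Front page
env.url2_url http://www.example.com/blog/
env.url2_name blog
env.url2_label Blog index
env.url3_url http://www.example.com/about/
env.url3_name about
env.url3_label About page
```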

The plugin requires that the following environment variables are specified for each URL to be monitored:

urlX_url — the URL that should be monitored

urlX_name — Munin field name for the probe

urlX_label — legend description of the URL

The ‘X’ in the variable names above should be replaced with an incremental index for each URL, e.g. url1 and url2. In addition, the following environment variables are also supported:

graph_title — the title of the graph (default is “Response time”)

graph_category — the category Munin should show the graph in

request_timeout — the socket request timeout (same for all URLs)

urlX_warning — warning level for the probe (for Nagios)

urlX_critical — critical level for the probe (for Nagios)

Note that Munin uses its own timeout when fetching plugin data. The default value is 10 seconds, which also is the default value for the URL request timeout. Because of this it might be appropriate to increase the Munin fetch timeout so that it equals the number of URLs being monitored times the request timeout, to make sure all probes have time to run.
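For instance, with three URLs and the default 10-second request timeout, the node-side timeout could be raised in /etc/munin/munin-node.conf (the value below is just an illustration):

```ini
# /etc/munin/munin-node.conf
timeout 30
```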

Once you have configured the plugin to your satisfaction you need to restart the Munin node to make it discover the new plugin:
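On Ubuntu, that is typically:

```shell
sudo /etc/init.d/munin-node restart
```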

The other day I was optimizing a Django application I’m working on, using Simon Willison’s excellent DebugFooter middleware, which adds a footer to each page showing which SQL queries Django executed when generating the page.

I’m a bit of a caching addict so I had already added a caching layer on top of my models, and thus I was quite surprised to find that the most important page on the site still generated 5-15 SQL queries on every access, even though the objects it was accessing supposedly were cached.

The objects were indeed cached, but every time I was accessing one of the ForeignKey fields on the model objects Django generated a SQL query to find the data for the related object. This could quickly turn nasty if you follow such relationships in a loop on a high-traffic web site.

The solution was the select_related() QuerySet method. Borrowing from the Django documentation, a normal ORM lookup would look like this:

e = Entry.objects.get(id=5)
b = e.blog

This would generate two SQL queries, one to fetch the entry object and one to fetch the blog object once it’s referred to from the entry object. The same example with select_related() becomes:

e = Entry.objects.select_related().get(id=5)
b = e.blog

This example only generates one SQL query, albeit bigger and slower than each of the individual queries in the first example because of the necessary join between the model tables to find all the data in one go. However, this doesn’t matter if the fetched object will go directly into a cache anyway and stay there for a possibly rather long time, which was the case for me.

Today Amazon announced yet another new name for its excellent product feed API: Product Advertising API. It was formerly called the Amazon Associates Web Service, and it provides access to Amazon’s vast database of product offers and related content such as user reviews. It has been around since 2004 so it’s certainly not a new service, but I recently decided it was time I tried it out.

Choosing an API implementation
Implementing the API manually wouldn’t be too hard, but I figured someone would have done it for me already. Surprisingly, I did not find anything for Python that seemed mature or up-to-date. PyAWS was last touched in 2007, and pyecs only implements a subset of the API operations.

Although I’m sure something could be built on pyecs or PyAWS, I found that both PHP and Ruby had more mature packages available, in the shape of Tarzan and ruby-aws. Having also wanted to look into PHP web frameworks for a while, I decided this was a good opportunity so I went with Tarzan.

Choosing a PHP web framework
There’s a gazillion PHP frameworks out there, and most of them seem to have their fair share of outspoken supporters. Coming from Django, my first thought was naturally that I wanted its PHP equivalent. After some fruitless googling I decided there was no such thing, so instead I settled on these selection criteria: something fast, lightweight, and PHP5-based.

With the help of posts like this, this, and this, I somewhat arbitrarily narrowed down the contenders to Kohana, Zend, and Yii, and ultimately picked Yii since it was the new kid on the block.

A few snags…
I ran into a few snags during my brief foray into PHP land with Tarzan and Yii, so I thought I’d write them down here, in case it might help someone facing the same issues, or in case I run into them again myself. 🙂

There’s a bug in the stable version of Tarzan that makes it ignore any Amazon Associate IDs you supply in the configuration or the class constructors. Because of this, product links returned by the Amazon API will not be tracked, which means you won’t get any revenue share from Amazon to your affiliate account.

Tarzan returns SimpleXML objects, but apparently it’s not possible to serialize PHP’s built-in objects. I learned this when I tried to put the data I got back from Tarzan, uncast, into memcached, and got this perplexing error message on retrieval:

unserialize() [function.unserialize]: Node no longer exists

I first had multiple class definitions per PHP file, and this worked fine with Yii and its autoload support. However, when I tried to put these objects into memcached I again got confusing errors when they were unserialized, e.g.:

YiiBase::include(BrowseNode.php) [yiibase.include]: failed to open stream: No such file or directory

The BrowseNode class was not defined in a file of its own, and I suppose that’s why it couldn’t be found. When I moved it into a separate BrowseNode.php file, things started to work.

Update: this note is not quite correct, please see the comments below this post!

Something really weird happens when you try to use the CHtml class in Yii to output an image with an empty or missing URL. This will make the controller action execute twice! I have no idea why this happens, but it took me a good while to track down the cause. To reproduce, add the line below to a view in your Yii application and add a log statement to the corresponding controller action:

Final words
Amazon’s product API really is a solid, fast, and comprehensive service that deserves all the praise it has received. With the new API name, Amazon today also announced that API requests will in the future have to be authenticated through a hash calculated for every request, based on your AWS identifiers and the parameters of the request. This requirement is being phased in over a period ending on August 15, 2009.

Tarzan obviously lacks support for this, but at least the author is aware of the change. Apart from this and the annoying Associate ID bug I mentioned previously, Tarzan worked great for me and I wouldn’t hesitate to use it again, seeing that it’s actively maintained and tries to stay on top of Amazon’s evolution of services.

As for Yii, I did not use it enough to give a proper verdict — I barely tested the ORM support for instance — but it was easy getting started and its MVC structure seemed logical enough, although the relative youth of the framework is visible in some rough edges here and there. Yii markets itself as a high-performance framework and although I don’t have that many reference points, the execution speed was more than satisfactory. Would I use it again? Probably, but I’ll check out Kohana too at some point.

I’ve watched two presentations lately that I enjoyed, so I thought I’d link to them here.

The first one is by Cal Henderson at DjangoCon 2008. Cal is an engineering manager at Flickr, which not surprisingly is written in PHP, and he delivered a keynote address on why he hates Django.

Although made tongue-in-cheek, it contains a bunch of very valid points about Django, one of the main ones being Django’s monolithic database approach. This is probably also my own biggest concern with Django. I have first-hand experience of making this design mistake for a web site that grew rather big, and it can easily turn into a major and prolonged headache.

The other presentation is by Aditya Agarwal, an engineering director at Facebook, at QCon SF 2008. Aditya talks about the Facebook software stack, which somewhat crudely described is a normal LAMP stack, albeit heavily tuned, backed by memcached and a number of backend services. Facebook is obviously a very extreme environment but many of the design choices and observations in this presentation are valid for smaller sites too.

Continuing on the path of getting my VPS up and running, the time has come to install the actual mail services. For this I’ll go with the standard Ubuntu choices of Postfix as MTA and basic MDA, and Dovecot as IMAP server.
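Step 1 — install Postfix
The Postfix package, and the configuration dialog it launches, is installed with the standard Ubuntu command:

```shell
sudo apt-get install postfix
```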

This launches a dialog where you need to choose what kind of mail server you intend to operate, and which domain your server will maintain the mail for. I chose Internet Site and entered my domain name (e.g. example.com). The package installer automatically sets a number of default settings, which you can override by launching another interactive dialog:

sudo dpkg-reconfigure postfix

However, dialogs are pretty annoying so I’ll edit the settings manually instead through the convenient postconf utility, and then tell Postfix to reload its config:
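A sketch of the relevant settings (the values are examples — example.com stands in for your own domain):

```shell
# Which domains this server is the final destination for,
# which networks to relay mail for (only this host),
# and which mailbox format to use
sudo postconf -e "mydestination = example.com, localhost.example.com, localhost"
sudo postconf -e "mynetworks = 127.0.0.0/8"
sudo postconf -e "home_mailbox = Maildir/"
# Tell Postfix to reload its config
sudo /etc/init.d/postfix reload
```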

You obviously need to replace all occurrences of example.com with your actual domain name. These settings control which domains this server is the final destination for, which networks to relay mail for (only this host), and what mailbox format to use. I like Maildir since it stores each mail in a separate file, which is both robust and convenient. See the official Postfix documentation for more configuration options.

The /etc/aliases file controls which email aliases Postfix should use when delivering mail locally on the machine. By default, it will contain a mapping for postmaster to root, but we should also add a mapping for root to the user who should read root’s mail (e.g. johndoe), and make that active for Postfix through the newaliases command:
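With johndoe as the placeholder account name, that could be done like this:

```shell
# Send root's mail to johndoe (appends a mapping to /etc/aliases)
echo "root: johndoe" | sudo tee -a /etc/aliases
# Rebuild the alias database so Postfix picks up the change
sudo newaliases
```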

Step 2 — create an SSL certificate
To use IMAP over SSL we need to create an SSL certificate to use with Dovecot. I’ll use a self-signed certificate since this is just my personal server with few users. First we generate a private key for the certificate and make it readable only by root, and then we create the certificate itself:
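One way to do this with openssl (the key size, file paths, and ten-year validity below are arbitrary choices, not requirements):

```shell
# Generate a private key and make it readable only by root
sudo openssl genrsa -out /etc/ssl/private/dovecot.key 2048
sudo chmod 600 /etc/ssl/private/dovecot.key
# Create a self-signed certificate valid for ten years;
# this prompts for the certificate details, including the Common Name
sudo openssl req -new -x509 -key /etc/ssl/private/dovecot.key \
  -out /etc/ssl/certs/dovecot.pem -days 3650
```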

Make sure to provide the actual domain name of your mail server, e.g. mail.example.com, when asked for the “Common Name”. Otherwise email clients will complain every time they connect to the server. Since this is a self-signed certificate not backed by a Certificate Authority, clients will complain the first time anyway, but if you save the certificate, subsequent connects will go through silently. You can read more about openssl here.

Step 3 — install Dovecot
Time for Dovecot. It can act both as an IMAP server and a POP3 server, but I will only use its IMAP capability. Who uses POP3 nowadays anyway? Install the package and open the main config file:

sudo apt-get install dovecot-imapd
sudo vi /etc/dovecot/dovecot.conf

Change the following options to enable IMAP over SSL, tell Dovecot to use the Maildir mailbox format, and point it to the SSL key and certificate:
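For the Dovecot 1.x shipped with Ubuntu at the time, the relevant options would look roughly like this (the key and certificate paths are examples):

```ini
# /etc/dovecot/dovecot.conf
protocols = imaps
mail_location = maildir:~/Maildir
ssl_cert_file = /etc/ssl/certs/dovecot.pem
ssl_key_file = /etc/ssl/private/dovecot.key
```

Restart Dovecot afterwards with `sudo /etc/init.d/dovecot restart` to make the changes take effect.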

I mentioned in my last post that I had got a new VPS to host my mail and some other things on. One of those other things is a Jabber / XMPP server for instant messaging.

There are many XMPP servers out there, and at least three of them are available in Ubuntu’s software repository: jabberd14, jabberd2, and ejabberd. There is also the popular Openfire server, which isn’t available in the Ubuntu repository but a breeze to install nevertheless through its web-based configuration UI.

My VPS does not come with a whole lot of memory, so I needed a Jabber server with a small memory footprint. Although you can trim the Java-based Openfire down to fairly low levels of memory usage, it’s at a disadvantage compared to the C/C++ based jabberd14 and jabberd2. The same goes for the Erlang-based ejabberd, so it came down to jabberd14 or jabberd2. Considering that jabberd14 seems pretty dead, with no updates since 2007, I chose jabberd2. This is how I installed it:

Step 1 — add universe to sources.list
The universe repository component needs to be enabled in /etc/apt/sources.list, see step 4 in my last post for how to do this.

Step 2 — install MySQL
jabberd2 can be used with several different storage and authentication backends. I prefer the default Ubuntu choices of MySQL for both storage and authentication, as I use MySQL for different things too and I like simplicity. If you don’t have it installed already, get it by:

sudo apt-get install mysql-server

Still hunting for a small memory footprint, I also switched the default MySQL config to the stock config for systems with little memory:
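On Ubuntu the example configs ship with the MySQL documentation; the exact path depends on the package version, so the one below is indicative only:

```shell
# Replace the default config with the shipped low-memory example config
# (path varies with the installed MySQL version)
sudo cp /usr/share/doc/mysql-server-5.0/examples/my-small.cnf /etc/mysql/my.cnf
sudo /etc/init.d/mysql restart
```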

Somewhat disappointingly, MySQL still hogs more than 100 MB of precious memory, which is a bit funny since the config claims it’s meant to be used for systems with less than 64 MB of memory.😉 However, if you don’t plan to use InnoDB tables, the magic-wand solution is to add the skip-innodb directive to the [mysqld] section of /etc/mysql/my.cnf. This alone brought the memory usage down to just 15 MB after startup for me.

Step 3 — install jabberd2
Get jabberd2 from universe by:

sudo apt-get install jabberd2

For some reason it seems to start up by default even though it’s not properly set up yet, so let’s shut it down while we’re configuring it:
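On Ubuntu that is:

```shell
sudo /etc/init.d/jabberd2 stop
```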

Additionally, it makes sense to also change the password of the jabberd2 backbone router component in router-users.xml and each of the component configuration files.

Although there are other configuration options available too, the above steps should be enough for a basic installation. Start jabberd2 up again through:

sudo /etc/init.d/jabberd2 restart

At this point, you should be able to connect to the server and register with an IM client of your choice, e.g. Miranda. Don’t forget to open up the Jabber ports in your firewall: port 5222 for normal connections, 5223 for SSL connections, and 5269 for server-to-server connections.

I recently got a new VPS for hosting my mail and some other things. The server was installed with a minimal Ubuntu 8.10 distribution, which basically meant that nothing except init, syslogd, and sshd was running after boot. Before doing anything else with it, here’s what I did to lock it down a bit security-wise:

Step 1 — Add user account
It’s good practice to not do stuff logged in as the root user, so the first step is to add a user account from which we can sudo. Log in as root only this time and add the user (e.g. johndoe) and sudo access as follows:

useradd -m johndoe -s /bin/bash
passwd johndoe
visudo

Running visudo will edit the /etc/sudoers file with the default editor of your environment. Add this line to allow johndoe full sudo permissions:

johndoe ALL=(ALL) ALL

Step 2 — Install iptables
Still as root, we’ll set up the iptables firewall to make sure that only explicitly allowed inbound network traffic reaches the server:

apt-get install iptables

Configuring iptables can seem pretty complex at first, but here’s a decent tutorial. Firewall rules can be added directly from the command line:
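A sketch of a matching rule set, run as root (the ports correspond to ssh, smtp, http, and imaps):

```shell
# Always allow loopback traffic and replies to established connections
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow the services we need
iptables -A INPUT -p tcp --dport 22 -j ACCEPT   # ssh
iptables -A INPUT -p tcp --dport 25 -j ACCEPT   # smtp
iptables -A INPUT -p tcp --dport 80 -j ACCEPT   # http
iptables -A INPUT -p tcp --dport 993 -j ACCEPT  # imaps
# Block all other inbound traffic by default, allow all outbound
iptables -P INPUT DROP
iptables -P OUTPUT ACCEPT
```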

Essentially, the above rules allow all outbound traffic, block all inbound traffic by default, and specifically allow ssh, smtp, http, and imaps traffic, which is what I need to begin with. To make sure the rules are persistent after e.g. a server reboot we add them as a script hook to the network interface:
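One common approach (the interface name and file path are examples):

```shell
# Dump the active rules to a file...
iptables-save > /etc/iptables.rules
# ...and restore them whenever the interface comes up, by adding
# this line to the eth0 stanza in /etc/network/interfaces:
#   pre-up iptables-restore < /etc/iptables.rules
```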

Step 3 — Update sshd config
Now that iptables and a user account with sudo rights are set up, it’s time to leave the root shell, log in as the new user, and then edit the sshd config to lock down future ssh access a bit:

sudo vi /etc/ssh/sshd_config

Add/change the following settings:

PermitRootLogin no
AllowUsers johndoe

This disallows root from logging in through ssh (console login is still allowed though) and restricts ssh access to only be allowed for the johndoe user. Make the changes active by reloading the config:

sudo /etc/init.d/ssh reload

Step 4 — Update sources.list
This being a very minimal install, the stock /etc/apt/sources.list file only included the main repository component. To install the denyhosts package (see next step) the universe component is needed, so we need to add it to sources.list:
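On Ubuntu 8.10 (intrepid) this means appending universe to the deb lines, roughly like below (your mirror URLs will differ):

```
deb http://archive.ubuntu.com/ubuntu intrepid main universe
deb http://security.ubuntu.com/ubuntu intrepid-security main universe
```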

Once done, update the package lists from the newly added sources and upgrade all currently installed packages to the latest versions:

sudo apt-get update
sudo apt-get upgrade

Step 5 — Install DenyHosts
Even with iptables and the sshd configuration changes we made, we still allow some users to log in through ssh, which makes us vulnerable to remote brute-force attempts to gain access through these accounts. One good way to do away with this threat is to allow only public-key authentication, or to restrict access to a list of specified IPs through iptables or /etc/hosts.deny, but if that is not practical for whatever reason, the DenyHosts package comes to the rescue.

DenyHosts monitors the sshd authentication log to detect evil login attempts and adds suspicious IPs automatically to the /etc/hosts.deny file. It’s available in Ubuntu from the universe repository component (see previous step), and is easily installed like this:

sudo apt-get install denyhosts

This will automatically start a Python daemon in the background, which is also persistent across reboots through a symlink in /etc/rc3.d/. The default settings are pretty decent, but should you want to review or change them, you can do so in /etc/denyhosts.conf.