I was very lucky with this: the format is utterly consistent because it is spat out of their database. I was able to complete it with about 160 lines of code.

After getting the code working, I cleaned it up by removing all the leftover print statements until the only thing that would be produced at runtime was a message when a new month became available in the database. The email generating code is on line 34 and it has so far worked once by sending me an email which looked like:

Who gets this email? Those who are listed as doing so in the editors list (see the image above). Maybe if you are a journalist with international conflicts on your beat, you ought to get on this list. The emailer technology was outlined in an earlier blog post. There is no UI for it, so it can only be enabled by request [send your request to feedback].

Phase Two: The analysis

What we have now is a table of over 86000 records stretching back to January 2003. The important columns in the table are:

month text,
country text,
mission text,
people integer

It turns out there are hundreds of relevant timeline graphs which you can make from this data with a little bit of SQL.

For example, what are the three top countries in terms of maximum deployment to any mission? Find it using:
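The exact query is not preserved here, so what follows is only a sketch of how it might look, run from Python with sqlite3. The table name `deployments` is an assumption, as is the idea that one country, mission and month can span several rows (which is why the counts are summed before taking the maximum). The demo rows are made up and are not real UN figures.

```python
import sqlite3

# Hypothetical table name; the four columns are the ones listed above.
SCHEMA = """
CREATE TABLE deployments (
    month TEXT, country TEXT, mission TEXT, people INTEGER
)
"""

# Sum people per (month, country, mission) first, in case a deployment is
# split over several rows, then take each country's largest single total.
TOP_SQL = """
SELECT country, MAX(total) AS peak
FROM (
    SELECT month, country, mission, SUM(people) AS total
    FROM deployments
    GROUP BY month, country, mission
)
GROUP BY country
ORDER BY peak DESC, country
LIMIT ?
"""


def top_countries(conn, n=3):
    """Return [(country, peak deployment to any one mission), ...]."""
    return conn.execute(TOP_SQL, (n,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Made-up placeholder rows, only here so the query has something to read.
    conn.executemany(
        "INSERT INTO deployments VALUES (?, ?, ?, ?)",
        [
            ("2005-01", "Country A", "MISSION-1", 100),
            ("2005-01", "Country B", "MISSION-1", 250),
            ("2005-02", "Country C", "MISSION-2", 40),
        ],
    )
    for country, peak in top_countries(conn):
        print(country, peak)
```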

[The reporter who encouraged me to scrape this dataset had a theory that these peacekeeping missions are a clever way for nations to get their troops battle-hardened before the inevitable conflict on their own territory. In other words, they also serve as war-training missions.]

Now let’s have a look at just the deployment of peacekeepers from India, Bangladesh and Pakistan to MONUC (United Nations Organization Mission in the Democratic Republic of the Congo) over time.
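A timeline like this boils down to one query that groups by month and country. The sketch below is a guess at its shape, not the code behind the graph. It assumes a table called `deployments`, missions stored by acronym (e.g. "MONUC"), and a `month` text value that sorts chronologically (such as "2003-01"). The demo rows are invented.

```python
import sqlite3


def mission_timeline(conn, mission, countries):
    """Return {country: {month: people}} for one mission."""
    countries = list(countries)
    if not countries:
        return {}
    # One "?" placeholder per country keeps the query parameterized.
    marks = ",".join("?" for _ in countries)
    sql = (
        "SELECT month, country, SUM(people) FROM deployments "
        f"WHERE mission = ? AND country IN ({marks}) "
        "GROUP BY month, country ORDER BY month, country"
    )
    series = {country: {} for country in countries}
    for month, country, people in conn.execute(sql, [mission, *countries]):
        series[country][month] = people
    return series


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deployments "
        "(month TEXT, country TEXT, mission TEXT, people INTEGER)"
    )
    # Invented numbers, purely to exercise the function.
    conn.executemany(
        "INSERT INTO deployments VALUES (?, ?, ?, ?)",
        [
            ("2003-01", "India", "MONUC", 10),
            ("2003-02", "India", "MONUC", 20),
            ("2003-01", "Pakistan", "MONUC", 30),
        ],
    )
    wanted = ["India", "Bangladesh", "Pakistan"]
    for country, points in mission_timeline(conn, "MONUC", wanted).items():
        print(country, points)
```

Each inner dictionary is one line on the graph. A country with no rows for that mission comes back as an empty series rather than being dropped.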

[There is no easy way to embed Google’s dynamic JavaScript timeline object into a blog, so I have to present a bitmap image, which is quite annoying.]

As you can see, the pattern of deployment tends to remain at a constant quota over many years, with sudden jumps, probably due to requirements on the ground. Pakistan appeared to supply both of these peacekeeping surges, once in 2003 and once in 2005, while Bangladesh surged at one and India surged at the other.

The picture for UNOCI (United Nations Operation in Côte d’Ivoire) is different:

There are none from India, but there is a fixed contingent split between Bangladesh and Pakistan; 600 peacekeepers were swapped between them in August 2006.

Now, you could ask who are the other countries which make up the bulk of this mission, and you could answer the question by developing the necessary SQL statement yourself, but it’s a little unfair to expect everyone who is interested in this data to already have mastered SQL, isn’t it?

Phase Three: Presentation

This is the very hard part, and is usually the point where most promising projects get abandoned, because “someone else better than me at design will come along and finish it.”

Except they never ever do.

As you’re really the only one in the world who comprehends the contents and the potential of this dataset, it is your job to prove it.

Here is my attempt at a user interface for generating graphs of the queries that people might be interested in. It has taken me two hard hacking sessions to get it into this form — or twice as long as it took to write the original scraper.

It is almost as time-consuming as producing video marketing.

This is also usually the phase where all those design geniuses come out of the woodwork and start getting critical and disparaging of your efforts, so you can’t blame programmers who don’t go this far. It’s like sweating all month learning to play a new piece of music on the piano, only to get reminded again and again that you don’t have the talent.

This used to bug me big time. Until I realized that it’s actually a positive sign.

What’s infinitely worse than criticism is no criticism at all because nobody has any idea what you are trying to achieve.

Now they think they know what you are trying to do — which is why they can be critical.

The next step is for them to actually know what you are trying to do. This ought to be a small step, and if they can’t make it, and don’t even try to make it, then by definition they cannot be very good designers at all.

Look, you have just got all this way starting from nothing, from finding something out in the world, to recognizing its potential, all the way to pulling in and transforming the original raw data and struggling for a way to analyse it. It’s like you have prospected for the diamonds, found them in the earth, cut a mine tunnel to it with your bare hands, separated it from the rock, roughed out its edges, glued it onto a steel washer for a ring, and oh, it doesn’t look very professional and polished now does it? Come on, give us a break! We’ve applied bags of essential skills which hardly anyone else is capable of, so why should we expect to be especially good at this phase? Does your horse have table manners? No. But it works for its hay doesn’t it?

So anyway, here is what the current result looks like:

[Question: Does the Nepalese deployment react to events that were reported in the news during the course of the Haiti mission?]

When the page initializes there are three ajax call-backs to the database to obtain the lists of countries, missions, and top contributions from countries to specific missions. You can multiple-select from the countries and the missions lists to create timeline graphs of numbers of people. If you select only from the countries list it shows the troop contributions from those countries to all UN missions. If you additionally select a single mission it will graph those countries’ contributions to that specific mission. It works the other way round too, with the roles of missions and countries swapped. The top contributors table helps identify who are the top countries (or missions), so you know which ones to select to make an interesting graph that is not all zeros. (E.g. there is no point in graphing the number of Italians deployed to Nepal, because there aren’t any.)
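The selection rules described above can be sketched as a small query builder. This is an illustration, not the code behind the page. The function name `build_timeline_query` and the table name `deployments` are hypothetical. Rejecting the case where several countries and several missions are picked at once is my own choice, since the behaviour for that case is not described.

```python
import sqlite3


def build_timeline_query(countries, missions):
    """Turn the two list selections into (sql, params)."""
    countries, missions = list(countries), list(missions)
    if countries and len(missions) <= 1:
        # One line per country, optionally narrowed to a single mission.
        series_col, series_vals = "country", countries
        filter_col, filter_vals = "mission", missions
    elif missions and len(countries) <= 1:
        # The mirror case: one line per mission, optionally for one country.
        series_col, series_vals = "mission", missions
        filter_col, filter_vals = "country", countries
    else:
        raise ValueError("select several of one list and at most one of the other")
    marks = ",".join("?" * len(series_vals))
    where = f"{series_col} IN ({marks})"
    if filter_vals:
        where += f" AND {filter_col} = ?"
    sql = (
        f"SELECT month, {series_col}, SUM(people) AS people "
        f"FROM deployments WHERE {where} "
        f"GROUP BY month, {series_col} ORDER BY month, {series_col}"
    )
    return sql, series_vals + filter_vals


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deployments "
        "(month TEXT, country TEXT, mission TEXT, people INTEGER)"
    )
    # Invented rows with placeholder names, for demonstration only.
    conn.executemany(
        "INSERT INTO deployments VALUES (?, ?, ?, ?)",
        [
            ("2006-01", "Country A", "MISSION-1", 5),
            ("2006-01", "Country A", "MISSION-2", 7),
            ("2006-01", "Country B", "MISSION-1", 3),
        ],
    )
    sql, params = build_timeline_query(["Country A", "Country B"], [])
    print(conn.execute(sql, params).fetchall())
```

Only the column names are interpolated into the SQL, and they come from a fixed pair. The user's selections always travel as bound parameters.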

Where do the Italians go? You can find that out by selecting “Italy” from the “Contributor nations” column and clicking on the “Refresh” button on the “Top contributions” column. And you can also click on “Make timeline” to discover that Italy never sent anyone anywhere until late 2006, when they suddenly started deploying two to three thousand peacekeepers to Lebanon. What happened then? Did something change in Italian politics around that point? Maybe people who write Italian newspapers ought to know.

Okay, the user interface is not great, but it achieves the objective of facilitating the formulation of relevant questions, and answering them — which is more than can be said of a lot of artistic user interfaces that crop up around the place, like so many empty bottles of wine.

Phase Four: Publishing and promoting

There is no point in doing all this work if the people who would be interested never get to see it.

This bit I cannot do at all, so I don’t even try. I do know that throwing up a long rambling technical blog about the project does not constitute effective publication. In fact, according to the news rules, “once it’s told, it’s old”, so I have just completely ruined everything, because it can now never get onto the New York Times or The Guardian on their data blog section for its 15 hours of fame, before being lost into the past archive where no one is interested at all while it steadily goes out of date through the coming months and years.

Except this dataset, with the infrastructure behind it, is different, because it remains in date for the foreseeable future. So it really ought to have a home somewhere, like those stock market indicators, ever present on the business pages, like the daily crossword or cartoon.

Who knows how to get this done? It’s not my bag and I am quite exhausted.

What I do know is that I had to keep looking up what all those acronyms mean until I decided I should copy them down in the code and use them for tool-tips. It took quite a bit of work, and was repetitive, and maybe should have been scraped from somewhere. But it was probably well worth doing, so I am repeating it here.