Search

The Times Online’s redesign, and a word about performance testing

Late last night (UK time) the Times Online launched their new design, and jolly nice it is, too. It’s clean and spacious, and there’s an interview with the designers who are very excited about the introduction of lime green into the logo. Personally, I like the columnists’ pictures beside their pull-quotes. That’s a nice touch. You can also read about it from the Editor in Chief, Anne Spackman.

However, not everyone at the website will have been downing champagne as the moment the new site went live, because in the first few hours it was clearly having some performance issues. We’ve all had those problems some time in our careers (it’s what employers call “experience”), and it’s never pleasant. As I write it looks as though the Times Online might be getting back to normal, and no doubt by the time you read this the problems will all be ancient history. So while we give their techies some breathing space to get on with their jobs, here are three reasons why making performant software is not as easy as non-techies might think…

1. Software design

A lot of scalability solutions just have to be built in from the start. But you can’t do that unless you know what the bottlenecks are going to be. And you won’t know what the bottlenecks are going to be until you’ve built the software and tested it. So the best you can do from the start is make some good judgements and decide how you’d like to scale up the system if needed.

Broadly speaking you can talk of “horizontal scaling” and “vertical scaling”. The latter is when you can scale by keeping the same boxes but beefing them up — giving them more memory/CPU/etc. The former is where you can scale by keeping the same boxes end-to-end, but add more alongside them. Applications are usually designed for one or the other (deliberately or not) and it’s good to know which before you start.

Vertical scaling seems like an obvious solution generally, but if you’re at the limit of what your hardware will allow then you’re at the limit of your scalability. Meanwhile a lot has been made of Google’s MapReduce algorithm which explicitly allowed parallelisation for the places it was applied — it allowed horizontal scaling, adding more machines. That’s very smart, but they’ll have needed to apply that up-front — retrofitting it would be very difficult.

You can also talk about scaling on a more logical level. For example, sometimes an application would do well to split into two distinct parts (keeping its data store separate from its logic, say) but if that was never considered when the application was built then it will be too late once the thing has been build — there will be too many inter-dependencies to untangle.

That can even happen on a smaller scale. It’s a cliche that every programming problem can be solved with one more level of indirection, but you can’t build in arbitrary levels of indirection at every available point “just in case”. At Guardian Unlimited we make good use of the Spring framework and its system of Inversion of Control. It gives us more flexibility over our application layering, and one of our developers recently demonstrated to me a very elegant solution to one particular scaling problem using minimally-invasive code precisely because we’d adopted that IoC strategy — effectively another level of indirection. Unfortunately we can’t expect such opportunities every time.

2. Devising the tests

Before performance testing, you’ve got to know what you’re actually testing. Saying “the performance of site” is too broad. There’s likely to be a world of difference between:

Testing the code that pulls an article out of the database;

Testing the same code for 10,000 different articles in two minutes;

Testing 10,000 requests to the application server;

Testing 10,000 requests to the application server via the web server;

Testing the delivery of a page which includes many inter-linked components.

Even testing one thing is not enough. It’s no good testing the front page of the website and then testing an article page, because in reality requests come simultaneously to the front page and many article pages. It’s all very well testing whether they can work smoothly alone — it’s whether they work together in practice that counts. This is integration testing. And in practice many, many combinations of things happen together in an integrated system. You’ve got to make a call on what will give the best indicators in the time available.

Let me give two examples of integration testing from Guardian Unlimited. Kevin Gould’s article on eating in Barcelona is very easy to extract from the database — ask for the ID and get the content. But have a look down the side of the page and you’ll see a little slot that shows the tags associated with the article. In this case it’s about budget travel, Barcelona, and so on. That information is relatively expensive to generate. It involves cross referencing data about the article with data about our tags. So testing the article is fine, but only if we test it with the tags (and all the other things on the page) will we get half an idea about the performance in practice.

A second example takes us further in this direction. Sometimes you’ve got to test different operations together. When we were testing one version of the page generation sub-system internally we discovered that it slowed down considerably when journalists tried to launch their content. There was an interaction between reading from the database, updating the cache, and changing the content within the database. This problem was caught and resolved before anything went live, but we wouldn’t have found that if we hadn’t spent time dry-running the system with real people doing real work, and allowing time for corrections.

3. Scaling down the production environment

Once you’ve devised the tests, you’ll want to run them. Since performance testing is all about having adequate resources (CPU, memory, bandwidth, etc) then you really should run your tests in an exact replica of the production environment, because that’s the only environment which can show you how those resources work together. However, this is obviously going to be very expensive, and for all but the most cash-rich of organisations prohibitively so.

So you’ll want to make a scaled down version of the production environment. But that has its problems. Suppose your production environment has four web servers with two CPUs and six application servers with 2GB of RAM each. What’s the best way to cut that down? Cutting it down by a half might be okay, but if that’s still too costly then cutting it further is tricky. One and half application servers? Two application servers with different amounts of RAM or CPU?

None of these options will be a real “system in miniature”, so you’re going to have to compromise somewhere. It’s a very difficult game to play, and a lot of the time you’ve got to play to probabilities and judgement calls.

And that’s only part of it

So next time you fume at one of your favourite websites going slow, by all means delve deep into your dictionary of expletives, but do also bear in mind that producing a performant system is not as easy as you might think. Not only does it encompass all the judgement calls and hard thinking above (and usually judgement calls under pressure), but it also includes a whole lot of really low-level knowledge both from software developers and systems administrators. And then, finally, be thankful you’re not the one who has to fix it. To those people we are truly grateful.

2 thoughts on “The Times Online’s redesign, and a word about performance testing”

The site was certainly taking quite the beating yesterday and unfortunately I wasn’t able to get any of the pages to load. Today it seems to be holding up a little better. They posted a little explanation of what went wrong which I thought was good to see.