Having a website crash due to high traffic is a failure of management, not load

Today has provided an interesting lesson for several organisations, with the crash of both the David Jones and ClickFrenzy websites in Australia.

But first, some background.

ClickFrenzy is a new 24-hour sale for Australian online retailers starting from 7pm on Tuesday 20 November.

Based on the US ‘Cyber Monday’ sale, which now attracts over 10 million buyers, ClickFrenzy was designed to entice Australian online shoppers to buy from local online retailers by offering massive discounts on product prices for a short period of time.

The event was announced over a month before it was due to start and has been promoted through newspapers, online and in some retail stores, with the ClickFrenzy team expecting thousands of shoppers to log on, likening it to a “digital boxing day sale”.

I’ve kept an eye on the ClickFrenzy site and signed up to receive an email alert when the sale began.

Just before the sale started I hopped back onto the ClickFrenzy site to see how it was going, and only saw a basic page of text, with no graphics or formatting. Puzzled, I tried reloading – and the site wouldn’t load at all.

This meant that the list of participating retailers (many of whom had been kept secret) was inaccessible. No shopper knew who had the specials, meaning few sales could occur. Of the retailers that were known to be participating, many had their own sites crash too (such as Priceline and Myer).

In competition with ClickFrenzy, David Jones had decided to run its own independent 24-hour sale over a similar time period. Their sale, named ‘Christmas Frenzy’, was to be run from their main website.

How did their launch go? Their site also crashed, and was down for several hours, taking down not only the shopping site but all their corporate information.

So we had two major online sales on the same day from Australian retailers, and both experienced crashes due to the volume of traffic.

What was to blame? Both claimed the failure was due to unprecedented demand. So many people tried to get onto both sites that their servers could not cope (the same reason given for the mySchools website issues at launch in 2010 and the CFA website issues during the Victorian fires in 2009).

Let’s unpick that reasoning.

The world wide web is twenty years old. Amazon.com is 18 years old. The US ‘Cyber Monday’ sale is six years old.

David Jones is an experienced retailer, with significant IT resources and has been operating an online store for some time. Their Christmas Frenzy sale was planned and well promoted.

Click Frenzy is being run by experienced retailers as well. They built an emailing list of people interested in the event and also widely promoted the sale. The retailers supporting them are large names and operate established online shopping sites as well.

In both cases the organisers had a wealth of experience to draw on: the growth of Amazon, the US Cyber Monday sales, their own website traffic figures and email list sign-ups, not to mention a host of public examples of how to manage web server load well, and badly, from media sites, social networks and even government sites (such as the mySchools and CFA examples above).

There are many IT professionals with experience in managing rapid load changes on web servers.

There are scalable hosting solutions that respond almost instantly to fast-increasing loads, such as during an emergency or with breaking news, and ‘scale up’ the site to support much larger numbers of simultaneous users. (Though in the case of Christmas Frenzy and Click Frenzy a large increase in load was expected, rather than unexpected.)
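To make the ‘scale up’ idea concrete, here is a minimal sketch of the kind of proportional rule a scalable host might apply. It is an illustration under my own assumptions, not any particular provider's API: the function name `desired_servers`, its parameters and the numbers in the usage note are all hypothetical.

```python
import math


def desired_servers(current_servers, load_per_server, target_load_per_server,
                    min_servers=1, max_servers=100):
    """Return how many servers are needed to bring per-server load back to target.

    Hypothetical proportional rule: if each server is carrying three times
    the target load, ask for three times as many servers, within set limits.
    """
    if current_servers < 1:
        raise ValueError("current_servers must be at least 1")
    if target_load_per_server <= 0:
        raise ValueError("target_load_per_server must be positive")

    # Scale the fleet in proportion to how far the load is from the target.
    wanted = math.ceil(current_servers * load_per_server / target_load_per_server)

    # Clamp to the floor and ceiling the organisation has agreed to pay for.
    return max(min_servers, min(max_servers, wanted))
```

For example, `desired_servers(4, 900, 300)` asks for 12 servers when four servers are each handling 900 requests per second against a target of 300. For a planned event, the same sum can be done in advance with the expected load and the servers started before the sale opens, rather than waiting for the rule to react.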

There are even automated processes for testing how much load a website will be able to bear by simulating the impact of thousands or millions of visitors.
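As a rough illustration of what simulating visitors means, the sketch below fires a batch of concurrent requests at a URL and reports failures and the 95th-percentile response time. It is a toy built on Python's standard library, not a substitute for a proper load-testing tool; the names `run_load_test` and `demo`, and the throwaway local server it targets, are my own assumptions so that the example never touches a real site.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Opener that ignores any proxy settings, so local requests stay local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class _DemoHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for the site under test (hypothetical)."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(url, timeout=5.0):
    """Request the URL once; return (succeeded, seconds_taken)."""
    start = time.perf_counter()
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            response.read()
            ok = response.status == 200
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def run_load_test(url, total_requests=200, concurrency=20):
    """Send total_requests requests, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch_once, [url] * total_requests))
    latencies = sorted(seconds for _, seconds in results)
    failures = sum(1 for ok, _ in results if not ok)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"requests": total_requests, "failures": failures, "p95_seconds": p95}


def demo(total_requests=200, concurrency=20):
    """Start a local server on a free port, load test it, then shut it down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), _DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        return run_load_test(url, total_requests, concurrency)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns a small summary dictionary. In a real test you would point `run_load_test` at a staging copy of your own site and keep raising `concurrency` until failures appear or response times become unacceptable, which tells you roughly where the site's limit sits before launch day does.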

In other words, there’s no longer any technical reason why any organisation should have their website fail due to expected or anticipated load.

Load is not a reason, it is a justification.

We have the experience, knowledge and technology to manage load changes.

What the Click Frenzy and Christmas Frenzy failures illustrate is that some organisations fail to plan for load. They haven’t learnt from the experience of others, don’t invest in the right infrastructure and may not even test their sites.

They are, in effect, crossing their fingers and praying that their website won’t crash.

A website crashing when it receives a high level of load that could be expected or planned for is crashing due to a failure of management.

The next time your agency’s management asks you to build a website which is expected to have a big launch or large traffic spikes, ask them if they’re prepared to invest the funds necessary for a scalable and tested website, built on the appropriate infrastructure to mitigate the risk of sudden large increases in traffic.

If they aren’t, then let them know to cross their fingers and pray – and that a website crash due to high traffic is a failure of management, not load.

You might even get a Downfall parody video to memorialise the failure – as Click Frenzy received within two hours of their launch crash.

eGov AU

Craig Thomler’s personal Gov 2.0 and eGovernment thoughts and speculations from an Australian perspective
