How two Uni Students built a better Census site in just 54 hours for $500

One of the fundamental problems with the 2016 online census was the architecture. Not the building the ABS works in, but the way the computer system built to handle millions of Australians was designed. Turns out two uni students designed a better way to do it in just 54 hours on the weekend – at a cost of just $500.

If there’s one thing a computer programming student loves, it’s a hack-a-thon. Now, for the uninitiated, this is not an event where smart people hack innocent people’s computers over and over again – it’s a concentrated period of time within which teams are required to come up with an idea and build it.

Austin Wilshire and Bernd Harzer are both from the Queensland University of Technology. Austin is studying IT, majoring in Computer Science, while Bernd is studying Creative Industries and Information Technology.

And their approach was vastly different to that of the ABS and its contracted developer, IBM.

Scale. That’s right, Austin and Bernd wanted to design for scale.

The traditional approach to designing web services is “on-premise” – this means that somewhere there are a bunch of computers all built to serve up the content – in this case, census forms. This is what IBM and the ABS did with the actual Census.

But at the Code Network “winter hack-a-thon” on the weekend, these two smart cookies went for a “cloud-first” design which can quite simply “infinitely scale”.

What this means is you use a service like AWS (Amazon Web Services) and the software is built to simply grow: as load increases, more capacity is automatically deployed so the system can continually cope with demand.
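
To make the idea concrete – and this is only an illustrative sketch, not what the students actually deployed; the group name, fleet sizes and CPU target below are all invented – this is roughly how you tell AWS to grow a fleet of servers automatically:

```shell
# Hypothetical example: an auto-scaling web fleet on AWS.
# Names, sizes and the 60% CPU target are illustrative only.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name census-web \
  --launch-template LaunchTemplateName=census-web,Version='$Latest' \
  --min-size 2 --max-size 100 \
  --vpc-zone-identifier "subnet-aaa,subnet-bbb"

# Add more servers whenever average CPU across the fleet passes 60%.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name census-web \
  --policy-name cpu-target \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration \
    '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":60.0}'
```

With a policy like that in place, AWS adds servers as average load rises and removes them as it falls – the "it just grows" behaviour described above.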

Think about it – does Amazon.com go down often?

They built the site, and even “load tested” it – remember the ABS spent almost half a million dollars on load testing their failed site? In addition to the $9.6 million to design and build it?

On the weekend “Make Census Great Again” was load tested to 4 million page views per hour. And 10,000 submissions per second – insane numbers.

For the record, Code Network is a volunteer student-run organisation based at QUT. Founded last year, its aim is to help produce the best software developers on the planet, and it has 1,500 members.

We all know who to ask for help on the next big government project, don’t we?

As for Austin and Bernd – they won a Microsoft Surface Pro 4 donated by event sponsor Technology One.

Trev produces two of the most popular technology podcasts in Australia, Your Tech Life and Two Blokes Talking Tech. He hosts a nightly radio show on Talking Lifestyle, 8pm Monday to Friday in Sydney, Melbourne and Brisbane, appears on over 50 radio stations across Australia weekly, and is the Tech Expert on Channel 9’s Today Show and A Current Affair. Father of three, he is often found down in his Man Cave.

I agree that it’s worth investigating, and that perhaps there are far better choices than IBM. However, the comparison is not fair, since it’s not apples-to-apples.

As a disclaimer, I am not affiliated with IBM, its subsidiaries, or the Census project. I am, however, making a living working on large, scalable web systems (although not Twitter- or Facebook-scale).

Note that I can’t find the source code. Normally it would be publicly available as part of a hack-a-thon, so my comments below are based only on inspecting the HTML source and public network traffic of the site linked in the article (http://makecensusgreatagain.com/). My opinion might change if I get access to the source code.

The claim of handling (x) times more load does not hold water, simply because it is not grounded in reality.

TL;DR: the primary reason it can handle much more traffic than the Census website is that it doesn’t seem to be doing much at all.

The front-end looks to be a static HTML site hosted from an AWS S3 bucket, with a number of assets leeched from the Census CDN. The form simply does a JSON AJAX POST and redirects once a response is received.
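
For illustration, the whole client side of such a design can be a handful of lines. This is my own sketch – the field names and the /submit endpoint are guesses, not taken from the students’ code:

```javascript
// Sketch of a static front-end submit: gather named form fields into an
// object and POST it as JSON. The field names and the /submit endpoint
// are hypothetical, not taken from the actual site.
function buildPayload(entries) {
  // entries: name/value pairs, e.g. from new FormData(form).entries()
  const payload = {};
  for (const [name, value] of entries) payload[name] = value;
  return payload;
}

async function submitCensus(entries) {
  const res = await fetch("/submit", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload(entries)),
  });
  // On success the real page simply redirects to a "thanks" view.
  return res.ok;
}
```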

The back-end form processing looks to be a Node.js application (going by hints in the endpoint address, though I can’t be 100% sure), with AWS CloudFront in front for caching. I can’t tell what kind of persistence technology they are using, or whether they are using any at all.

Using a static HTML front-end and/or heavy caching is a good strategy for increasing throughput. However, the students’ project does not deliver various functionalities often needed by lengthy, multi-page form websites such as the Census website, for example:

– session management, e.g. I got disconnected and want to resume from last point
– conditional form branching, e.g. if I answer A, I can skip section 5
– data validation, e.g. date must be in dd/mm/yyyy format
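
To make the last point concrete, here is roughly what even a single dd/mm/yyyy check looks like – my own sketch, not code from either site:

```javascript
// Example of one small piece of the validation a Census-style form needs:
// checking a dd/mm/yyyy date, including impossible dates like 31/02/1990.
function isValidDate(s) {
  const m = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(s);
  if (!m) return false;
  const [day, month, year] = [Number(m[1]), Number(m[2]), Number(m[3])];
  // Round-trip through Date: JS silently rolls 31/02 over into March,
  // so any mismatch means the components were out of range.
  const d = new Date(year, month - 1, day);
  return (
    d.getFullYear() === year &&
    d.getMonth() === month - 1 &&
    d.getDate() === day
  );
}
```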

Once you add the above and some other essential functionality, you’ll see performance drop significantly, mainly because the web server has to do a lot more work than simply passing data along to storage.

In addition, most of the time developers do not have full control over the kind of infrastructure or other software they can deploy their code onto. Other commenters have mentioned restrictions due to regulation, auditability and available commercial support, and there may also be many client-imposed constraints.

Pile on constraint after constraint (they can all be valid constraints, mind you) and various parts of the system can easily become a bottleneck, and delivering scalable systems becomes a really tough job.

I hope you get my point.

PS. Note for the students: simply absorbing DDoS traffic is not a good strategy, because it’s far cheaper for the attacker to launch, increase, and sustain the attack volume than for the client to absorb the growing inbound traffic. A mitigation strategy at the network layer is therefore really important, which is why IBM and the ABS’s biggest blunder was to reject this offer, IMO.

Governments do not over-engineer. In this case it’s a woeful example of incompetence! Since this is Australia it’s hard to fathom corruption, bribery or underhandedness, but certainly this whole thing stinks of a pitiful attempt at something whose scale they knew by its very definition! It’s the census, for god’s sake.

While that is possible, the solution lacks the operational support necessary once it has gone live. Moreover, it would not meet Privacy Act and National Archives Act standards, which are the real reasons why IBM didn’t use widely accepted technologies to cater for this load.

Kudos to them for winning, but that’s some… interesting code. It seems extremely boilerplate (made from a template); has a fraction of the fields; doesn’t validate fields (I can click Submit without having entered anything); and the backend seems likely to be some boilerplate/sample code. Not trying to knock them or make myself seem better… it just seems strangely cobbled together, even for a hackathon. The missing features would no doubt add more necessary processing on the server, skewing their load test results.

Fortunately, for very valid security reasons, we have legislation in place that prevents private citizen data from being hosted on (or even passing through) unsecured data centres, let alone foreign-hosted data centres. Hackathons are great, but thinking they’re useful for anything beyond producing proofs of concept is bad reporting. One of the key features of cloud data centres is that they’re shared: when you’re not using the compute power, someone else is, and having someone else running code on the same metal the census was running on is a risk no security expert in their right mind would be willing to accept. A private cloud (which is probably what they would have run) is about as good as you could go.

If a scalable cloud environment could enable online activity like this, how do we as a nation work to build or certify such an environment to deal with our requirements? Why are we just brushing these ideas aside? We have to move forward – the Census set us back a decade.

Yep, AWS have Australian servers. But in reality – it’s a bigger question of being a bit more innovative, doing better planning and not going the “safe” old-school route with things. Governments must be agile if they are asking us to be?

Off the top of my head the first major stumbling block to achieving the purported efficiency and affordability of the solution is a little thing called the law.

Rightly so: the legislation within both the Privacy and National Archives Acts categorically prevents private citizen data from being hosted on (or passing through) data centres that aren’t weapons-grade secured.
One of the key features of cloud data centres is that they’re shared: when one user is not using the processing power, another is. Having someone else running code on the same metal the census was running on is a risk no security expert in their right mind would be willing to accept.

Then there’s the effort required to actually build a scalable architecture of those proportions, not forgetting the safeguards and redundancy measures that need to be in place, AND ensuring reliable availability of qualified, proficient operational support once it goes live. Know of a team of specialists, trained in national security protocols, with a breadth of knowledge flexible enough to coordinate the variety of architectures, hardware, and procedures likely to be encountered?

Facebook would shit itself. Reckon if these kids knew wifi is metered in jail; they would too.

There is AWS served out of Sydney. However, for redundancy they back up across their various sites. Also, as pointed out, it is an American company. From the short stint of work I’ve been involved in with their innovation department, it is unfortunately not currently an option allowed within government services.

Also, according to the IT guy I was working with on a government project, IE7 still needs to be supported. This makes me cry, and I don’t even have to do the coding. It does, however, add much more work to a project.

The government is trying to be more agile, but it is going to be a long journey. This census has it quite a bit easier than other services, though, because it doesn’t have to tie into or depend on other services that are likely not yet set up to be nimble.

Even with all the hoops and jumps and departments and people such endeavours have to pass through, it is still disappointing that what appears to be a web form with a secure database has cost so much and delivered so terribly. And that’s before going into the appalling communication about the state of the website and the dubious claims of being DDoSed.

While there was obviously a gross technological mishandling of the census by IBM (and I can’t understand those blaming the Gov btw, what do you want, Turnbull up late provisioning autoscaling groups?), I can understand why they would want to keep the data and servers in house. There are huge data security concerns for a service that sensitive beyond just having to scale. Sounds like that’s what they were prioritising?

Mate, as a guy who looks after data for a living: I don’t think you know what you’re talking about. If insurance and finance companies don’t want to make the leap yet because of APP concerns, then government (especially the census, which now stores individual details) will not touch the cloud with a barge pole. In your rush to praise students, don’t trivialise an important concern.

Appreciate your concern, but you’re missing the bigger point here – these guys performed load testing to four times the capacity the ABS assumed they would need and paid half a million dollars for! Insane.

I think it’s all summed up by “Although IBM have obviously done an awful job” – that’s what we’re trying to address here – there are many more ways to cut this one, and the ABS failed to innovate in that sense.

Yes, someone else’s computer could do the work. (Remember folks, there is no cloud out there; it all resolves to a real computer controlled by someone else.)

If the ABS got some other party to handle confidential information – mandated from every Australian resident by law – how would they ensure data sovereignty? How much would it cost to ensure that, compared to keeping the computers and the data where they can be verified secure?

They may have done some interesting architectural stuff on the backend, but as others have mentioned, the website itself is terrible for what I imagine the main goals would be: security, scalability and data validity. No form validation at all – load the page, press submit, no problems, come on now. HTML & CSS not optimised, with lots of commented-out code blocks? Pulling in external JS libraries? There were probably 100+ requirements for a project like this; these guys provided a non-workable solution to one or two (the cloud), and we have an article claiming it is great.

Trevor, if, as you say, you are trying to show that IBM did a terrible job, do that – this website does nothing to advance that argument at all.

Perhaps first fix the label on the email field on this website to say email instead of name, though.

“…was load tested to 4 million page views per hour. And 10,000 submissions per second – insane numbers.”

Surely the author is aware that DDoS attacks typically don’t use HTTP/HTTPS; load testing is quite irrelevant here. Apparently AWS would have held up “fine” to a DDoS attack, though. Very scientific insight there. No worries mate, she’ll be right.

The question of “Why didn’t ABS have stronger DDOS protection in place?” can and should be asked, but the idea that the entire census form could be replaced with a basic HTML form on AWS is rubbish.

The government has way too many regulations and legacy systems to make simple, elegant solutions possible. The consultants had their hands tied and were unable to deliver an optimal solution. On top of that, add the usual laziness, politics and incompetence, and you have the recipe for a disaster.

The census was just a simple form capturing a number of text fields, with some conditional logic that can live on the client side; it does not require sessions, since anyone can re-fill the form in less than 5 minutes. Scalability was the only issue here, and using on-premise infrastructure was the main cause. The cloud had everything needed for this to work, but privacy acts and shitloads of other regulations prevented it from being part of the equation.
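
That client-side conditional logic is easy to picture – something like the following sketch, where the question and section numbers are invented for illustration:

```javascript
// Sketch of client-side form branching: the answer to one question
// decides which section comes next. Question/section numbers are made up.
function nextSection(current, answers) {
  // e.g. answering "A" to question 4 lets the respondent skip section 5
  if (current === 4 && answers.q4 === "A") return 6;
  return current + 1;
}
```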

2 students take 54 hours to build a dumb static web form and host it on public cloud infrastructure. This is not impressive at all. It isn’t even 0.1% of what IBM did for the census project, and it took them 54 hours to do what an average web developer could do in 4 hours. Also, 2 × 54 hours for $500 comes to $4.63/hour, so their estimates are at best wrong.

Hahahaha – two boys with no experience in providing secure web forms managed to win a $500 crappy tablet. There is no validation, either client-side or server-side, and the load testing claims are just claims – you could probably load test the census site using the same method and it would come out in front. It is a big claim to say they have beaten the ABS, IBM and the Federal Government at something when they could not even match the complexity involved in protecting valid user data.
Good luck to the boys, but I think you have drawn a long bow and this article is fantasy – your fantasy.