On Surviving The Verge, Gamespot, IGN, and Reddit

Written by andy on March 30, 2015

Depending on who you speak with, getting featured by big players in the tech sector is either the best thing that ever happened to their company, the worst thing that ever happened to their server, or sometimes both. While I am not in charge of the infrastructure at work, I was fortunate enough to have the opportunity to try my hand at taking on such an assault. Although the onslaught is still ongoing (Google Analytics shows 742 active users as of the time of writing, averaging about 15 new visitors per second), I am finally happy with the setup and can share some of the things I've learned.

The Opportunity

I was commuting in on Friday, March 27th, when I noticed that Erik Ross had remade the first level of Super Mario 64 in Unity, supposedly playable in the browser. Unfortunately for me, by the time I noticed it, the Reddit hug of death had already taken the web player down twice: once on Dropbox, and once on another shared hosting provider. I did some detective work and found out that the author not only went to Simon Fraser University like me, but had also worked with some of my friends. Struck by how connected we were, I reached out to offer him some hosting. A few quick e-mails and some setup later, the game was playable in the browser by anyone online again.

The Original Setup

Since I had some extra AWS credits, I decided to simply put the files on S3 and see how it went. Following the "Example: Setting Up a Static Website Using a Custom Domain" documentation, I went from zero to running in a matter of minutes. We put the files online and watched the traffic climb in Google Analytics. Shortly after, Erik's original article was featured on The Verge, and the traffic really started to pour in; at peak I saw as many as 2,500 concurrent users. As the adrenaline wore off and my analytical brain kicked in, I started doing some math, and reality set in. At 9 cents per GB of data transferred out of S3 and a 16MB payload for the unity3d project, my $100 credit buys about 1,111GB of transfer, enough for only around 70,000 visitors. At 2,500 concurrent visitors (about 30 new visitors per second), I would burn through that credit in well under an hour. Not one to be beaten, I dropped Erik a quick note and started making adjustments.
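To make the budget math above easy to re-run, here is a small back-of-envelope sketch. It assumes every visitor downloads the full 16MB payload exactly once and that 1GB = 1024MB; the function names are illustrative, not from any AWS tool. With these inputs the credit covers about 71,000 downloads, which lasts roughly 40 minutes at 30 visitors per second.

```python
# Back-of-envelope S3 egress budget using the figures quoted in the post.
# Assumptions: every visitor downloads the full payload once, and 1 GB = 1024 MB.

PRICE_PER_GB_USD = 0.09   # S3 data-transfer-out price quoted above
PAYLOAD_MB = 16           # size of the .unity3d web player file
CREDIT_USD = 100.0        # available AWS credit
VISITORS_PER_SECOND = 30  # approximate peak arrival rate


def downloads_covered(credit_usd, price_per_gb, payload_mb, mb_per_gb=1024):
    """Number of full payload downloads the credit pays for."""
    affordable_gb = credit_usd / price_per_gb
    return int(affordable_gb * mb_per_gb / payload_mb)


def minutes_until_exhausted(downloads, visitors_per_second):
    """How long the credit lasts at a steady arrival rate, in minutes."""
    return downloads / visitors_per_second / 60


if __name__ == "__main__":
    n = downloads_covered(CREDIT_USD, PRICE_PER_GB_USD, PAYLOAD_MB)
    minutes = minutes_until_exhausted(n, VISITORS_PER_SECOND)
    print(f"{n} downloads, gone in about {minutes:.0f} minutes")
```

Passing `mb_per_gb=1000` instead gives about 69,000 downloads; either way the credit is gone within the hour.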

The Second Setup

Being the hosting addict that I am, I have a wide range of hosting accounts under my name sitting around doing nothing, so I threw together a quick DNS round-robin mesh consisting of:

This worked moderately well. I was able to tank the onslaught even after IGN and Gamespot joined in. However, I was easily burning through hundreds of GB of bandwidth, at an astounding 350Mbit/s across all hosting accounts. By the end of Saturday I had used around 2.7TB of bandwidth in total.
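For readers unfamiliar with DNS round-robin, the toy model below shows the idea behind the mesh: the name server publishes several A records for one hostname and rotates their order on every query, and most clients connect to the first address they get back. The addresses are hypothetical documentation-range IPs, and the classes are illustrative only. In practice recursive resolvers cache answers and some clients reorder addresses, so the real spread is lumpier than this even split, and a dead host keeps receiving its share until its record is removed.

```python
from collections import Counter, deque

# Hypothetical A records for the game's hostname, one per hosting account.
# These are documentation-range addresses, not the servers actually used.
A_RECORDS = ["192.0.2.10", "198.51.100.20", "203.0.113.30"]


class RoundRobinZone:
    """Toy authoritative server that rotates its A record set on every query."""

    def __init__(self, records):
        self._records = deque(records)

    def resolve(self):
        answer = list(self._records)
        self._records.rotate(-1)  # the next query sees the list shifted left by one
        return answer


def simulate(zone, queries):
    """Count connections per server, assuming each client uses the first address."""
    hits = Counter()
    for _ in range(queries):
        hits[zone.resolve()[0]] += 1
    return hits


if __name__ == "__main__":
    for address, count in sorted(simulate(RoundRobinZone(A_RECORDS), 9000).items()):
        print(address, count)
```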

The Final Hack / Setup

Back in 2006, I had a run-in with bandwidth overage charges that set me back $293. I'm not particularly fond of history repeating itself, so after seeing the stats, I went back to the drawing board. Since CloudFlare decides what to cache largely by file extension, and .unity3d is not one of the extensions it treats as static content by default, the final hack was renaming the .unity3d file to .png. This tricked CloudFlare into treating the 16MB Unity web export as a static image, so it could cache the whole thing without ever hitting the server. The result was a drastic drop in outgoing bandwidth from my servers, and with it came a drop in server load. In the end, I was able to scale back to just:

Both VMs were able to stay at around a 1 to 1.5 load average... considering they were serving only one static HTML page using nginx, I guess that is not too bad. The party didn't last long, though. About two days later, Nintendo sent a DMCA notice, which required me to take the whole thing down. Oh well, it was fun while it lasted.
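The .png rename trick from the final setup works because an extension-based edge cache never looks at what a file actually contains. The sketch below models that decision and counts how much the origin still serves. The extension set is an illustrative subset loosely modeled on CloudFlare's default list (an assumption, not the full list), the file names are hypothetical, and it assumes a single edge node that never evicts anything.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative subset of extensions that an extension-based CDN cache treats as
# static (assumption: modeled loosely on CloudFlare's default list, not exhaustive).
CACHEABLE_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg", ".zip", ".pdf"}


def edge_would_cache(url):
    """Decide cacheability from the path's file extension alone."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return suffix in CACHEABLE_EXTENSIONS


def origin_bytes(urls, payload_bytes):
    """Bytes the origin serves, assuming one edge with an unlimited cache lifetime."""
    cached = set()
    total = 0
    for url in urls:
        if edge_would_cache(url):
            if url in cached:
                continue  # cache hit: served by the edge, origin untouched
            cached.add(url)
        total += payload_bytes  # cache miss or uncacheable: origin serves it
    return total


if __name__ == "__main__":
    payload = 16 * 1024 * 1024  # the 16MB web player
    # Hypothetical file names; the real ones are not given in the post.
    before = ["https://example.com/game.unity3d"] * 1000
    after = ["https://example.com/game.png"] * 1000
    print(origin_bytes(before, payload) // payload, "full downloads from origin before the rename")
    print(origin_bytes(after, payload) // payload, "full download(s) from origin after the rename")
```

A thousand requests for the .unity3d URL cost the origin a thousand full downloads; the same requests for the .png URL cost it one.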

On Scaling Dynamic Content

This exercise mainly involved static content, which meant the CDN could absorb a large share of the stress for me. Had this been dynamic content, however, the scaling strategy would have had to be very different. Traditional web apps can only scale so far vertically, and once they inevitably hit the vertical scaling limit (the fastest CPUs with the most cores that fit in one chassis, the most RAM the motherboard supports), they hit a performance ceiling. Scaling horizontally behind load balancers is a bit better, but then the database becomes the new bottleneck. New approaches must be considered, or the problem of success turns into a scaling problem that hinders your growth.
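The two ceilings described above can be illustrated with a toy capacity model. All numbers here are hypothetical, and it assumes every request needs one database round trip; real systems are messier, but the shape of the curves is the point.

```python
# Toy capacity model (hypothetical numbers): how many requests per second a
# stack can sustain under vertical vs. horizontal scaling.


def vertical_rps(cores, rps_per_core, max_cores_per_chassis):
    """One big server: throughput grows with cores until the chassis is full."""
    return min(cores, max_cores_per_chassis) * rps_per_core


def horizontal_rps(app_servers, rps_per_server, db_rps_limit):
    """Many app servers behind a load balancer, all sharing one database.

    Assumes every request needs one database round trip, so the database's
    capacity caps the whole stack no matter how many app servers are added.
    """
    return min(app_servers * rps_per_server, db_rps_limit)


if __name__ == "__main__":
    for cores in (8, 32, 64, 128):
        print(f"{cores:>3} cores -> {vertical_rps(cores, 100, 72)} req/s")
    for servers in (1, 4, 8, 16):
        print(f"{servers:>3} app servers -> {horizontal_rps(servers, 1500, 10_000)} req/s")
```

Past a handful of app servers the numbers stop moving: with these made-up figures, 8 and 16 servers both top out at 10,000 req/s, because the database, not the web tier, is now the ceiling.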

I'll close the post with a list of things to consider when scaling dynamic content: