Lessons Learned: March Madness Bracket Buster

The NCAA men’s basketball tournament kicked off last week, and like many other people, I was looking forward to building my bracket. My colleagues and I enjoy some friendly competition, so we used CBS Sports as our bracket challenge host. The cut-off for submissions was Thursday at noon. Being busy with work and a procrastinator by nature, I waited, like many of my colleagues, until the final hour before brackets closed to finalize my picks. Unfortunately, CBS Sports’ March Madness website crashed during that hour, preventing users (myself included) from accessing their accounts and causing them to miss the submission deadline.

During this downtime, users received an “Internal Server Error” when they attempted to log in.

With less than an hour until the brackets locked, I set up a transaction test to monitor the CBS March Madness website and try to find the root cause of the issue. The test started around 11:30 AM, and it showed consistent failures at the login step.
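For readers who want to try something similar, here is a minimal sketch of a single synthetic check written in Python. It is not the tool used for this test; the function name check_login, the injectable opener parameter, and the example URL are all assumptions made for illustration. It requests one URL, records the HTTP status and the elapsed time, and treats any 4xx/5xx response or network error as a failure.

```python
import time
import urllib.error
import urllib.request


def check_login(url, timeout=10.0, opener=urllib.request.urlopen):
    """Run one synthetic check; return (ok, status, elapsed_seconds).

    `opener` is a hypothetical injection point so the check can be
    exercised without touching the network.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx, e.g. 500 Internal Server Error
        status = err.code
    except OSError:
        # Covers URLError (DNS, refused connection) and socket timeouts
        status = None
    elapsed = time.monotonic() - start
    ok = status is not None and 200 <= status < 400
    return ok, status, elapsed
```

Calling check_login("https://example.com/login") on a schedule (the URL is a placeholder) and logging each result would give a rough timeline of when failures started and stopped. A real transaction test would also submit credentials and step through the pages that follow.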

Our data suggests that the CBS March Madness site was unable to handle the increased load from so many users trying to make last-minute changes to their brackets. The timing of the issue prevented many users from submitting a bracket for the tournament at all.

As we’ve seen with past outages, social media becomes a very public platform for users to vent their frustration. While CBS Sports was trying to recover, users took to Twitter to complain about the issue.

CBS Sports managed to bring the page back up around 1:00 PM EST, but by then the bracket deadline had passed and those who couldn’t log in were unable to submit their picks for the tournament.

This kind of incident isn’t uncommon, but that doesn’t mean it can’t be avoided. Load testing your website ahead of an event that is known to drive increased traffic can help identify areas for optimization and reduce the risk of an outage. Of course, there is no way to predict and prevent every single performance problem, so monitoring real-user experiences during the actual event will reveal problems as soon as they appear and allow you to fix them before more of your users are affected.
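As a rough illustration of what a basic load test measures, the Python sketch below fires a batch of concurrent requests and reports the error rate and 95th-percentile response time. The names run_load_test and request_fn are hypothetical, and the default request counts are arbitrary. Dedicated load-testing tools do far more than this, so treat it as a starting point under those assumptions.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(request_fn, total_requests=200, concurrency=20):
    """Call request_fn total_requests times using `concurrency` threads.

    request_fn is a hypothetical callable that performs one request and
    returns a truthy value on success; an exception counts as a failure.
    """
    def timed_call(_):
        start = time.monotonic()
        try:
            ok = bool(request_fn())
        except Exception:
            ok = False
        return ok, time.monotonic() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    failures = sum(1 for ok, _ in results if not ok)
    # Nearest-rank 95th percentile of the observed response times
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": total_requests,
        "error_rate": failures / total_requests,
        "p95_seconds": latencies[p95_index],
    }
```

Running this against a staging copy of the site at increasing concurrency levels, and watching where the error rate or the 95th-percentile time starts to climb, gives a first estimate of how much traffic the login path can absorb.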