The Narrative

In 2010, I was fresh out of business school and starting a new part of my career at Yahoo!. I had consulted to dozens of big companies on information technology, but Yahoo! woke me up to one undeniable fact: no one outside of a handful of companies really understands how to scale technology, especially websites.

In my years of IT consulting, dealing with internal systems that had to bear the brunt of 10,000 concurrent users seemed daunting. Now, I was at Yahoo!, where some properties would see several hundred million hits on a normal day. On days when something crazy happened (e.g. Michael Jackson dying, a new Pope, the Biebs doing something stupid), site traffic could go up by 10x. Now, we're dealing with billions of page views and hundreds of millions of concurrent users.

I got two big learnings at Yahoo! that are particularly relevant here:

How do you launch a new product to a market as large as the United States without it being a disaster like healthcare.gov?

How do you deal with traffic spikes (like the one healthcare.gov hit the first time the site was made available)?

On Launching a New Product

Lesson One: Alpha/Beta Testing

Apparently, no one beta-tested healthcare.gov. This is insane. President Obama was hoping that there was going to be huge demand for a healthcare insurance exchange. He got his wish, but thanks to site problems, it turned into a nightmare. The testimony to Congress says it wasn't tested enough. I would contend it wasn't tested at all. Automated testing tools and internal load testing tools are nothing like real-world usage. That's why companies like Yahoo! do the following:

Release the product to a small group (aka alpha release). Within that small group, you can also do user interviews to gather direct feedback. Even without interviews, good web analytics will show you where these users are having problems.

Fix the problems.

Release to a larger group of people (maybe now this leaves alpha and enters a beta period).

Fix more problems.

Rinse and repeat until you've covered everyone.

Through good PR, you can even tell that initial group of people why they should be excited to be guinea pigs. Of course, you sell it by explaining they're getting early access to something that the whole US will eventually get.
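
Here's a minimal sketch of how a release in waves like the one above could be wired up. It's not how Yahoo! (or anyone in particular) does it; the salt, the user IDs, and the wave percentages are all made up for illustration. The idea is to hash each user into a stable bucket from 0 to 99, then admit everyone whose bucket falls below the current wave's percentage. Because the bucket never changes, the people in the 1% alpha stay in when you widen to 5%, 25%, and so on.

```python
import hashlib

# Hypothetical salt; changing it reshuffles who lands in which bucket.
SALT = "launch-waves"

# Hypothetical wave schedule: alpha, wider beta, wider still, everyone.
WAVES = [1, 5, 25, 100]


def rollout_bucket(user_id: str, salt: str = SALT) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_release(user_id: str, percent: int) -> bool:
    """True if this user is admitted when the release is at `percent`."""
    return rollout_bucket(user_id) < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]  # made-up user IDs
    for percent in WAVES:
        admitted = sum(in_release(u, percent) for u in users)
        print(f"{percent:>3}% wave admits {admitted} of {len(users)} users")
```

Between waves you fix what the analytics turned up, then bump the percentage.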

Lesson Two: Minimum Viable Product (MVP)

If you've ever used really complex software, you know it can be a daunting experience. Let's say you're using an ERP (enterprise resource planning) suite for the first time. Why is it overwhelming? Because there's an HR module, an MRM module, a GL module, a CRM module, etc. What if I only exposed you to a little at a time? Better, right?

Healthcare.gov should have done the same thing. They could've broken it down into many product "releases", but the two that are no-brainers would be:

Release 1: People could go in and look, and do their research. No actual shopping.

Release 2: Turn on shopping capability.

(Following the rules of alpha/beta testing of course...maybe at first, you only offer shopping to one state or a set of select cities, as an example.)
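
One way to picture the two releases is as feature flags. The sketch below is an assumption on my part, not anything healthcare.gov actually used: the flag names, the `FeatureFlag` class, and the pilot state are all hypothetical. Release 1 (browsing) is on everywhere; Release 2 (shopping) is on only for a pilot state until you widen it.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """A feature that is on everywhere, or only in a set of pilot states."""
    enabled_everywhere: bool = False
    enabled_states: set = field(default_factory=set)

    def is_on(self, state: str) -> bool:
        return self.enabled_everywhere or state.upper() in self.enabled_states


# Hypothetical flag table. "OR" is an arbitrary pilot state for illustration.
FLAGS = {
    "browse_plans": FeatureFlag(enabled_everywhere=True),  # Release 1
    "shopping": FeatureFlag(enabled_states={"OR"}),        # Release 2, pilot only
}


def can_use(feature: str, state: str) -> bool:
    """Unknown features are treated as off."""
    flag = FLAGS.get(feature)
    return flag is not None and flag.is_on(state)
```

Widening Release 2 is then a matter of adding states to the set, and going nationwide is flipping `enabled_everywhere`.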

Lesson Three: Learn by Observing, not Asking

I made this mistake a lot in my enterprise days.

Me: "So, will that work for you?" (after walking through a software workflow)
Customer: "Yes, I guess."
Me: "Great!" (puts check in box)

Yeah...come to find out, that's not going to build great software.

Better is to set up a bucket test (or A/B test) and give a small set of your users the "new experience", say, a new button. Track how they react. Are they clicking on it? What are they doing after? What happens when you move it elsewhere, change its size, change the wording, etc.?

People aren't good at knowing what they need. They only know after they experience something that it's either working for them or not working for them.
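
To make "track how they react" a bit more concrete, here's a rough sketch of comparing click-through rates between the control bucket and the "new experience" bucket. The function names and the numbers are made up; the statistic is a standard two-proportion z-test, which is one common choice rather than the only one.

```python
import math


def click_through_rate(clicks: int, views: int) -> float:
    """Fraction of page views that resulted in a click."""
    return clicks / views


def two_proportion_z(clicks_a: int, views_a: int,
                     clicks_b: int, views_b: int) -> float:
    """z-score for the difference in click-through rate, B minus A."""
    p_a = click_through_rate(clicks_a, views_a)
    p_b = click_through_rate(clicks_b, views_b)
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the z-score is undefined")
    return (p_b - p_a) / std_err


if __name__ == "__main__":
    # Hypothetical counts: A is the old page, B has the new button.
    z = two_proportion_z(clicks_a=500, views_a=10_000,
                         clicks_b=600, views_b=10_000)
    print(f"z = {z:.2f}")
```

As a rule of thumb, an absolute z-score above roughly 1.96 corresponds to the usual 5% significance level, so the difference probably isn't noise.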

On Spikes in Traffic

I'm not as knowledgeable about this as doing real-world testing before letting the hounds loose, but I can tell you that releasing your software in waves through proper testing processes will decrease spikes in traffic.

If healthcare.gov had first launched as just an informational website (the MVP), then that would have reduced traffic when the shopping experience was activated, right?

Plenty of technologies from cloud-based computing help solve specifically for spikes. In Amazon Web Services, you can spin up new servers on-demand in a matter of minutes. And, once you're done serving your spike, you can spin those back down.
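
The spin-up/spin-down decision boils down to simple arithmetic. This is a back-of-the-envelope sketch, not Amazon's actual scaling logic; the per-server capacity, headroom, and fleet limits are assumptions you'd replace with your own measurements.

```python
import math


def desired_servers(requests_per_sec: float,
                    capacity_per_server: float,
                    min_servers: int = 2,
                    max_servers: int = 500,
                    headroom: float = 0.25) -> int:
    """How many servers to run for the current load.

    headroom is the extra capacity kept in reserve (0.25 means 25%).
    The result is clamped between min_servers and max_servers.
    """
    needed = math.ceil(requests_per_sec * (1 + headroom) / capacity_per_server)
    return max(min_servers, min(max_servers, needed))


if __name__ == "__main__":
    # Hypothetical: each server handles 100 requests per second.
    for load in (0, 1_000, 10_000, 1_000_000):
        print(f"{load:>9} req/s -> {desired_servers(load, 100)} servers")
```

Run that every minute or so against your real traffic numbers and you scale up for the spike and back down after it.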

Companies like Yahoo!, Google, Amazon, Facebook, and Microsoft have embraced cloud-based computing because it saves on resources. Why?

The Old Way (before cloud computing)
Figure out the total amount of traffic you could get, try to predict the amount of traffic you'll get at steady state, and try to predict your spikes. Buy enough gear to handle the spikes and have a bunch of equipment do nothing most of the time, or draw a line in the sand on what you will support and then pray the Biebs doesn't do something stupid tomorrow.

The New Way (after cloud computing)
You still do all that figuring, but the workload is distributed across a set of shared resources. If your stuff spikes, chances are that other machines won't be doing much, so the traffic will get pushed to those servers that have spare capacity. Then, you pray that the Biebs does do something stupid because more traffic = more ad dollars!
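
Here's the old way versus the new way as arithmetic. The three properties and their hourly traffic numbers below are invented purely to show the effect: when each property buys gear for its own peak, you pay for the sum of the peaks; when they share a pool, you only need to cover the busiest combined hour.

```python
# Hypothetical requests per second, hour by hour, for three properties
# whose spikes happen to land at different times.
LOADS = {
    "news":    [100, 100, 1000, 100],
    "sports":  [100, 1000, 100, 100],
    "finance": [1000, 100, 100, 100],
}


def dedicated_capacity(loads: dict) -> int:
    """Old way: every property is provisioned for its own peak."""
    return sum(max(series) for series in loads.values())


def shared_capacity(loads: dict) -> int:
    """New way: one pool sized for the busiest combined hour."""
    return max(sum(hour) for hour in zip(*loads.values()))


if __name__ == "__main__":
    print("dedicated:", dedicated_capacity(LOADS))
    print("shared:   ", shared_capacity(LOADS))
```

The savings shrink if everything spikes at the same moment, which is why the bet is that, most of the time, it won't.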