Automating complexity: The future of website performance optimization

Five years ago, when we started Strangeloop, being able to automatically apply the Yahoo and Google performance best practices universally across a website was a monumental achievement. Little did we realize that this was just the beginning of a problem that would grow far more nuanced.

Browsers began to innovate, which added a new dimension to the problem. Each browser required its own interpretation of the performance rules, and the right interpretation could vary even between versions of the same browser. (Domain sharding remains a good example: it helps older browsers such as IE6 and IE7, which open only two connections per hostname, but it can hurt modern browsers, which already open about six connections per hostname and pay for the extra DNS lookups and connections that sharding adds.)
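
To make the domain-sharding point concrete, here is a minimal sketch of a browser-aware sharding rule. The hostnames, the `asset_url` function, and the "more than two connections per host means skip sharding" threshold are all hypothetical illustrations, not Strangeloop's implementation. The one real technique it shows is that the shard is picked by hashing the asset path, so the same asset always lands on the same hostname and stays cacheable.

```python
import zlib

# Hypothetical hostnames for illustration only.
SHARD_HOSTS = ["static1.example.com", "static2.example.com"]
PRIMARY_HOST = "www.example.com"


def asset_url(path: str, connections_per_host: int) -> str:
    """Return the URL to use for a static asset (hypothetical helper).

    Sharding only pays off when the browser opens few parallel
    connections per hostname. Browsers that already allow more than
    two skip it, avoiding extra DNS lookups and TCP connections.
    """
    if connections_per_host > 2:  # assumed threshold
        return f"https://{PRIMARY_HOST}{path}"
    # Hash the path rather than choosing randomly, so a given asset always
    # maps to the same shard and stays cacheable across pages.
    index = zlib.crc32(path.encode("utf-8")) % len(SHARD_HOSTS)
    return f"https://{SHARD_HOSTS[index]}{path}"
```

For example, `asset_url("/img/logo.png", 6)` keeps the asset on the primary host, while `asset_url("/img/logo.png", 2)` moves it to one of the shards. In practice the connection limit would come from browser detection, which is itself one of the per-browser variations described above.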

Then last year, our customers, having internalized the idea that every second matters, drove us to optimize for user flows and experience, with a hawkish eye on business metrics like revenue and conversions. It became clear that optimizing isolated pages, out of the context of their place in a user’s flow through your site, does not automatically improve performance.

It also became clear that we can’t treat landing pages the same as we treat non-landing pages. As a result, we found ourselves once again changing how we applied performance best practices. (In December, I wrote a case study about this for Stoyan Stefanov’s Performance Calendar.)

Now we are seeing the next step of performance evolution.

Our collective fixation on the mantra “every second counts” — combined with trying to anticipate user behaviour — is being taken to the next level.

We are moving from the “simple” world of universally applying 15-20 performance treatments across a site to a world in which these 15-20 techniques are applied uniquely to each page of a site according to the following parameters:

To say that this is an exponential increase in complexity may be an understatement.
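
The word "exponential" is meant literally here. A minimal sketch of the arithmetic, assuming each technique is a simple on/off switch (many real treatments have parameters, so the true count is larger); the browser and page-type counts in the usage note are hypothetical:

```python
def variant_count(techniques: int, browsers: int = 1, page_types: int = 1) -> int:
    """Count the configurations to test, treating each technique as on/off.

    One page in one browser has 2**techniques configurations; tuning
    separately per browser and per page type multiplies that further.
    """
    return (2 ** techniques) * browsers * page_types


for n in (15, 20):
    print(f"{n} techniques: {variant_count(n):,} configurations per page per browser")
# 15 techniques: 32,768 configurations per page per browser
# 20 techniques: 1,048,576 configurations per page per browser
```

With a hypothetical 10 browser versions and 50 page types, `variant_count(20, 10, 50)` comes to 524,288,000 configurations, far more than anyone can test by hand.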

Thinking about the evolution of this problem reminds me of one of my favourite childhood books, now long out of print, called The Magic Well by Piero Ventura. In it, a small town becomes overwhelmed by the complexity created by a magic well that produces yellow balls. When I think about the complexity of the web performance problem for app developers and IT departments, these images keep coming up.

I see the early days, when the problem was small and manageable, and I see today, when the problem has become so complex that it is overwhelming.

Are we tackling an impossible problem?

Trying to divine user behaviour in a very complex world is not unique to our industry. I think it’s instructive to look at the evolution of the internet marketing industry. In the old days, you would come up with a good campaign and execute it. Then the internet gave marketers a cheap, easy platform for testing campaigns, and A/B testing became the norm. This is essentially the stage the performance industry is at right now, as we focus on comparing accelerated and unaccelerated page results.

As marketing matured, some marketers realized that A/B tests were not good enough and moved to a better solution: multivariate testing (also known as MVT). With MVT, marketers identified a fixed set of variables, created a large number of combinations of those variables, and then tracked users’ preferences to determine which combination was most effective.
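
The MVT process described above (fixed variables, every combination of them, then pick the winner from tracked behaviour) can be sketched as a full-factorial design. The variable names and options below are hypothetical, and a real test would also check statistical significance before declaring a winner; this sketch simply takes the highest observed conversion rate.

```python
import itertools
from collections import defaultdict

# Hypothetical page variables and their options.
VARIABLES = {
    "headline": ["A", "B"],
    "button_color": ["green", "orange"],
    "hero_image": ["product", "lifestyle"],
}


def all_combinations(variables):
    """Full-factorial design: every combination of every option."""
    names = list(variables)
    return [dict(zip(names, values)) for values in itertools.product(*variables.values())]


def best_combination(observations):
    """Return the combination key with the highest conversion rate.

    observations: iterable of (combination_key, converted) pairs, where
    combination_key is a hashable tuple of option values and converted
    is a bool recording whether that visit converted.
    """
    visits = defaultdict(int)
    conversions = defaultdict(int)
    for key, converted in observations:
        visits[key] += 1
        conversions[key] += int(converted)
    return max(visits, key=lambda k: conversions[k] / visits[k])
```

Three variables with two options each already yield eight combinations, which is why MVT needs real traffic, and automation, to produce an answer.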

The advantage of multivariate testing over simple A/B testing is that it helps website owners take a much more granular approach to figuring out what works and doesn’t work, allowing them to fine-tune a page and squeeze every last drop of value from it. (If you want more background on this fascinating aspect of marketing, Elastic Path has an excellent post about A/B and multivariate testing on their blog.)

Multivariate testing becomes critical once a problem is so complex that our intuition is no longer reliable. We may think we can anticipate how users will interact with a page, but we can’t trust our gut feelings. For a humbling reminder of this, check out Anne Holland’s blog, Which Test Won. Each week, Anne shows you two versions of a landing page from an A/B test and asks you to guess which one won. I like to think I’m pretty marketing-savvy, and my success rate on Anne’s blog is only about 30%.

So what does this have to do with website optimization?

I feel that, like marketers, we have reached a point in web page optimization where we can’t make decisions about performance optimizations intuitively; we have to apply various combinations of rules for each browser and then measure real-world results. Marketers use multivariate testing to find the precise combination that works best. We should do the same for performance, to learn which combination of performance best practices, applied across every aspect of a site (pages, sessions, workflows, browsers, caches, etc.), works best.
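
A minimal sketch of "apply combinations per browser, then measure real-world results": group real-user timing samples by browser and optimization combination, then keep the combination with the lowest median load time for each browser. The sample format and the `best_per_browser` name are assumptions for illustration. The median is used rather than the mean because real-user load times are skewed by a long tail of slow visits.

```python
from collections import defaultdict
from statistics import median


def best_per_browser(samples):
    """For each browser, pick the combination with the lowest median load time.

    samples: iterable of (browser, combination, load_time_ms) tuples, for
    example collected from real-user monitoring beacons. A combination is
    any hashable value, such as a frozenset of enabled technique names.
    """
    times = defaultdict(list)
    for browser, combo, load_ms in samples:
        times[(browser, combo)].append(load_ms)

    best = {}  # browser -> (combination, median load time)
    for (browser, combo), values in times.items():
        m = median(values)
        if browser not in best or m < best[browser][1]:
            best[browser] = (combo, m)
    return {browser: combo for browser, (combo, _) in best.items()}
```

Fed hypothetical samples, this could easily choose domain sharding for IE7 and no sharding for Chrome, which is exactly the kind of per-browser split described earlier.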

Obviously, if you apply the generic set of best practices across your site, you’ll get a significant advantage, but this advantage will not be enough in the long run. We know that revenue, conversions, page views and customer satisfaction are not just affected by each second you shave from page load times — they’re affected by every millisecond. In an increasingly competitive online world, the winners are going to be those companies that work to shave off every last millisecond.

The counterargument is this: Why do you need to try hundreds of combinations? Isn’t the best one the one that makes your site the fastest? Marketers don’t have a metric to measure by, other than how well people react in the real world (i.e., there’s no WebPagetest for marketing). Performance is measurable; there are lots of tools. Why not just use the combination that makes your site fastest?

My answer is that my experience over the last year suggests that, like marketers, we cannot find the best combination in the lab, because we cannot predict real user behaviour. I am continually surprised at how bad we at Strangeloop are at predicting outcomes when we turn various Site Optimizer features on and off.

Admittedly, 20% of the effort gets you 80% of the way there, but you need 80% of the effort to cover the last 20%, and it’s the last 20% that actually matters. That’s the key part. The easy stuff is ultimately table stakes. To use a marketing analogy, consider navigation. You wouldn’t put the menu bar at the bottom of the page, below the fold; like most people, you know the menu belongs above the fold. But it’s not enough to plunk your navigation at the top of the page. The nuances of its layout and design are what will ultimately determine its success, and these nuances are the hardest thing to figure out.

So how do we take care of this last 20% without losing our minds or breaking the bank?

Multivariate testing works on the internet because the platform makes it cheap to automate, and because it is possible to incorporate feedback in real time and adjust accordingly.
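
One well-known way to "incorporate feedback in real time and adjust accordingly" is a multi-armed bandit. This is a different technique from classic MVT, and using it here is our illustration, not a description of any Strangeloop product. In this epsilon-greedy sketch, each arm is a hypothetical optimization combination: most traffic goes to the combination with the best observed reward, while a small fraction keeps exploring the others. The reward definition (for instance, 1 for a conversion and 0 otherwise) and the `epsilon` value are assumptions.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy multi-armed bandit over optimization combinations."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in self.arms}
        self.totals = {arm: 0.0 for arm in self.arms}
        self.rng = random.Random(seed)

    def mean_reward(self, arm):
        return self.totals[arm] / self.counts[arm] if self.counts[arm] else 0.0

    def choose(self):
        # Try every arm once first; after that, explore with probability
        # epsilon and otherwise exploit the best observed arm.
        untried = [arm for arm in self.arms if self.counts[arm] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.mean_reward)

    def update(self, arm, reward):
        # Record real-user feedback for the arm that was served.
        self.counts[arm] += 1
        self.totals[arm] += reward
```

The design choice versus a fixed MVT run is that traffic shifts toward better combinations while the test is still going, so fewer users see the losing variants.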

In a similar way, I believe that transformation-based performance solutions should be able to take page performance and user behaviour metrics into account, then perform their own version of ongoing multivariate testing. I also believe that we should be able to perform these tasks quickly and cost-effectively.

I am working to move my company to a place where we can test hundreds of different acceleration combinations on different browsers on any given page, flow, and website — all in an automated way. I see this as the future of our industry.