Trendwatching on the HTTP Archive: Interesting findings about payload, social content, and core best practices

Steve put up a great post about the HTTP Archive last month that I’ve been meaning to pile onto. As one of the archive’s financial supporters, Strangeloop is obviously a big fan, and I’m always talking it up with our customers. (I was on the phone last week, pitching our mobile product to one of my favourite analysts — another data geek — who didn’t know the archive existed. She was very interested when I pointed out the incredibly exciting database we are creating.)

A few trends jumped out at me when I compared the first run, in November 2010, to the latest run, on September 15, 2011.

As Steve pointed out in his post, payload is going up… and up… and up.

When I dug into this, I focused my attention on the top 100 sites because these guys represent my customers and I am very familiar with them. I wasn’t surprised to see total payload going up by 26% in just under a year — a pretty amazing number when you think about it. Images grew by close to 30% and scripts by close to 26%. It is tough to make pages fast when they grow this quickly.

When I see payload going up, my first instinct is to blame the unconverted — the big guys who just don’t get it yet. To test my assumption, I took a look at the players who do really get it. I was surprised by my findings:

Amazon, to its credit, decreased total payload by almost 15%. But it was the only site that showed improvement. Every other major player I checked increased total payload: Google by 34.5%, Gmail by 25%, Yahoo by 18%, and Microsoft by 30%.

Not surprisingly, the number of requests increased across the board as well:

Overall, I was really surprised to see the big guys not practicing what they preach.

Social content is growing, and Google+ is neck-and-neck with Facebook.

I was also surprised by the growth in social content on the top 100 sites, and most of all by Google+, which has already pulled even with Facebook. See below:

Popularity of JavaScript libraries in 2010:

Popularity of JavaScript libraries in 2011:

Twitter has pulled ahead, from 2% to 8%. Facebook has grown from 2% to 5%. And right out of the blocks, Google+ has surged to a tie with Facebook. Some people say Google+ is a flash in the pan; others say it’s a serious contender. I’ll be very interested in seeing where these numbers are next year.

1 out of 4 of the top 100 sites still don’t use cache headers.

This is a core best practice, yet about 1 out of 4 of the top 100 sites still don’t follow it. It’s a humbling reminder that, despite the great strides front-end optimization has made in the past couple of years, we can’t assume everyone is on the same page.
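If you want to audit your own pages for this, the check is simple: look for the caching-related response headers (Cache-Control, Expires, Last-Modified, ETag). Here’s a minimal sketch; the function just inspects a dict of response headers, however you fetched them (curl, urllib, your browser’s dev tools):

```python
# Minimal sketch: flag responses that ship without any caching headers.
# Header names are the standard HTTP/1.1 ones; the input is a plain dict
# of response headers, case-insensitive by convention.

CACHING_HEADERS = ("cache-control", "expires", "last-modified", "etag")

def is_cacheable(response_headers):
    """True if the response carries at least one caching-related header."""
    present = {name.lower() for name in response_headers}
    return any(name in present for name in CACHING_HEADERS)

# Example: a response with an explicit freshness lifetime passes,
# one with only a content type does not.
print(is_cacheable({"Cache-Control": "max-age=3600"}))   # True
print(is_cacheable({"Content-Type": "text/html"}))       # False
```

Of course, merely having a header isn’t the whole story — a `Cache-Control: no-store` technically counts here — but even this crude test would catch the quarter of the top 100 that send nothing at all.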

Correlations to render time and load time have inverted.

Both of these sets of graphs intrigued me. It’s interesting to see the across-the-board decrease in every item’s correlation to render time, while at the same time every item’s correlation to load time increased. The fact that the two sets of graphs seem inverted makes me wonder if there’s a connection between them.

I asked Hooman Beheshti, our VP of Product, about this, and here are his thoughts:

Round trips correlate with load time a lot more this year, and are out in front. With all the third-party and social networking tags, this matches what we see with our customers. Round trips continue to be a massive contributor to load time, maybe now more than ever.

Transfer size may be second, which could fool us into thinking we’re getting things from point A to point B faster, but its impact on total load time has gone up. So it may not have as big an impact as round trips, but it matters more now than it did before.

The fact that the number of domains used on a page is now a major contributor to load time (and leads the charge in render time) may mean that, collectively, we’re not doing as well as we thought with modern browsers and parallelism. And by that I don’t mean concurrent connections to the same domain, just concurrent connections, period. Either that, or domains-per-page is increasing (by 30% according to this data, and by 20%+ for the top 100) and so is its impact on performance. Third-party tags could also be a contributor here.

That’s all I can think of. I don’t have a general theory for why the numbers are bigger for one and smaller for the other. It’s interesting, though, that the trends for render time and load time themselves aren’t part of the comparison and analysis. It would be worth checking whether those metrics are going up or down on average.
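One footnote to Hooman’s point: the domains-per-page metric is simple to derive yourself. It’s just the count of unique hostnames among all the resources a page requests. A minimal sketch, with hypothetical URLs standing in for a real waterfall:

```python
# Sketch: deriving domains-per-page from a page's resource requests.
# The URLs below are invented for illustration; in practice you'd pull
# them from a HAR file or a waterfall chart.
from urllib.parse import urlparse

def domains_per_page(resource_urls):
    """Number of unique hostnames a page pulls resources from."""
    return len({urlparse(url).hostname for url in resource_urls})

urls = [
    "https://www.example.com/index.html",
    "https://cdn.example.com/app.js",
    "https://cdn.example.com/styles.css",
    "https://widgets.social.example/like.js",
]
print(domains_per_page(urls))  # 3 unique hostnames
```

Every social button, analytics beacon, and ad tag tends to add at least one hostname to that set, which is one reason third-party content and this metric move together.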

I had a blast digging into the HTTP Archive, and I strongly encourage you to do the same, if you haven’t already. And if you have any theories about my findings, or findings of your own, I’d love to hear them.