Concatenate, Compress, Cache

When trying to optimize the performance of your website, there are three main
elements that should be on your top priority list. Three very easy-to-implement
steps that can have a great impact on your website load time.

These three methods are named concatenation, compression and cache. I've
already talked about them in a previous
talk (in French), but we'll now
cover them in full detail.

Concatenate

The main goal of concatenation is to merge several files of the same type in
one final file. Doing so will allow us to transfer less files over the wire.
CSS and JavaScript files are the one that yields the better results.

The nature of downloading assets itself makes your browser pay some costs on
every request, and in the end all those lost milliseconds adds up. Let's have
a look at the different costs we're paying:

TCP Slow start

TCP is the underlying protocol used by HTTP. It uses a mechanism known as
slow-start to get to the optimal transfer speed. To do so, it must do a few
round trips between the client and the server, sending more and more data,
until some are dropped, in order to get the maximum possible sending/receiving
throughput.

If we send a lot of small files, the mechanism never has time to reach the
optimal speed, and must start its round trips again on the next file. By
grouping files together in one bigger file, the cost of calculating the max
speed is only paid once and the remaining of the file can be downloaded at
full speed.

Note that maintaining Keep-Alive connections on your server can let you reuse
a previous connection, and thus only pay this cost once. Unfortunately,
activating Keep-Alive on an Apache server will also limit the number of
parallel connection your server can handle.

SSL

Similarly, the same kind of cost is paid when serving the site using https.
In order to prove that both client and server really are who they claiming to
be, an exchange of keys is done through a secure handshake. Once again, the
cost of the handshake is paid on each downloaded asset. Putting all your files
in one lets you pay the cost only one.

Parallel connections

Finally, the last limiting factor is purely in the client browser. Each browser
keeps track of a max number of parallel connections it can maintain opened to
any given server. The HTTP spec officially set this number to 2, but in the
real world browsers usually have a higher value, ranging from 8 to 12.

This means that if you ask your browser to download 5 stylesheets, 5 scripts
and 10 images, it will only launch the download of the 12 first
elements. The 13th asset download will only be started once one of the 12th
first will be finished. Once again, grouping your files together will
allow you to have more opened channels that will be used to download important
assets of your page.

CSS and JavaScript files can be very easily concatenated. You only have to
create one final file that simply contains the content of all your initial
files. Your build process can easily take care of that, but a very simple
solution can be written in a few lines:

cat ./src/*.css > ./dist/styles.css
cat ./js/*.js > ./dist/scripts.js

Please note that merging image files is also possible (and is named CSS
Spriting) but is out of the scope of this article.

Compression

Now that we've reduced the number of files, the next logical step is to make this
files smaller, so they can be downloaded faster.

Fortunately, the web has a magic word named Gzip that will reduce the size of
each textual asset by an average of 66%.

The good news is that most of the assets that we use to build a website are
actually made from text. Main bricks like HTML, CSS and JS of course but also
output from your API (JSON and XML). A lot of other format are in fact XML in
disguise, like RSS, web fonts or SVG images.

It is rare enough to be pointed out, but Gzip is perfectly supported by all
major browsers and servers (even as far back as IE5.5). There are absolutely
no reason to not use it.

If a browser does support Gzip, it will send an Accept-Encoding: gzip header
to the server. If the server find this header in the request, it will compress
the file on the fly before returning it to the client. It will also add the
Content-Encoding: gzip header, and the browser will uncompress once received.

The main point here is to have a smaller file moving across the wire. Server
and client will respectively compress and decompress the data, but the added
overhead is negligible on any machine built in the last decade. Having much
less data to transfer over the wire will give you tremendous speed
improvements.

Gzip compression modules are already available on all kind of servers, all you
have to do is enable them. You simply configure which kind of files must be
compressed, referencing them by their mimetype. You'll find below a few
examples on the most common servers:

Nginx

Enabling Gzip is really easy to set and it greatly improves loading time. You
do not have to change anything on the served files, all the configuration
occurs in the servers.

Minification

If you want to go even farther, you can invest on minifying your assets. Once
again, HTML, CSS and JavaScript are the best languages for minification.

Minification is a process that will rewrite all your assets in a lighter
version, using less characters, and thus being smaller to download. In essence,
it will remove comments and new lines, but language-specific minification tool
can also rename variable in your JS to shorter ones, group CSS selectors or
remove useless HTML attributes.

Adding a minification tool to your build process is more complex than enabling
Gzip and yields less impressive results. That's why we highly recommend that
you do not try to tackle it until you've enabled Gzip.

Cache

Now that we've limited the number of files and slimmed them down, the next step
is to download them as seldom as possible.

The main idea here is that there is no need to download something that your user
already has on its hard drive.

We're going to start by explaining how the HTTP cache mechanism works. This is
an area that is usually not very well understood by developers, so we'll try to
make it clearer. The main element is that there are two very different parts in
the HTTP cache : freshness and validation.

Freshness

Freshness is best understood as a "best-before" date for your assets. When
downloading an asset, the server send it with a header telling us until when
the asset will be considered fresh.

If the client needs the same asset again, it first checks the freshness of the
one in its cache. If it is still fresh, it does not start a request to the
server and directly use it. Nothing can be faster than that, because there is
absolutely no network involved.

On the other hand, if the freshness date is overdue, the browser will start
a new connection to get the latest version.

In HTTP 1.0, the server returns an Expires header including the max freshness
date. For example: Expires: Thu, 04 May 2015 20:00:00 GMT. This means that
when the client asks for the same asset again before May 4th 2015 at 8pm, it
will simply read it from its cache.

This header has a major flaw in the fact that dates are absolutely fixed and
thus the cache of all your clients will be invalidated at the same time. On May
4th, all your clients will request the new version of your asset at the same
time and your server might not be able to cope with all those connections.

To limit this effect and give a bit more flexibility in handling the cache,
HTTP 1.1 introduced a new header. Cache-Control accepts several arguments,
but the one we're interested in here is max-age. It lets you define a cache
duration in seconds.

Your server can now answer with a Cache-Control: max-age=3600 header. It
tells the client that the asset will still be fresh for the next minute (3600
seconds). By using this header, you can space your calls over a longer period.

Validation

The second part of caching is named validation. Let's imagine that our asset
is no longer fresh. We'll need to grab the latest version on
the server. But it is perfectly possible that the asset on the server hasn't
been updated since the last time the client fetched it. In that case it would
be useless to download it all over again.

That's when the validation kicks in. If the asset on the client is identical
than the one on the server, the client can keep its local version. If the two
assets are different, then the client downloads the new version from the
server.

How does it work ? When the client got the asset for the first time, it fetched
it along with a Last-Modified header. For example Last-Modified: Mon, 04 May
2015 02:28:12 GMT. This means that the next time the client will make
a request to get the same asset, it will send this date in the
If-Modified-Since request header: If-Modified-Since: Mon, 04 May 2015
02:28:12 GMT.

The server will then compare the sent date with the one it has on its side. If
the two dates matches, it will return a 304 Not Modified status, telling the
client that the content has not changed. The client in turn will use its local
version, and we avoided transmitting useless data over the wire.

On the other hand, if the server file is newer than the client file, the server
will with a 200 OK alongside the new content. That way, the client will now
download and use the new version.

By correctly using the validation mechanism, we avoid downloading a content we
already have.

In both cases, the server sends the freshness information again.

The HTTP spec allows us to choose between two couples of headers. We can either
use the Last-Modified / If-Modified-Since headers, as we just saw, or use
ETags.

An ETag is a unique identifier hash for a file. Whenever a file is updated, its
ETag will change too. For example, on the first call the server will return an
ETag: "3e86-410-3596fbbc" header. When the client will ask for this asset
again, it will send an If-None-Match: "3e86-410-3596fbbc" header. The server
will in turn compare the two ETags and either return a 304 Not Modified if
they are the same, or a 200 OK with the new content if they are different.

Last-Modified and ETag use very similar mechanism, but we advise you to use
Last-Modified over ETag.

Indeed, the HTTP spec tells us that in the case of receiving both
a Last-Modified and an ETag, the client should use the Last-Modified. In
addition, most servers generate their ETag using the inode of the file on disk,
so anytime the file is changed, it is reflected in its ETag.

Unfortunately, this can cause issues if you have several servers serving the
same content behind a load-balancer. Each server will have a copy of your
files, but on different filesystems which will result on different inodes, thus
different ETags for the same file. And your whole validation system will not
work as soon as your user gets redirected to a new server.

Note that nginx does not have this issue as it does not use the inode when
generating ETags. If using Apache, you can fix it with the FileETag
MTime Size option, or with etag.use-inode = "disable" under lighttpd.

Summary

The client makes a first request to get an asset. It gets a Cache-Control:
max-age header indicating freshness, and a Last-Modified for validation.

If requesting the same asset again while this one is still fresh, no network
connection is made and the asset is directly read from the local disk.

If requesting the same asset after its freshness date, the client makes
a call to the server, sending along an If-Modified-Since header.

If the file on the server has a modification date equal to the one sent, it
then returns with a 304 Not Modified.

If the file has been modified since the last time, the server answers with
a 200 OK alongside the new content.

In both cases, the server returns a new Cache-Control and Last-Modified.

Cache invalidation

Caching is a hard to tame beast, and we all know that:

There are two hard things in Computer Science: cache invalidation and naming
things.

And that's right, invalidating the cache of our clients when we need to push
a modification is extremely difficult. It's actually so difficult that
we're not going to do it at all.

Browsers cache stuff according to their URL. So if we need to update some
content, we only need to change its URL. URLs are cheap, we have them in
unlimited quantity, we can create as many as we need. We can add a version
number, a timestamp or a hash to our original filename and generate a whole new
URL.

By putting a very long cache period on these assets (1 year is the official
max the spec allows), it's like having them in cache forever. We just need
to put a shorter cache on the file that reference them (usually the HTML file).

That way, when pushing to production a modification to a stylesheet or to
a script, we just have to update the references to those files in our HTML
sources so our clients can download the new content. The cache period on HTML
files is much shorter, so updates pushed to production will be
quickly taken into account by clients.

Old content will stay in our clients cache, but this does not matter because we
will never request them again and unused items in cache are regularly erased.

This technique is actually quite close to the ETag we saw earlier, with one
big difference. Here, we can choose when we want to invalidate our client
cache.

In the end, we use a mix of both techniques to handle and optimal caching
strategy.

When the URL of an element is important (like HTML pages, or API endpoints), we
set a short freshness period (in seconds or minutes, depending on the average
update rate). This way we're sure that our clients will have the new pushed
version quickly after we deployed it, while still limiting the number of
requests the server must handle.

When the url of an element is not important (like CSS, JavaScript or image
files), we'll use the maximum freshness duration (1 year). This will let the
client keep the element in its cache forever, and avoid requesting the server
ever again for this asset. Whenever we update the file on our end, we'll
generate a new URL for it and simply update the reference to it in the HTML
source.

Conclusion

We saw how three very simple actions could greatly lower the total number of
files to download, make them smaller and download them less often.

Automatic file concatenation should be integrated into your build process, so
you can keep your development environment clean. Gzip compression only needs a few
configuration switches on your servers. Setting an optimal caching strategy
will require some work both on the build process and on the servers.

All those modifications are quite cheap to put in place and are not tied to any
back-end or front-end specific language, they can be applied whatever your
technical stack is. There is no reason why you couldn't deploy them today.