Efficiently compressing dynamically generated web content

I originally wrote this article for the Web Performance Calendar
website,
which is a terrific resource of expert opinions on making your website
as fast as possible. We thought CloudFlare users would be interested so
we reproduced it here. Enjoy!

With the widespread adoption of high-bandwidth Internet connections in
homes, offices and on mobile devices, limitations in available
bandwidth for downloading web pages have largely been eliminated.

At the same time latency remains a major problem. According to a recent
presentation by Google, broadband Internet latency is 18ms for fiber
technologies, 26ms for cable-based services, 43ms for DSL and
150ms-400ms for mobile devices. Ultimately, bandwidth can be expanded
greatly with new technologies but latency is limited by the speed of
light. The latency of an Internet connection directly affects the speed
with which a web page can be downloaded.

The latency problem occurs because the TCP protocol requires round trips
to acknowledge received information (since packets can and do get lost
while traversing the Internet) and to prevent Internet congestion TCP
has mechanisms to limit the amount of data sent per round trip until it
has learnt how much it can send without causing congestion.

The collision between the speed of light and the TCP protocol is made
worse by the fact that web site owners are likely to choose the cheapest
hosting available without thinking about its physical location. In fact,
the move to ‘the cloud' encourages the idea that web sites are simply
‘out there' without taking into account the very real problem of latency
introduced by the distance between the end user's web browser and the
server. It is not uncommon, for example, to see web sites aimed at UK
consumers being hosted in the US. A web user in London accessing a
.co.uk site that is actually hosted in Chicago incurs an additional 60ms
round trip time because of the distance traversed.

Dealing with speed-of-light induced latency requires moving web content
closer to the users who are browsing, or making the web content smaller
so that fewer round trips are required (or both).

The caching challenge

Caching technologies and content delivery services mean that static
content (such as images, CSS and JavaScript) can be cached close to end
users, helping to reduce latency when it is loaded. CloudFlare sees on
average that about 65% of web content is cacheable.

But the most critical part of a web page, the actual HTML content, is
often dynamically generated and cannot be cached. Because none of the
relatively fast-to-load content that's in cache can even be requested
before the HTML arrives, any delay in the web browser receiving the HTML
affects the entire web browsing experience.

Thus being able to deliver the page HTML as quickly as possible even in
high latency environments is vital to ensuring a good browsing
experience. Studies have shown that the slower the page load time the
more likely the user is to give up and move elsewhere. A recent Google
study said that a response time of less than 100ms is perceived by a
human as ‘instant' (a human eye blink is somewhere in the 100ms to 400ms
range); between 100ms and 300ms the computer seems sluggish; above 1s
the user's train of thought is lost to distraction or other thoughts. TCP's
congestion avoidance algorithm means that many round trips are necessary
when downloading a web page. For example, getting just the HTML for the
CNN home page takes approximately 15 round trips; it's not hard to see
how high latency can quickly multiply into a situation where the
end user is losing patience with the web site.
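
The effect of congestion avoidance on round trips can be sketched with some back-of-the-envelope arithmetic. The sketch below is a simplification: the initial window of 4 segments and the 1460-byte MSS are assumptions (real stacks vary), and it ignores the TCP handshake, DNS and packet loss, which is why real page loads need more round trips than this model predicts:

```python
# Round trips needed to deliver a page under idealized TCP slow start:
# the congestion window starts small and doubles each round trip.
def round_trips(page_bytes: int, mss: int = 1460, initial_cwnd: int = 4) -> int:
    sent, cwnd, trips = 0, initial_cwnd, 0
    while sent < page_bytes:
        sent += cwnd * mss  # one window's worth of data per round trip
        cwnd *= 2           # slow start: window doubles every round trip
        trips += 1
    return trips

# A 116KB page (the size quoted below for the BBC News homepage).
print(round_trips(116 * 1024))  # → 5
```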

Unfortunately, it is not possible to cache the HTML of most web pages
because it is dynamically generated: the HTML is produced
programmatically at request time rather than stored as a static file. For
example, a news web site will generate fresh HTML as news stories change
or to show a different page depending on the geographical location of
the end user. Many web pages are also dynamically generated because they
are personalized for the end user — each person's Facebook page is
unique. And web application frameworks, such as WordPress, encourage the
use of dynamically generated HTML by default and mark the content as
uncacheable.

Compression to the rescue

Given that web pages need to be dynamically generated, the only viable
option is to reduce the page size so that fewer TCP round trips are
needed, minimizing the effect of latency. The current best option for
doing this is the use of the gzip encoding. On typical web page content
gzip encoding will reduce the page size to about 20-25% of the original
size. But this still results in multiple-kilobytes of page data being
transmitted incurring the TCP congestion avoidance and latency penalty;
in the CNN example above there were 15 round-trips even though the page
was gzip compressed.
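
Measuring gzip's effect on a page body takes only a few lines. The HTML below is a synthetic stand-in for a dynamically generated page; because it is highly repetitive it compresses far better than the 20-25% typical of real pages:

```python
import gzip

# Synthetic stand-in for a page body; real HTML is less repetitive,
# so expect a ratio nearer 20-25% in practice.
html = ("<div class='story'><h2>Headline</h2>"
        "<p>Some body text for the story.</p></div>\n" * 400).encode()

compressed = gzip.compress(html)
print(f"{len(html)} -> {len(compressed)} bytes "
      f"({100 * len(compressed) / len(html):.1f}% of original)")
```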

Gzip encoding is completely generic. It does not take into account any
special features of the content it is compressing. It is also
self-referential: a gzip encoded page is entirely self-contained. This
is advantageous because it means that a system that uses gzipped content
can be stateless, but it also means that the even larger compression
ratios that would be possible with external dictionaries of common
content are out of reach.

External dictionaries increase compression ratios dramatically because
the compressed data can refer to items from the dictionary. Those
references can be very small (a few bytes each) but expand to very large
content from the dictionary.

For example, imagine that it's necessary to transmit The King James
Bible to a user. The plain text version from Project Gutenberg is
4,452,097 bytes and compressed with gzip it is 1,404,452 bytes (a
reduction in size to 31%). But imagine the case where the compressor
knows that the end user has a separate copy of the Old Testament and New
Testament in a dictionary of useful content. Instead of transmitting a
megabyte of gzip compressed content they can transmit an instruction of
the form \<Insert Old Testament>\<Insert New Testament>. That
instruction will just be a few bytes long.

Clearly, that's an extreme and unusual case but it highlights the
usefulness of external shared dictionaries of common content that can be
used to reconstruct an original, uncompressed document. External
dictionaries can be applied to dynamically generated web content to
achieve compression that exceeds that possible with gzip.

Caching page parts

On the web, shared dictionaries make sense because dynamic web content
contains large chunks that are the same for all users and over time.
Consider, for example, the BBC News homepage, which is approximately
116KB of HTML. That page is dynamically generated and the HTTP caching headers
are set so that it is not cached. Even though the news stories on the
page are frequently updated a large amount of boilerplate HTML does not
change from request to request (or even user to user). The first 32KB of
the page (28% of the HTML) consists of embedded JavaScript, headers,
navigational elements and styles. If that ‘header block' were stored by
web browsers in a local dictionary then the BBC would only need to send
a small instruction saying \<Insert BBC Header> instead of 32KB of
data. That would save multiple round-trips. And throughout the BBC News
page there are smaller chunks of unchanging content that could be
referenced from a dictionary.

It's not hard to imagine that for any web site there are large parts of
the HTML that are the same from request to request and from user to
user. Even on a very personalized site like Facebook the HTML is similar
from user to user.

And as more and more applications use HTTP for APIs there's an
opportunity to increase API performance through the use of shared
dictionaries of JSON or XML. APIs often contain even more common,
repeated parts than HTML as they are intended for machine consumption
and change slowly over time (whereas the HTML of a page will change more
quickly as designers update the look of a page).

Two different proposals have tried to address this in different ways:
SDCH and ESI. Neither has achieved acceptance as an Internet standard,
partly because of the added complexity of deploying it.

SDCH

In 2008, a group working at Google proposed a protocol for negotiating
shared dictionaries of content so that a web server can compress a page
in the knowledge that a web browser has chunks of the page in its cache.
The proposal is known as
SDCH
(Shared Dictionary Compression over HTTP). Current versions of Google
Chrome use SDCH to compress Google Search results.

This can be seen in the Developer Tools in Google Chrome. Any search
request will contain an HTTP header specifying that the browser accepts
SDCH compressed pages:

Accept-Encoding: gzip,deflate,sdch

And if SDCH is used then the server responds indicating the dictionary
that was used. If necessary Chrome will retrieve the dictionary. Since
the dictionary should change infrequently it will be in local web
browser cache most of the time. For example, here's a sample HTTP header
seen in a real response from a Google Search:

Get-Dictionary: /sdch/60W93cgP.dct

The dictionary file simply contains HTML (and JavaScript etc.) and the
compressed page contains references to parts of the dictionary file
using the VCDIFF format specified
in RFC 3284. The compressed page
consists mostly of VCDIFF COPY and ADD instructions. A COPY x, y
instruction tells the browser to copy y bytes of data from position x in
the dictionary (this is how common content gets compressed and expanded
from the dictionary). The ADD instruction is used to insert uncompressed
data (i.e. those parts of the page that are not in the dictionary).
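
A toy decoder makes the idea concrete. This sketches the COPY/ADD semantics only, not the actual RFC 3284 wire encoding, and the dictionary contents here are invented:

```python
# COPY x, y pulls y bytes from position x in the shared dictionary;
# ADD carries literal bytes that were not in the dictionary.
def apply_instructions(dictionary: bytes, instructions) -> bytes:
    out = bytearray()
    for op, *args in instructions:
        if op == "COPY":
            x, y = args
            out += dictionary[x:x + y]
        elif op == "ADD":
            out += args[0]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return bytes(out)

dictionary = b"<html><head><script src='app.js'></script></head><body><nav>...</nav>"
page = apply_instructions(dictionary, [
    ("COPY", 0, len(dictionary)),      # reuse all the cached boilerplate
    ("ADD", b"<p>A fresh story</p>"),  # only this part is actually new
])
```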

In a Google Search the dictionary is used to locally cache infrequently
changing parts of a page (such as the HTML header, navigation elements
and page footer).

SDCH has not achieved widespread acceptance because of the difficulty of
generating the shared dictionaries. Three problems arise: when to update
the dictionary, how to update the dictionary and prevention of leakage
of private information.

For maximum effectiveness it's desirable to produce a shared dictionary
that will be useful in reducing page sizes across a large number of page
views. To do this it's necessary to either implement an automatic
technique that samples real web traffic and identifies common blocks of
HTML, or to determine which pages are most viewed and compute
dictionaries for them (perhaps based on specialised knowledge of what
parts of the page are common across requests).

When automated techniques are used it's important to ensure that, when
sampling traffic that contains personal information (such as for a
logged-in user), no personal information ends up in the dictionary.

Although SDCH is powerful when used, these dictionary generation
difficulties have prevented its widespread use. The Apache mod_sdch
project is inactive and the Google SDCH group has been largely inactive
since 2011.

ESI

In 2001 a consortium of companies proposed addressing both latency and
common content with
ESI (Edge Side
Includes). Edge Side Includes work by having a web page creator identify
unchanging parts of the page and then making these available as separate
mini-pages using HTTP.

For example, if a page contains a common header and navigation, a web
page author might place that in a separate nav.html file and then, in
the page they are authoring, put the following ESI XML in place of the
header and navigation HTML:

\<esi:include src="nav.html"/>

ESI is intended for use with HTML content that is delivered via a
Content Delivery Network and major CDNs were the sponsors of the
original proposal.

When a user retrieves a CDN managed page that contains ESI components
the CDN reconstructs the complete page from the component parts (which
the CDN will either have to retrieve, or, more likely, have in cache
since they change infrequently).

The CDN delivers the complete, normal HTML to the end user, but because
the CDN has access nodes all over the world the latency between the end
user web browser and the CDN is minimized. ESI tries to minimize the
amount of data sent between the origin web server and the CDN (where the
latency may be high) while being transparent to the browser.

The biggest problem with adoption of ESI is that it forces web page
authors to break pages up into blocks that can be safely cached by a
CDN, adding to the complexity of web page authoring. In addition, a CDN
has to be used to deliver the pages, as web browsers do not understand
the ESI directives.

The time dimension

The SDCH and ESI approaches rely on identifying parts of pages that are
known to be unchanging and can be cached either at the edge of a CDN or
in a shared dictionary in a web browser.

Another approach is to consider how web pages evolve over time. It is
common for web users to visit the same web pages frequently (such as
news sites, online email, social media and major retailers). This may
mean that a user's web browser has some previous version of the web page
they are loading in its local cache. Even though that web page may be
out of date it could still be used as a shared dictionary as components
of it are likely to appear in the latest version of the page.

For example, a daily visit to a news web site could be sped up if a
web server were able to send only the differences between yesterday's
news and today's. It's likely that most of the HTML of a page like the
BBC News homepage will have remained unchanged; only the stories will be
new and they will only make up a small portion of the page.

CloudFlare looked at how much dynamically generated pages change over
time and found that, for example, reddit.com changes by about 2.15% over
five minutes and 3.16% over an hour. The New York Times home page
changes by about 0.6% over five minutes and 3% over an hour. BBC News
changes by about 0.4% over five minutes and 2% over an hour. With delta
compression it would be possible to turn those figures directly into a
compression ratio by only sending the tiny percentage of the page that
has changed. Compressing the BBC News web site to 0.4% is an enormous
improvement compared to gzip's 20-25% compression ratio meaning that
116KB would result in just 464 bytes transmitted (which would likely all
fit in a single TCP packet requiring a single round trip).

This delta method is the essence of RFC 3229, which was written in 2002.

RFC 3229

This RFC proposes an extension to HTTP where a web browser can indicate
to a server that it has a particular version of a page (using the value
from the ETag HTTP header that was supplied when the page was previously
downloaded). The receiving web server can then apply a delta compression
technique (encoded using VCDIFF discussed above) to send only the parts
that have changed since that particular version of the page.

The RFC also proposes that a web browser be able to send the identifiers
of multiple versions of a single page so that the web server can choose
among them. That way, if the web browser has multiple versions in cache
there's an increased chance that the server will have one of the
versions available to it for delta compression.
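
Concretely, RFC 3229 defines an A-IM request header, a 226 (IM Used) response status and IM/Delta-Base response headers. A sketch of the exchange, with invented ETag values, looks like this:

```
GET /news HTTP/1.1
Host: example.com
A-IM: vcdiff
If-None-Match: "v1-etag", "v2-etag"

HTTP/1.1 226 IM Used
ETag: "v3-etag"
IM: vcdiff
Delta-Base: "v2-etag"

...VCDIFF-encoded delta from the "v2-etag" version...
```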

Although this technique is powerful because it greatly reduces the
amount of data to be sent from a web server to a browser, it has not
been widely deployed because of the enormous resources needed on web
servers.

To be effective a web server would need to keep copies of versions of
the pages it generates in order that when a request comes in it is able
to perform delta compression. For a popular web site that would create a
large storage burden; for a site with heavy personalization it would
mean keeping a copy of the pages served to every single user. For
example, Facebook has around 1 billion active users; just keeping a copy
of the HTML of the last time each of them viewed their timeline would
require around 250TB of storage (at roughly 250KB of HTML per page).

CloudFlare's Railgun

CloudFlare's Railgun is a transparent delta compression technology that
takes advantage of CloudFlare's CDN network to greatly accelerate the
transmission of dynamically generated web pages from origin web servers
to the CDN node nearest the end user. Unlike SDCH and ESI it does not
require any work on the part of a web site creator, and unlike RFC 3229
it does not require caching a version of each page for each end user.

Railgun consists of two components: the sender and the listener. The
sender is installed at every CloudFlare data center around the world.
The listener is a software component that customers install on their
network.

The sender and listener establish a permanent TCP connection that's
secured by TLS. This TCP connection is used for the Railgun protocol.
It's an all-binary multiplexing protocol that allows multiple HTTP
requests to be run simultaneously and asynchronously across the link. To
a web client the Railgun system looks like a proxy server, but instead
of being a server it's a wide-area link with special properties. One of
those properties is that it performs compression on non-cacheable
content by synchronizing page versions.

Each end of the Railgun link keeps track of the last version of a web
page that's been requested. When a new request comes in for a page that
Railgun has already seen, only the changes are sent across the link. The
listener component makes an HTTP request to the real, origin web server
for the uncacheable page, makes a comparison with the stored version and
sends across the differences.

The sender then reconstructs the page from its cache and the difference
sent by the other side. Because multiple users pass through the same
Railgun link only a single cached version of the page is needed for
delta compression as opposed to one per end user with techniques like
RFC 3229.
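
The bookkeeping at each end can be sketched in a few lines. This is a simplification that assumes plain-text pages and uses Python's difflib to compute the delta; Railgun's actual delta algorithm is not public, but the shape of the exchange is the same: copy ranges referring to the version both ends remember, plus the genuinely new bytes:

```python
import difflib

def make_delta(old: str, new: str):
    # Listener side: describe today's page as 'copy' ranges into the
    # version both ends remember, plus 'add' ops carrying only fresh text.
    delta = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif j2 > j1:                     # replace or insert: ship new text
            delta.append(("add", new[j1:j2]))
    return delta                          # only this crosses the WAN link

def apply_delta(old: str, delta) -> str:
    # Sender side: rebuild today's page from the cached previous version.
    out = []
    for op, *args in delta:
        out.append(old[args[0]:args[1]] if op == "copy" else args[0])
    return "".join(out)

yesterday = "<html><body><nav>menu</nav><p>Old headline</p></body></html>"
today = "<html><body><nav>menu</nav><p>New headline</p></body></html>"
delta = make_delta(yesterday, today)
assert apply_delta(yesterday, delta) == today
```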

For example, a test on a major news site sent 23,529 bytes of gzipped
data which, when decompressed, became 92,516 bytes of page (so the page
was compressed to 25.4% of its original size). Railgun compression
between two versions of the page at a five minute interval resulted in
just 266 bytes of difference data being sent (a compression to 0.29% of
the original page size). The one hour difference was 2,885 bytes (a
compression to 3.1% of the original page size). Clearly, Railgun delta
compression outperforms gzip enormously.

For pages that are frequently accessed the deltas are often so small
that they fit inside a single TCP packet, and because the connection
between the two parts of Railgun is kept active problems with TCP
congestion avoidance are eliminated.

Conclusion

The use of external dictionaries of content is a powerful technique that
can achieve much larger compression ratios than the self-contained gzip
method. But only CloudFlare's Railgun implements delta compression in a
manner that is completely transparent to end users and website owners.