HTTP Compression

HTTP compression, otherwise known as content encoding, is a publicly defined way to compress textual content transferred from web servers to browsers. HTTP compression uses public domain compression algorithms, like gzip and compress, to compress XHTML, JavaScript, CSS, and other text files at the server. This standards-based method of delivering compressed content is built into HTTP 1.1, and most modern browsers that support HTTP 1.1 can perform ZLIB inflation of deflated documents. In other words, they can decompress compressed files automatically, which saves time and bandwidth.

Stephen Pierzchala, Senior Technical Performance Analyst with Gomez, said this about HTTP compression:

"When tied to other methods, such as proper caching configurations and the use of persistent connections, HTTP compression can greatly improve Web performance. In most cases, the total cost of ownership of implementing HTTP compression (which for users of some Web platforms is nothing!) is extremely low, and it will pay for itself in reduced bandwidth usage and improved customer satisfaction."

The Browser / Server Conversation

Browsers and servers have brief conversations over what they'd like to receive and send. Using HTTP headers, they zip messages back and forth over the ether with their content shopping lists. A compression-aware browser tells servers it would prefer to receive encoded content with a message in the HTTP header like this:
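For example (the host name and byte count here are illustrative, not taken from a real capture):

```http
GET /index.html HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip, deflate
```

A server that can honor the request replies with a compressed body and announces the encoding:

```http
HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 10230
```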

From the response headers, the client knows that the body is gzip-encoded and how many compressed bytes to expect (Content-Length). The client downloads the compressed file, decompresses it, and displays the page. At least, that is the way it is supposed to work.
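In code, the round trip looks roughly like this. The sketch below uses Python's standard gzip and zlib modules; the sample markup is invented for illustration:

```python
import gzip
import zlib

# Server side: compress the response body before sending it.
html = b"<html><head><title>Example</title></head><body>Hello</body></html>" * 50
compressed = gzip.compress(html)

# Client side: a compression-aware browser inflates the body with ZLIB.
# wbits=31 (16 + MAX_WBITS) tells zlib to expect the gzip wrapper.
restored = zlib.decompress(compressed, wbits=31)

assert restored == html
print(f"original: {len(html)} bytes, on the wire: {len(compressed)} bytes")
```

The repetitive markup compresses to a small fraction of its original size, which is exactly the case HTTP compression is designed for.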

Browsers Can Lie

Unfortunately, some early versions of Netscape 4 claim to support ZLIB inflation when they really cannot. Rather than rely on the content negotiation built into Apache and IIS, many webmasters install software specifically designed to make this conversation an amicable one. Products like mod_gzip, Vigos' Website Accelerator, PipeBoost, httpZip, and others offer configurable compression that can avoid browser quirks.
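With Apache 2.x's mod_deflate, for example, the classic workaround for these Netscape 4 quirks looks like this (a configuration sketch, assuming mod_deflate is loaded):

```apache
# Compress the common text types only.
AddOutputFilterByType DEFLATE text/html text/plain text/css application/x-javascript

# Netscape 4.x chokes on compressed types other than text/html.
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have problems with compression of any kind.
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE spoofs Netscape in its User-Agent string but handles compression fine.
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
```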

Average Compression Ratios

So what can you expect to save using HTTP compression? In tests that we ran on twenty popular sites, we found that content encoding reduced text files (HTML, CSS, and JavaScript) by 75% on average, and total page size by 37%.1 A similar study of 9,281 HTML pages from popular sites by Destounis et al. found a mean compression gain of 75.2%.2 On average, web compression reduced the text files tested to one-fourth of their original size.3 The more text-based content you have, the higher the savings.
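You can estimate the savings for your own files with a few lines of Python. This is a rough sketch; the sample markup is invented, standing in for a typical templated HTML page:

```python
import gzip

def compression_savings(text: bytes) -> float:
    """Fraction of bytes saved by gzip at its default compression level."""
    return 1 - len(gzip.compress(text)) / len(text)

# A redundant chunk of markup, standing in for a typical HTML page.
page = b"<div class='item'><a href='/products/widget'>Widget</a></div>\n" * 200
print(f"saved {compression_savings(page):.0%}")
```

Real pages vary, but highly templated HTML often lands in the 60% to 85% range reported above.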

Joe Lima, COO and Head of Product Development at Port80 Software, said this about HTTP compression:

"HTTP compression provides such a clear benefit that it appeals to all kinds of users. Our customers include consumer sites that want to improve end-users' experience, hosting providers seeking to differentiate their offering, Fortune 500's looking to make a specific extranet application as bandwidth-efficient as possible, and many others. Simply put, compression is easy to deploy, widely supported, and saves money. Who could say no to that?"

File Size Savings for Sites Using HTTP Compression

Here are three examples from popular sites that use HTTP compression. Google and Orbitz both use gzip compression to deliver compressed versions of their pages to HTTP 1.1-compliant browsers. Table 1 shows the size of their home pages plus one search results page before and after compression.

Table 1: HTTP Compression with Google and Orbitz (file size in bytes)

Note: These figures do not include HTTP header size, just the HTML size.

Typical savings on compressed text files range from 60% to 85%, depending on how redundant the code is. Some JavaScript files can be compressed by over 90%. Webmasters who have deployed HTTP compression on their servers report savings of 30% to 50% off of their bandwidth bills. Compressed content also speeds up your site by shrinking downloads, and the CPU cost of decompressing content is small compared to the cost of downloading it uncompressed. On narrowband connections paired with modern processors, bandwidth, not CPU speed, is the bottleneck every time.

3Compression efficiency depends on the repetition of content within a given file. Smaller files have fewer bytes, and therefore a lower probability of repeated byte sequences. As file size increases, compression ratios improve because a longer file offers more opportunities for repeated patterns. The above tests ranged from a 13,540-byte mean (Destounis 2001) to 44,582 bytes per HTML page (King 2003). Smaller files (5,000 bytes or less) typically compress less efficiently, while larger files typically compress more efficiently. The more redundancy you can build into your textual data (HTML, CSS, and JavaScript), the higher your potential compression ratio. That is why using consistent, all-lowercase markup improves compression in XHTML.
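The redundancy effect is easy to demonstrate. The sketch below compares two invented samples of identical length, one low-redundancy and one high-redundancy:

```python
import gzip
import random

random.seed(0)  # reproducible "low-redundancy" sample

# Low redundancy: random lowercase letters.
low = bytes(random.randrange(97, 123) for _ in range(5000))
# High redundancy: repeated markup, truncated to the same length.
high = (b"<td class='cell'>value</td>" * 200)[:5000]

savings = {
    name: 1 - len(gzip.compress(data)) / len(data)
    for name, data in (("random", low), ("redundant", high))
}
print(savings)
```

The repeated markup compresses far better than the random text of the same size, which is why consistent, templated code is a compression win.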

Further Reading

Chapter 18 of Speed Up Your Site shows how to set up HTTP compression on Apache and IIS servers, evaluates the available compression software, and lists software and hardware compression tools for web compression.