Google Offers Better Web Compression—at a CPU Price

Google’s new Zopfli compression algorithm promises to compress Web pages further than the commonly used gzip algorithm, but at the price of a much higher computational load.

Given that tradeoff, Google is recommending the new algorithm for data that is compressed once and served many times, such as the static elements of heavily trafficked Web pages, although Zopfli (named for a Swiss bread recipe) could also be used for other purposes. The algorithm is being released for free, open-sourced under the Apache 2.0 license. A PDF describing the algorithm’s tradeoffs is available on Google’s Website.

Some might find the tradeoffs disproportionate: Zopfli offers only 3 to 8 percent additional compression, at a computational cost two to three orders of magnitude higher. Google found that, on average, Zopfli required 454 seconds to compress its corpus of data, as opposed to 5.60 seconds for the more popular gzip -9. The time the client browser needs to decompress the data stays roughly the same, however, with differences of just 2 percent between the algorithms, meaning that the change will be transparent to a Web site’s customers.
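To put the two timings in perspective, the slowdown can be worked out directly. The short Python sketch below uses only the two figures quoted above; the variable names are illustrative.

```python
import math

# Compression times quoted in the article, in seconds.
zopfli_seconds = 454.0
gzip9_seconds = 5.60

# How many times longer Zopfli took than gzip -9 on this test.
slowdown = zopfli_seconds / gzip9_seconds

# Express the slowdown as a power of ten (orders of magnitude).
orders_of_magnitude = math.log10(slowdown)

print(f"slowdown: {slowdown:.1f}x, about {orders_of_magnitude:.1f} orders of magnitude")
```

On these numbers Zopfli is roughly 81 times slower, which is just under two orders of magnitude and therefore at the low end of the quoted range.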

“We could achieve faster results with gzip and other algorithms by specifying lower compression density options,” the paper stated. “In this study we are interested on finding the smallest possible compressed size, and because of this we have only run every algorithm with maximum compression options. Zopfli also can run even longer to achieve slightly higher compression density, but we chose to run it with default settings.”
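The effect of those compression options is easy to observe with the zlib module in Python's standard library, which implements the same Deflate format that gzip uses. The following is a minimal sketch, not Google's benchmark: the sample data and the `compare_levels` helper are made up for illustration, and the timings will vary from machine to machine.

```python
import time
import zlib


def compare_levels(data: bytes, levels=(1, 6, 9)):
    """Compress data at each level; return (level, size, seconds) tuples."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(compressed), elapsed))
    return results


# Hypothetical sample input: repetitive HTML-like text.
sample = b"<div class='item'><a href='/page'>link</a></div>\n" * 20000

for level, size, seconds in compare_levels(sample):
    print(f"level {level}: {size} bytes in {seconds:.4f} s")
```

Level 9 is the setting the article refers to as gzip -9. Zopfli is not part of the standard library, so it does not appear in this comparison.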

Zopfli is based on the Deflate compression algorithm, and its output is bit-stream compatible with the compression used in gzip, Zip, PNG, HTTP requests, and other formats, Lode Vandevenne, the Google software engineer who developed Zopfli in his “20 percent” time within Google, wrote in a blog post.
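Bit-stream compatibility means the effort is spent only on the compressing side: any standard Deflate decoder can read the result. The sketch below illustrates the idea with Python's zlib module, compressing the same input at two effort levels and decoding both with one unchanged decoder. It does not run Zopfli itself; the helper names and sample text are illustrative assumptions.

```python
import zlib


def deflate_raw(data: bytes, level: int) -> bytes:
    """Produce a raw Deflate stream (no gzip or zlib header)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(stream: bytes) -> bytes:
    """Decode a raw Deflate stream, however it was produced."""
    return zlib.decompress(stream, wbits=-zlib.MAX_WBITS)


# Hypothetical sample input.
text = b"Static page elements are compressed once and served many times. " * 500

fast = deflate_raw(text, 1)    # least effort
small = deflate_raw(text, 9)   # most effort zlib offers

# Different encoders, same decoder, identical output.
assert inflate_raw(fast) == inflate_raw(small) == text
print(len(text), len(fast), len(small))
```

A Zopfli-compressed stream would, per Vandevenne's description, be handled by the same decoding step, which is why browsers need no changes.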

The “exhaustive method is based on iterating entropy modeling and a shortest path search algorithm to find a low bit cost path through the graph of all possible deflate representations,” Vandevenne added.
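The shortest-path idea in that description can be illustrated with a toy example. In the Python sketch below, every position in the input is a node, emitting one literal byte or copying an earlier run of bytes is an edge, and the goal is the cheapest route from the start of the input to the end. This is not Zopfli's code: the bit costs are fixed, made-up numbers (9 bits per literal, 24 per copy), there is no limit on how far back a copy may reach, and the iterated entropy modeling mentioned in the quote is not attempted.

```python
def cheapest_parse(data: bytes, literal_bits: int = 9, match_bits: int = 24,
                   min_match: int = 3, max_match: int = 258):
    """Find the lowest-cost literal/copy sequence under assumed fixed costs."""
    n = len(data)
    cost = [float("inf")] * (n + 1)  # cost[i]: fewest bits found for data[:i]
    step = [None] * (n + 1)          # step[i]: token that arrives at position i
    cost[0] = 0
    for i in range(n):
        # Edge type 1: emit data[i] as a literal byte.
        if cost[i] + literal_bits < cost[i + 1]:
            cost[i + 1] = cost[i] + literal_bits
            step[i + 1] = ("lit", data[i])
        # Edge type 2: copy bytes that already appeared earlier in the input.
        for start in range(i):
            length = 0
            while (i + length < n and length < max_match
                   and data[start + length] == data[i + length]):
                length += 1
            for run in range(min_match, length + 1):
                if cost[i] + match_bits < cost[i + run]:
                    cost[i + run] = cost[i] + match_bits
                    step[i + run] = ("copy", i - start, run)
    # Walk backwards from the end to recover the chosen tokens.
    tokens, pos = [], n
    while pos > 0:
        token = step[pos]
        tokens.append(token)
        pos -= token[2] if token[0] == "copy" else 1
    tokens.reverse()
    return cost[n], tokens


def expand(tokens) -> bytes:
    """Rebuild the original bytes from a token list, as a decoder would."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, distance, length = token
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)


bits, tokens = cheapest_parse(b"abcabcabcabc")
assert expand(tokens) == b"abcabcabcabc"
print(bits, tokens)
```

The brute-force search makes this suitable only for very short inputs, but it shows why the approach is expensive: instead of taking the first acceptable match, the encoder weighs every possible way of representing the data.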

Google used several corpora to test the algorithms: the home pages of the 10,000 most popular Websites; the Calgary Corpus, a collection of small text and binary data files; the Canterbury Corpus, designed for lossless data compression; and the enwik8 collection of 100 Mbytes of Wikipedia data.

Minimizing data traffic naturally saves cost, in both power and throughput. But will data center operators be willing to trade additional CPU utilization for it? It’s not a simple question, and it will have to be answered on a company-by-company basis. If nothing else, though, Google has added another tool with which data centers can be further optimized.