CloudFlare blows hole in laws of Web physics with Go and Railgun

4Chan, Imgur bandwidth needs halved by new compression written in Google's Go.

Today, a large collection of Web hosting and service companies announced that they will support Railgun, a compression protocol for dynamic Web content. The list includes the content delivery network and Web security provider CloudFlare, cloud providers Amazon Web Services and Rackspace, and thirty of the world’s biggest Web hosting companies.

Railgun is said to make it possible to double the performance of websites served through CloudFlare's global network of data centers. The technology was largely developed in the open-source Go programming language launched by Google, and because of the bandwidth savings it provides, it could significantly change the economics of hosting high-volume websites on Amazon Web Services and other cloud platforms. It has already cut the bandwidth used by 4Chan and Imgur in half. “We've seen a ~50% reduction in backend transfer for our HTML pages (transfer between our servers and CloudFlare's),” said 4Chan's Chris Poole in an e-mail exchange with Ars. “And pages definitely load a fair bit snappier when Railgun is enabled, since the roundtrip time for CloudFlare to fetch the page is dramatically reduced. We serve over half a billion pages per month (and billions of API hits), so that all adds up fairly quickly.”

Rapid cache updates

Like most CDNs, CloudFlare caches static content at its data centers to help overcome the latency imposed by the speed of light. But prepositioning content on a forward server typically hasn't helped much for dynamic webpages and Web traffic such as AJAX requests and mobile app API calls, which include relatively little static content. That has become a problem for Internet services as traffic from mobile devices and dynamic websites has grown.

“The Web is changing,” said CloudFlare CEO Matthew Prince, “with more sophisticated websites that are API-driven like true applications. And more and more content is being viewed through mobile apps—it’s not a traditional webpage that people are developing now. And the last-generation CDN tricks become less effective for API-driven sites and mobile applications.”

Railgun attacks that problem by replacing the standard HTTP connection between CloudFlare's customer websites and its data centers with a binary protocol. The protocol runs between a “Listener” application that sits on the customer's end and “Sender” software at whichever CloudFlare data center is closest to the end user making the page or API request.

When the Sender receives a request, it takes the last version of the page or API response it has kept in memory and computes a hash of it. The hash is sent to the Listener, which finds the version of the page or API response matching that hash, compares it to the current one, and sends back binary instructions describing what has changed. The Sender then constructs the appropriate response and delivers it to the end user.
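The round trip can be sketched roughly as below. The `delta` type and the prefix/suffix diff are toy stand-ins of my own, since Railgun's actual binary delta format isn't spelled out in the article:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// delta is a toy stand-in for Railgun's binary diff: rebuild the new
// page by keeping `prefix` bytes from the front of the cached copy,
// `suffix` bytes from the back, and splicing `middle` in between.
type delta struct {
	prefix, suffix int
	middle         []byte
}

// makeDelta finds the longest common prefix and suffix between the
// cached and current versions and keeps only the changed middle.
func makeDelta(old, cur []byte) delta {
	p := 0
	for p < len(old) && p < len(cur) && old[p] == cur[p] {
		p++
	}
	s := 0
	for s < len(old)-p && s < len(cur)-p && old[len(old)-1-s] == cur[len(cur)-1-s] {
		s++
	}
	return delta{prefix: p, suffix: s, middle: append([]byte(nil), cur[p:len(cur)-s]...)}
}

// apply reconstructs the current version from the cached one.
func (d delta) apply(old []byte) []byte {
	out := append([]byte(nil), old[:d.prefix]...)
	out = append(out, d.middle...)
	return append(out, old[len(old)-d.suffix:]...)
}

func main() {
	cached := []byte(`<html><body><p>posts: 41</p></body></html>`)
	current := []byte(`<html><body><p>posts: 42</p></body></html>`)

	// The Sender identifies its cached version by hash...
	h := sha256.Sum256(cached)
	fmt.Printf("cached version hash: %x...\n", h[:4])

	// ...and the Listener replies with only what changed.
	d := makeDelta(cached, current)
	fmt.Printf("delta: %d byte(s) instead of %d\n", len(d.middle), len(current))
	fmt.Println("reconstructed ok:", bytes.Equal(d.apply(cached), current))
}
```

Here a one-character edit to a 42-byte page travels as a single byte plus the tiny prefix/suffix bookkeeping, which is the whole point of diffing against a shared cached copy.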

The hosting provider partners will offer the Listener capability to their customers with a single click, hosting the protocol for them without the need for any changes to their servers; CloudFlare will provide software images for Amazon and Rackspace customers to install.

Over time, the two components build a dictionary representing chunks of content that “can get crazy compression ratios of 99 percent,” according to John Graham-Cumming, the CloudFlare programmer who led the Railgun effort. “You can make references into the dictionary; if the page hasn’t changed at all, the server side could say, ‘Nothing’s changed,’ rather than send the whole dictionary out.” Small changes in the page could be sent in messages as small as 5 bytes, he added. “That translates directly into bandwidth savings.”

Going with Go

The approach, which is similar in many respects to video compression techniques, was developed over the past year. Graham-Cumming said that the goal was to make sure that customers didn’t have to write any specially developed code for Railgun to work, as is needed to support Akamai’s Edge Side Includes (ESI) approach to doing dynamic content compression. He chose Go for the project because of its approach to handling concurrent sequential processing. “I was interested in it because we’re getting multicore machines everywhere and wanted to be able to exploit that parallelism. If it was written in C++, it would be threaded code.”

Graham-Cumming also found he was comfortable with Go in part because of one of the language’s sources of inspiration—Communicating Sequential Processes (CSP), the formal programming language devised by Sir Tony Hoare, who was Professor of Computing at Oxford when Graham-Cumming studied there.
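The CSP style Go inherits is easy to show in miniature: independent goroutines share nothing and communicate only over channels. This is a generic illustration, not CloudFlare code:

```go
package main

import "fmt"

// worker reads jobs from in, squares them, and sends results on out.
// Goroutines communicate over channels rather than sharing memory,
// the style Go inherits from Hoare's CSP.
func worker(in <-chan int, out chan<- int) {
	for n := range in {
		out <- n * n
	}
	close(out)
}

func main() {
	in := make(chan int)
	out := make(chan int)

	go worker(in, out)

	// Feed jobs from a second goroutine, then close the channel to
	// signal that no more work is coming.
	go func() {
		for i := 1; i <= 3; i++ {
			in <- i
		}
		close(in)
	}()

	for result := range out {
		fmt.Println(result) // prints 1, 4, 9
	}
}
```

The contrast with "threaded code" in C++ is that the scheduling, signaling, and shutdown all fall out of channel semantics rather than explicit locks and condition variables.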

“Go is very light,” he said, “and it has fundamental support for concurrent programming. And it’s surprisingly stable for a young language. The experience has been extremely good—there have been no problems with deadlocks or pointer exceptions.” But the code hit a bit of a performance bottleneck under CloudFlare’s heavy loads, particularly because of its cryptographic modules—all of Railgun’s traffic is encrypted from end to end. “We swapped some things out into C just from a performance perspective," Graham-Cumming said.

“We want Go to be as fast as C for these things,” he explained, and in the long term he believes Go’s cryptographic modules will mature and get better. But in the meantime, “we swapped out Go’s native crypto for OpenSSL,” he said, using assembly language versions of the C libraries.

“We think we must be one of the groups stressing Go the most,” Graham-Cumming said of the CloudFlare team. As a result of that work, they’ve submitted a number of fixes for the language’s core, including a fix for its logging functions and an external library for memcache.

To keep performance at its highest possible level, all of Railgun’s dictionary cache is kept in memory on the servers of the CloudFlare cluster (for a full description of CloudFlare’s architecture, see our story on the company’s ten-data-center rollout earlier this year). “We have hundreds of machines with cache on them for Railgun,” Graham-Cumming said.

Those bound to benefit most from Railgun are high-traffic sites running on Amazon Web Services and other cloud providers that charge based on bandwidth. One of those Amazon customers is Imgur, which uses AWS for much of its dynamic content. According to Prince, based on CloudFlare’s testing with hundreds of mobile applications and websites, Railgun has yielded an average 50 percent reduction in bandwidth usage and a 90 percent reduction in the “time to first byte” for Web and mobile clients. “If you’re on AWS, it’s a no-brainer,” said Graham-Cumming. “It will pay for itself in no time.”

Reader comments

Graham-Cumming also found he was comfortable with Go in part because of one of the language’s sources of inspiration—Communicating Sequential Processes (CSP), the formal programming language devised by Sir Tony Hoare, who was Professor of Computing at Oxford when Graham-Cumming studied there.

This is a fragment and doesn't make any sense.

Really exciting stuff. It's amazing that there is still so much optimization to be done on something you'd think had been figured out by now.

edit--So it's not a fragment, just a comma splice and really ugly sentence construction. My bad.

Why not? Go's a fairly simple language, and shouldn't be too alien if you're already familiar with some C-like languages. The concurrency is dead easy as well. You could probably get a good grasp of the language in an afternoon.

CloudFlare uses caching of static content at its data centers to help overcome the speed of light.

Nice article on the whole, but that sentence came out of nowhere and hung me up for a moment while I figured out what it meant. I understand what it means, but some previous reference to the speed of light being a limiting factor would (IMHO) have made things a lot smoother.

What platforms do you prefer working on/need to support? Go's developers and core community are almost entirely unix centric. As a result despite being theoretically supported on Windows the few of my friends who've tried it in the past have complained about its tool chain being chronically broken on the platform and no one caring.

Graham-Cumming also found he was comfortable with Go in part because of one of the language’s sources of inspiration—Communicating Sequential Processes (CSP), the formal programming language devised by Sir Tony Hoare, who was Professor of Computing at Oxford when Graham-Cumming studied there.

This is a fragment and doesn't make any sense.

Looks fine to me.

It's not fine, but if you replace the m-dash with "was" it becomes fine.

Graham-Cumming also found he was comfortable with Go in part because of one of the language’s sources of inspiration—Communicating Sequential Processes (CSP), the formal programming language devised by Sir Tony Hoare, who was Professor of Computing at Oxford when Graham-Cumming studied there.

This is a fragment and doesn't make any sense.

Looks fine to me.

It's not fine, but if you replace the m-dash with "was" it becomes fine.

It does look fine from a technical standpoint. You skipped over the first "of" in the sentence. Without that "of", then you'd be right.

What platforms do you prefer working on/need to support? Go's developers and core community are almost entirely unix centric. As a result despite being theoretically supported on Windows the few of my friends who've tried it in the past have complained about its tool chain being chronically broken on the platform and no one caring.

Some Windows people I know chronically complain about everything that doesn't have a GUI or doesn't follow win32 conventions.

We use Go on win64 with no problems; it has fewer problems than, say, git on Windows, which IMO works just fine.

We use Go on win64 with no problems; it has fewer problems than, say, git on Windows, which IMO works just fine.

I've not worked with the Windows version of Go much lately, but I did work with it very early on (pre-1.0 days) and it worked rather well for me as well. I'm sure there have been issues unique to the Windows version, and less interest in it, but back when I used it, the developers of Go seemed quite interested in keeping the Windows version a first-class citizen.

As a result despite being theoretically supported on Windows the few of my friends who've tried it in the past have complained about its tool chain being chronically broken on the platform and no one caring.

We've been using Go on Windows (XP SP2 32-bit) since the first Windows support was available and have thousands of lines of code / thousands of hours of coding put into Go-based projects. In the early days (pre- Go 1.0) there were occasional Windows compatibility issues.

Since the release of Go 1.0 (and even a little before that) our experience -- heavy numerical computations, concurrency, and file system features, with some networking -- has been seamless. We have hit absolutely no Windows-specific shortcomings in spite of heavy daily use since the official release of Go 1.0 a year ago.

When I looked at Go a while back, what I found was a neat language with next to no library support... e.g. if you wanted database support, you had a choice of several alpha-quality MySQL bindings in various stages of incompletion. No standardized abstract DB interface, etc.

Has this changed meaningfully yet?

I am not a huge fan of PHP but if it did one thing well, it was bring a thorough "official" set of extensions for the tools real people use every day. (I'm not saying the extensions themselves were great -- far from it -- only that if you did a project in PHP you didn't spend three days tracking down and building the ad-hoc PHP bindings for every library you need to touch with that project.)

So this is basically people using the entity tag with some kind of 'diff' going on. I don't see why this is needed. It could already be implemented with ETags and the byte serving range headers.

Indeed.

One also wonders to what degree this really exceeds the savings you typically get using GZIP (LZH) compression to the client. It seems like the main advantage is maintaining a dictionary outside of the context of a connection, but it doesn't benefit the client side of things at all. I've found that with our highly web-service intensive and interactive web UIs that we can already achieve something close to 90% bandwidth reduction using GZIP (connections are fairly long lived when using web workers for instance, and initial page load is better than what Railgun would achieve, at least from the client's perspective).

So this is basically people using the entity tag with some kind of 'diff' going on. I don't see why this is needed. It could already be implemented with ETags and the byte serving range headers.

Indeed.

One also wonders to what degree this really exceeds the savings you typically get using GZIP (LZH) compression to the client. It seems like the main advantage is maintaining a dictionary outside of the context of a connection, but it doesn't benefit the client side of things at all. I've found that with our highly web-service intensive and interactive web UIs that we can already achieve something close to 90% bandwidth reduction using GZIP (connections are fairly long lived when using web workers for instance, and initial page load is better than what Railgun would achieve, at least from the client's perspective).

The problem with gzip is it does not play nicely with partial content. I also investigated the performance of using gzip on binary content, and it actually slows servers down more than it speeds things up. Only certain content types should be gzipped. It also increased the overall transmission time from server to client: server read, compress, transmit, client decompress. This resulted in noticeable added lag for anything other than trivial assets.

The partial updates will make an improvement where gzip cannot apply but I'd prefer they use existing mechanisms. All the server has to do is diff the contents of EtagAv1 and EtagAv2 and then use range to re-write the changed content with the updated etag.

So this is basically people using the entity tag with some kind of 'diff' going on. I don't see why this is needed. It could already be implemented with ETags and the byte serving range headers.

Indeed.

One also wonders to what degree this really exceeds the savings you typically get using GZIP (LZH) compression to the client. It seems like the main advantage is maintaining a dictionary outside of the context of a connection, but it doesn't benefit the client side of things at all. I've found that with our highly web-service intensive and interactive web UIs that we can already achieve something close to 90% bandwidth reduction using GZIP (connections are fairly long lived when using web workers for instance, and initial page load is better than what Railgun would achieve, at least from the client's perspective).

The problem with gzip is it does not play nicely with partial content. I also investigated the performance of using gzip on binary content, and it actually slows servers down more than it speeds things up. Only certain content types should be gzipped. It also increased the overall transmission time from server to client: server read, compress, transmit, client decompress. This resulted in noticeable added lag for anything other than trivial assets.

The partial updates will make an improvement where gzip cannot apply but I'd prefer they use existing mechanisms. All the server has to do is diff the contents of EtagAv1 and EtagAv2 and then use range to re-write the changed content with the updated etag.

So if y'all are such geniuses with this, why aren't they writing this article about you?

This whole story seems incredibly unlikely. Think of it... If you could achieve 99% compression, then you're doing it wrong in the first place!!! OR think about it this way; if you build up a response "dictionary", you can't do better than Huffman encoding... Are we really SO BAD at compressing dynamic web content at the moment? For almost ten years, I've been including response strings in an indexed Javascript array, which is cached client-side so that I can just send a message number and the client will read the message from the cached Javascript file. How does "GO" and "CloudFlare" improve upon that? Where's the innovation? This is just basic science...

Sean Gallagher / Sean is Ars Technica's IT Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland.