Facebook open-sources Thrift, again, with fbthrift overhaul

Fork of original project, now at Apache, adds guts for bigger cloud services.

In a blog post published today, Facebook announced that it was releasing another version of the Thrift toolset, a collection of software libraries and code generation tools that automatically produce the client and server code for distributed applications.

“We didn’t make this a breaking change,” said Facebook infrastructure engineer and former Tumblr vice president of engineering Blake Matheny in an interview with Ars. “It can still interact with legacy Thrift applications. But the C++ in Apache Thrift is fairly different from this one—we basically did a bunch of work to improve the scalability with C++, which is important to us because a lot of our popular services are built with Thrift.”

It's not clear how, or even if, the changes will be incorporated into the existing Apache Software Foundation Thrift project, which was created from the code Facebook originally open-sourced under the Apache license in 2007. The new version, fbthrift, adds a number of features aimed at handling larger, more complex collections of services, along with a new C++ code generator and components for building services that use less memory and place less demand on hardware under heavy load.

Thrift takes a lot of the work out of creating remote procedure call (RPC) interfaces for distributed applications: the network communications between components that drive much of Facebook’s platform and other major Web applications, as well as the backends of many mobile applications. It can generate code in a number of languages besides C++, including Java, Ruby, Perl, Python, PHP and C#, so developers can focus on the actual processing and presentation code. Thrift uses a simple interface definition file format that describes services and the data structures they send and receive, and its compiler uses that description to generate the corresponding code.
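To make the "description in, code out" idea concrete, here is a toy sketch in Python. It parses one struct written in a simplified, Thrift-like IDL and builds a class that can serialize itself. The hypothetical `User` struct, the parsing rules, and the byte layout are all assumptions for illustration: the real Thrift compiler emits source files rather than building classes at runtime, and its wire protocols use a different, richer encoding.

```python
import re
import struct

# A struct in simplified, Thrift-like IDL (hypothetical example).
IDL = """
struct User {
  1: i32 id,
  2: string name,
}
"""

STRUCT_RE = re.compile(r"struct\s+(\w+)")
FIELD_RE = re.compile(r"(\d+):\s*(\w+)\s+(\w+)")


def parse_struct(idl):
    """Return the struct name and a list of (field_id, type, name) tuples."""
    name = STRUCT_RE.search(idl).group(1)
    fields = [(int(fid), ftype, fname) for fid, ftype, fname in FIELD_RE.findall(idl)]
    return name, fields


def generate_class(name, fields):
    """Build a class from the parsed description, standing in for generated code."""

    def __init__(self, **values):
        # One attribute per declared field; a missing field raises KeyError.
        for _, _, fname in fields:
            setattr(self, fname, values[fname])

    def encode(self):
        # Made-up layout: 2-byte field ID, then the value; strings are length-prefixed.
        out = bytearray()
        for fid, ftype, fname in fields:
            value = getattr(self, fname)
            out += struct.pack(">h", fid)
            if ftype == "i32":
                out += struct.pack(">i", value)
            elif ftype == "string":
                data = value.encode("utf-8")
                out += struct.pack(">i", len(data)) + data
            else:
                raise NotImplementedError(f"unsupported type: {ftype}")
        return bytes(out)

    return type(name, (), {"__init__": __init__, "encode": encode})


User = generate_class(*parse_struct(IDL))
user = User(id=7, name="ann")
print(user.encode().hex())
```

Because both sides would build their code from the same description, a client and a server agree on field IDs and types without anyone writing serialization code by hand.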

Thrift was originally designed for simple RPC services, and it could only handle operations in order. “The original version suffered from head-of-the-line blocking,” Matheny said. “You’d get requests back in the same order they were sent.” Because some requests take longer than others, Thrift-based applications could take a big performance hit as responses to quick requests waited behind slower ones. “So we added support for out-of-order operations, so you get back the first one that’s ready.”
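A minimal Python sketch of the difference Matheny describes, under a simplifying assumption: every request is dispatched at time 0 and processed concurrently, so each response is ready after its own processing time. With in-order delivery a response is held until all earlier ones have gone out; with out-of-order delivery each goes back as soon as it is ready, tagged with a sequence ID so the client can match it to its call. The call names and timings are hypothetical.

```python
def in_order_delivery(durations_ms):
    """Return when each response reaches the client if replies must keep send order."""
    delivered = []
    latest_ready = 0
    for duration in durations_ms:
        # A response can't go out before every earlier response has gone out.
        latest_ready = max(latest_ready, duration)
        delivered.append(latest_ready)
    return delivered


def out_of_order_delivery(durations_ms):
    """Return (ready_time, seq_id) pairs in the order responses are sent back."""
    return sorted((duration, seq_id) for seq_id, duration in enumerate(durations_ms))


def match_responses(pending, responses):
    """Client side: map each response to its call using the sequence ID."""
    return {pending[seq_id]: ready_at for ready_at, seq_id in responses}


# One slow call (seq 0) sent before two fast ones; names and timings are hypothetical.
durations = [50, 5, 5]
pending = {0: "slow_call", 1: "fast_call_1", 2: "fast_call_2"}

print(in_order_delivery(durations))
print(match_responses(pending, out_of_order_delivery(durations)))
```

Under these assumptions the two fast calls come back at 5 ms instead of waiting until 50 ms behind the slow one.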

To make asynchronous request handling work better, Facebook engineers had to improve the memory management of the generated C++ code. Thrift’s original generated C++ code reused the same buffer for every request, which made it impossible to process requests out of order. So the Facebook engineering team brought in IOBuf, a component of Facebook’s open source folly library, which allocates a new buffer for each request, with some optimizations to reduce the performance cost of doing so.
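The buffer problem can be sketched conceptually in Python. The shared-buffer function mimics the original approach: it copies each incoming request into one reusable buffer and hands back a view of it, so if a second request arrives before the first has been processed, the first request's data is overwritten. Giving each request its own buffer avoids that. This sketch does not use or model folly's IOBuf API, and it omits any allocation optimizations a production implementation might add.

```python
def receive_into_shared(shared, payload):
    """Original-style: copy the request into one reused buffer, return a view of it."""
    shared[:len(payload)] = payload
    return memoryview(shared)[:len(payload)]


def receive_into_fresh(payload):
    """Per-request style: every request gets a buffer of its own."""
    return bytearray(payload)


# Two requests arrive before either is handled (possible with out-of-order processing).
shared = bytearray(64)
first = receive_into_shared(shared, b"get:user=1")
second = receive_into_shared(shared, b"get:user=2")
print(bytes(first))  # the first request's bytes were overwritten by the second

first_own = receive_into_fresh(b"get:user=1")
second_own = receive_into_fresh(b"get:user=2")
print(bytes(first_own))  # still intact
```

The trade-off is the one the paragraph above alludes to: fresh buffers are safe for out-of-order work but cost an allocation per request.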

Another key addition is a new header protocol, called THeader, that lets developers add features to services without breaking compatibility with existing Thrift services. “At Tumblr we had our own version of this,” said Matheny. “It can be used for specifying metadata information about requests, to do distributed tracing, and signal how overloaded a resource is. So we’re open sourcing it and putting a version out there that people can use as a reference implementation or that we can work with Apache to integrate into Thrift.”
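The general idea behind such a header can be sketched in Python: key/value metadata, such as a trace ID or a load signal, travels in front of the regular payload, and a reader uses only the keys it recognizes, so new metadata can be added without changing the payload format. The frame layout, the `HDR1` marker, and the key names are invented for illustration and are not THeader's actual wire format; treating an unmarked frame as a legacy payload is also a simplification.

```python
import struct

MARKER = b"HDR1"  # hypothetical frame marker, not THeader's real magic number


def _pack_str(text):
    data = text.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def _unpack_str(buf, pos):
    (length,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    return buf[pos:pos + length].decode("utf-8"), pos + length


def encode_frame(headers, payload):
    """Marker, header count, length-prefixed key/value pairs, then the payload."""
    out = bytearray(MARKER)
    out += struct.pack(">H", len(headers))
    for key, value in headers.items():
        out += _pack_str(key) + _pack_str(value)
    out += payload
    return bytes(out)


def decode_frame(frame):
    """Split a frame into (headers, payload)."""
    if not frame.startswith(MARKER):
        # Simplification: anything without the marker is treated as a legacy payload.
        return {}, frame
    pos = len(MARKER)
    (count,) = struct.unpack_from(">H", frame, pos)
    pos += 2
    headers = {}
    for _ in range(count):
        key, pos = _unpack_str(frame, pos)
        value, pos = _unpack_str(frame, pos)
        headers[key] = value
    return headers, frame[pos:]


frame = encode_frame({"trace_id": "abc123", "load": "0.93"}, b"<serialized call>")
headers, payload = decode_frame(frame)
# A handler reads only the keys it understands and ignores the rest.
print(headers.get("trace_id"), payload)
```

Keeping metadata separate from the serialized call means a service can start sending a new key without every existing handler being updated first.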