Facebook releases internal C++ libraries as open source

The libraries, called Folly, are distributed under the Apache license.

Facebook is liberating a large collection of libraries that it uses internally for C++ development. The code is available from a public GitHub repository where it is distributed as open source under the permissive Apache Software License.

The assortment of frameworks is collectively called Folly, the Facebook Open Source Library. Its individual components support a diverse spectrum of capabilities, ranging from general-purpose programming functionality to more specialized pieces that are designed to help developers wring extra performance out of complex applications.

Among many other things, the Folly libraries simplify concurrency, string formatting, JSON manipulation, benchmarking, and iterating over collections. They also offer optimized drop-in replacements for several C++ standard library classes, including std::string.

As I learned when I visited Facebook’s headquarters earlier this year, open source software is an important part of Facebook’s infrastructure and development culture. The company contributes to a number of major projects such as Hadoop and memcached. It has also released some key pieces of its internal software stack, such as the Cassandra database server and Thrift RPC framework.

When Facebook wants to open a piece of software that it has developed internally, the company must first isolate the component so that it can be used by third parties without depending on other proprietary Facebook code. The challenge of disentangling pieces of infrastructure for standalone consumption is an obstacle that hinders Facebook’s efforts to open more of its stack.

Releasing the company’s internal C++ libraries will make it easier to share additional software that depends on this code. Although the desire to get some critical Facebook dependencies out in the open is the primary motivation, the Folly code itself is also likely to be useful to a number of C++ developers.

When the talks were announced, the prospect of a lock-free hashmap was tantalizing. I was hoping for a C++ version of Cliff Click's lockless hashmap. Sadly, this is not it. And the rest I don't really care about anyway.

Lock-free code has interesting and important repercussions for correctness (it can often be worse from a performance standpoint, but that is a lesser consideration much of the time), and it can make some otherwise beastly problems trivial to solve.
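To make "lock-free" concrete, here is a minimal sketch of the usual compare-and-swap retry loop, written against the C++ standard library only. The names store_max and max_across_threads are hypothetical and are not taken from Folly; the point is only that no thread ever holds a lock, so a stalled thread cannot block the others.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Folly): raise `target` to `candidate`
// if `candidate` is larger, without taking any lock.
void store_max(std::atomic<int>& target, int candidate) {
  int seen = target.load();
  // compare_exchange_weak refreshes `seen` with the current value when it
  // fails, so each retry re-checks against up-to-date data.
  while (seen < candidate && !target.compare_exchange_weak(seen, candidate)) {
  }
}

// Several threads publish values concurrently; the largest one survives.
int max_across_threads(int threads, int per_thread) {
  std::atomic<int> best{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&best, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        store_max(best, t * per_thread + i);
      }
    });
  }
  for (auto& w : workers) w.join();
  return best.load();
}
```

Calling max_across_threads(4, 1000) offers every value from 0 to 3999, so the result should be 3999 regardless of how the threads interleave.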

A big part of the reason this is cool is that it opens the way for Facebook to release other code. I'm sure there's a bunch that has, over time, been filed under "we can't publish this, it's dependent on half a dozen random internal bits and pieces and we don't have time to figure out whether all of those are okay to put out there."

Someone just took the time to sort out a bunch of the "random bits and pieces" separately. Now, other groups working internally can think like this: "if we can strip out all the non-folly dependencies, then we can push this code to the world." Also, from now on, having an open source catch-basin for internal code that doesn't really involve any secret sauce, but isn't interesting enough on its own for anyone to care about, makes it possible to build new stuff in the open from the start without having to do it all that differently than if it were being built as a purely internal code base.

I'd like to see more companies adopting this model for their inevitable collections of mostly-not-particularly-interesting internal "this is how we do things here" code. The fact that some of this code may actually be interesting in its own right to certain audiences is really just a bonus.

Facebook is still in its infancy of software development. It is good that Facebook releases this and things like HipHop VM, but these are already solved problems. Maybe Facebook will create something like MapReduce or the JVM in the future.

The only class I looked at was folly::MicroSpinLock. It left me feeling nervous. It is a pure spin lock; those are rarely a good idea in user code and are rapidly becoming a worse one. In fact, just yesterday I wrote a blog post listing a significant set of reasons why pure spin locks are a terrible idea.
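For readers unfamiliar with the term, a "pure" spin lock is one whose lock() loops on an atomic flag and never yields or sleeps. The sketch below is a generic illustration built on std::atomic_flag; PureSpinLock and count_with_spin_lock are hypothetical names, and this is not Folly's implementation.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical illustration of a pure spin lock (not Folly's code).
class PureSpinLock {
 public:
  void lock() {
    // Busy-wait until the flag was previously clear. A waiting thread
    // burns CPU the whole time and never gives up its time slice.
    while (flag_.test_and_set(std::memory_order_acquire)) {
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a shared counter from several threads under the lock.
long count_with_spin_lock(int threads, int iterations) {
  PureSpinLock lock;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&lock, &counter, iterations] {
      for (int i = 0; i < iterations; ++i) {
        lock.lock();
        ++counter;
        lock.unlock();
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

The lock is correct, which is why the count comes out right; the concern raised above is what happens when the holder is descheduled and every waiter keeps spinning.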

Facebook is still in its infancy of software development. It is good that Facebook releases this and things like HipHop VM, but these are already solved problems. Maybe Facebook will create something like MapReduce or the JVM in the future.

You know, working code that solves a problem is always nice. Facebook released a nice, compact, probably-useful library. It may save me from writing some (error-prone) code. I genuinely thank them for that.

I'd also like to point out that, while your comment holds up Google and Sun as examples of true innovation, neither MapReduce nor the JVM was a particularly new idea.

MapReduce has been around since the 70s (maybe the 60s) as two fundamental operations in Lisp. It's popped up time and again in the academic literature associated with parallel computing (a large body of literature which has existed since at least the 80s). Google reminded us that it was useful, and showed us how it solved their particular problems. They deserve enormous credit for that, but they didn't exactly "create" MapReduce.
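As a reminder of what the two operations look like on their own, here is a small sketch using the C++ standard library rather than Lisp: std::transform plays the role of map and std::accumulate the role of reduce. The function name sum_of_squares is made up for the example, and nothing here is distributed or parallel.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical example: square every element (map), then add them up (reduce).
int sum_of_squares(const std::vector<int>& xs) {
  std::vector<int> squared(xs.size());
  // Map: apply a function to each element independently.
  std::transform(xs.begin(), xs.end(), squared.begin(),
                 [](int x) { return x * x; });
  // Reduce: fold the mapped values into a single result.
  return std::accumulate(squared.begin(), squared.end(), 0);
}
```

Because each element is mapped independently and addition is associative, both steps can be split across machines, which is the property the parallel-computing literature exploits.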

Similarly, the JVM wasn't really a new concept. Virtual machines have existed for decades. In 1978, UCSD released their p-code machine, which may be the first, but I'm not sure. Even the JVM's just-in-time compilation stuff was borrowed from innovations made on compilers for the SELF programming language at Xerox PARC and Stanford (see http://en.wikipedia.org/wiki/Self_%28pr ... anguage%29). Sun made the concepts of the JVM and JIT widespread and popular, and thus more useful. They deserve enormous credit for that, but not for the core concepts.

This industry periodically forgets things, rediscovers them, and treats the rediscovery like it's some kind of major new innovation. As a physical scientist (a computational chemist), I find this perplexing and funny, and I wish CS would do more to "stand on the shoulders of giants" (apologies to Newton) and less to spin a rediscovery as true invention.

If you want real innovation, go way back to the early days. Look at IBM's System/360 and IBM's work on the first Fortran compiler. Look at the development of Lisp in places like MIT's AI Lab. Look at the development of Smalltalk, the GUI, and other things at Xerox PARC. Look at all of the goodness that emerged from Bell Labs.

Those places were truly innovative. That era is over. The pseudo-monopolies that funded IBM Watson, Xerox PARC, and Bell Labs are basically gone, as is the long-term thinking that drove their funding. Much of that work has moved into the academic arena, for better or worse. What Google is doing is really good, but I'm not sure it's in the same league as those giants of yore.

But, my point is that you actually do stand on the shoulders of giants. You should respect them, know them, and build from them. Facebook released a nice library that builds on the STL and Boost libraries. It's pretty low-level, but it solves a few problems and provides some nice performance tweaks. It does something useful. Respect it for that, and respect that Facebook released their library into the world.

A bit of spinning, maybe. But spinning until the lock is acquired? That requires far more disclaimers than they listed.
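One common middle ground is to spin for a bounded number of attempts and then fall back to a blocking wait. The sketch below shows that idea with std::mutex; SpinThenBlockLock, kSpinLimit, and the limit of 100 are all hypothetical choices for illustration, not anything Folly ships.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration: spin briefly, then let the OS park the thread.
class SpinThenBlockLock {
 public:
  void lock() {
    // Optimistic phase: a few cheap attempts in case the holder is
    // about to release.
    for (int attempt = 0; attempt < kSpinLimit; ++attempt) {
      if (mutex_.try_lock()) return;
    }
    // Fallback: block instead of burning CPU indefinitely.
    mutex_.lock();
  }
  void unlock() { mutex_.unlock(); }

 private:
  static constexpr int kSpinLimit = 100;  // arbitrary value for the sketch
  std::mutex mutex_;
};

// Increments a shared counter from several threads under the lock.
long count_with_spin_then_block(int threads, int iterations) {
  SpinThenBlockLock lock;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&lock, &counter, iterations] {
      for (int i = 0; i < iterations; ++i) {
        std::lock_guard<SpinThenBlockLock> guard(lock);
        ++counter;
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

The trade-off is size: this lock carries a full std::mutex, whereas a pure spin lock can fit in a single byte.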

I can understand your views there about the disclaimers not being strident enough, but I think you underestimate the usefulness of userspace spinlocks in writing high-throughput lock-free concurrent code. For example, the AtomicHashMap in folly uses a spinlock per hash map entry, so in real-world use there can be billions of them. Furthermore, they are only ever contended on hash collisions, so if that happens often, things are in bad shape in general and the contention is not the biggest issue.
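To illustrate why a tiny per-entry lock is attractive, here is a rough sketch of a fixed-size table in which every slot carries its own spin lock, so two threads only wait on each other when they touch the same slot. This is an assumption-laden toy, not folly::AtomicHashMap: Slot, SlotLockedTable, contended_total, and the linear-probing scheme without deletion or resizing are all invented for the example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical slot layout: a small spin lock lives next to the data.
struct Slot {
  std::atomic<bool> locked{false};
  bool used = false;
  std::uint32_t key = 0;
  long value = 0;
};

template <std::size_t N>
class SlotLockedTable {
 public:
  // Adds delta to the value stored for key, inserting the key if absent.
  // Returns false if every slot is taken by other keys.
  bool add(std::uint32_t key, long delta) {
    for (std::size_t probe = 0; probe < N; ++probe) {
      Slot& s = slots_[(key + probe) % N];
      lock(s);
      if (!s.used) {  // claim the first free slot on the probe path
        s.used = true;
        s.key = key;
      }
      const bool match = (s.key == key);
      if (match) s.value += delta;
      unlock(s);
      if (match) return true;
    }
    return false;
  }

  // Copies the value for key into out; returns false if key is absent.
  bool get(std::uint32_t key, long& out) {
    for (std::size_t probe = 0; probe < N; ++probe) {
      Slot& s = slots_[(key + probe) % N];
      lock(s);
      const bool used = s.used;
      const bool match = used && s.key == key;
      if (match) out = s.value;
      unlock(s);
      if (match) return true;
      if (!used) return false;  // an empty slot ends the probe sequence
    }
    return false;
  }

 private:
  static void lock(Slot& s) {
    while (s.locked.exchange(true, std::memory_order_acquire)) {
    }
  }
  static void unlock(Slot& s) {
    s.locked.store(false, std::memory_order_release);
  }
  std::array<Slot, N> slots_;
};

// Hammers two keys that hash to the same starting slot (3 and 11 in a
// table of 8) from several threads and returns the combined total.
long contended_total(int threads, int iterations) {
  SlotLockedTable<8> table;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&table, iterations] {
      for (int i = 0; i < iterations; ++i) {
        table.add(3, 1);
        table.add(11, 1);
      }
    });
  }
  for (auto& w : workers) w.join();
  long a = 0;
  long b = 0;
  table.get(3, a);
  table.get(11, b);
  return a + b;
}
```

The helper deliberately picks colliding keys to exercise the worst case; with a well-distributed hash and a sparsely filled table, most operations would take an uncontended lock and release it almost immediately.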

You can argue against the usefulness-in-general of the MicroSpinLock idea, but that is not a good reason to keep it under wraps when it is a required component of something like AtomicHashMap which is actually pretty handy and good to have available to the public.