Opscode guts Chef control freak to scale it to 10,000 servers

Opscode is in a race with Puppet Labs to become a next-generation management tool, and its latest Chef product, which does configuration, change, and cloud management, is used by some of the name-brand hyperscale cloud application operators out there. As part of the launch of the Chef 11 tool, Facebook is outing itself as a customer, joining the ranks of Amazon and Google, and tens of thousands of other IT shops of all shapes and sizes, which already use the code.

"Chef was the only automation solution flexible enough to bend to our scale dynamics without requiring us to change our workflow," Phil Dibowitz, production engineer at Facebook, is quoted as saying in the Chef 11 presentation that El Reg was given as part of its briefing on the new tools. "Private Chef's basis on open-source Chef also aligns with our own open philosophy allowing us to contribute back to the greater Chef community."

That's a pretty big endorsement, and in fact, Christopher Brown, CTO at Opscode, tells El Reg that the demands of Facebook and other large-scale Web operators are why the techies at Opscode changed the back-end system and database to make Chef 11 a lot more scalable than the prior release.

"Chef has been rebuilt from the ground up with Chef 11," explains Brown, who was brought in from Amazon expressly to do this reconstruction.

Brown was formerly the architect and lead developer of Amazon Web Services' foundational EC2 compute cloud and did a stint as Microsoft's director of engineering for edge computing networks. As the CTO at Opscode, Brown works alongside Adam Jacob, the creator of Chef and chief customer officer at the company he co-founded more out of frustration with existing physical and virtual infrastructure management tools than out of a desire to start (another) business.

Chef creates what are called recipes to configure machines and cookbooks to manage a server's entire software stack. Unlike any good cook, it follows the recipe religiously every time without changes. It can also share recipes, and with them the knowledge of how to configure a specific stack for a specific set of iron, with other users of Chef, even those outside the company walls.
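For those who want to see the "follows the recipe religiously" idea in code, here is a minimal sketch of it. Real Chef recipes are written in a Ruby-based language, not Python, and the names `Resource` and `converge` below are made up for illustration rather than taken from Chef. The point is only that a recipe declares a desired state, and running it a second time changes nothing.

```python
# Illustrative sketch only: real Chef recipes use a Ruby-based DSL.
# Resource and converge are hypothetical names, not Chef's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str   # e.g. "package" or "service"
    name: str   # e.g. "nginx"
    state: str  # desired state, e.g. "installed" or "running"


def converge(recipe, node):
    """Bring a node (a dict of current states) in line with the recipe.

    Returns the list of resources that actually had to change.
    """
    changed = []
    for res in recipe:
        key = (res.kind, res.name)
        if node.get(key) != res.state:
            node[key] = res.state  # pretend to install or start something
            changed.append(res)
    return changed


# A two-step recipe for a hypothetical web server.
recipe = [
    Resource("package", "nginx", "installed"),
    Resource("service", "nginx", "running"),
]

# The node already has the package, but the service is not running.
node = {("package", "nginx"): "installed"}

first_run = converge(recipe, node)   # only the service needs changing
second_run = converge(recipe, node)  # nothing left to do
```

Because the recipe is just data, the same list can be handed to any other node, or any other shop, and it will arrive at the same end state.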

Perhaps they should have called it Anti-Chef, because most real chefs mess with recipes and many don't share their secrets. But I digress. (Less than I used to, at least.)

Chef comes in three flavors. Opscode Chef is the open source code you can download and use. Private Chef is the enterprise edition with some extra features and tech support services behind it.

Then there's Hosted Chef, a version of Private Chef that Opscode runs on your behalf. Not accidentally, it is used by a large number of customers at the same time, which allows Opscode to test scalability limits and other features in real time before finalizing them in each release. Facebook is using Private Chef, but it is benefiting from some of the work that was done in Hosted Chef to boost scalability.

The first thing to change with Chef 11 was the back-end data store, which has been running on CouchDB. That NoSQL data store is an Apache-licensed open source database that is coded in Erlang and that was created by Damien Katz, who worked on the Lotus Notes/Domino team at IBM.

The company was seeing how the Riak distributed database from Basho and the Cassandra NoSQL data store created by Facebook would fare as its back end, but oddly enough Opscode is moving away from NoSQL and towards real SQL and has chosen the PostgreSQL relational database management system as the new back end.

"We took a look at a number of different data stores, and the modeling in a relational database was a better fit," says Brown. The back-end database has the open source Solr search engine bolted on to make elements of recipes searchable.

The Chef 10 server was written in Ruby with the Unicorn web server and the Merb framework, but this time around Chef 11 is written in Erlang, which Brown said was "highly concurrent and reliable" and hence a good choice.

CouchDB is also written in Erlang. Incidentally, with Couchbase Server, a derivative product from a company called Couchbase, where Katz now works, the NoSQL data store was ported from Erlang to C.

Anyway, the Erlang-based API stack at the heart of Chef 11 has an order of magnitude reduction in memory footprint compared to the Ruby version in Chef 10.

The upshot of all of these changes is that Chef 11 can manage up to 10,000 nodes from a single server, which is a factor of four more than the Chef 10 server could handle.
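To put that scaling claim in perspective, here is a back-of-the-envelope sketch. It assumes "a factor of four more" means the Chef 10 server topped out at roughly 2,500 nodes, which is inferred from the figures above and not a number Opscode gave, and the 30,000-node fleet is hypothetical.

```python
import math

# From the article: Chef 11 manages up to 10,000 nodes per server.
CHEF_11_NODES_PER_SERVER = 10_000
# Assumption: "a factor of four more" implies roughly 2,500 for Chef 10.
CHEF_10_NODES_PER_SERVER = CHEF_11_NODES_PER_SERVER // 4


def servers_needed(fleet_size, nodes_per_server):
    """Minimum number of Chef servers to cover a fleet, rounding up."""
    return math.ceil(fleet_size / nodes_per_server)


fleet = 30_000  # hypothetical fleet size, for illustration only

chef_10_servers = servers_needed(fleet, CHEF_10_NODES_PER_SERVER)  # 12
chef_11_servers = servers_needed(fleet, CHEF_11_NODES_PER_SERVER)  # 3
```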

And the "Omnibus" installer, which was only available in the Private Chef enterprise edition or its Hosted Chef variant, is now much improved and available on the open source version of Chef. This installer can put Chef agents on Windows and Linux servers quickly as well as on AWS, Rackspace Cloud, Google Compute Engine, Microsoft Azure, and other cloudy infrastructure. Opscode is now using the Pedant Test Suite to ensure that Chef works well against seven different Windows Server variants.

The new Opscode system control freak also has better change modeling, which allows you to see the effects of changes on the infrastructure before you cascade them over the physical and virtual servers.

In fact, Opscode is so confident in the open source Chef 11 version of the tool and its services organization's ability to handle a volume of calls that it will now provide tech support services on the open source Chef for the first time along with the commercially supported Private Chef and Hosted Chef variants. Standard business hour support for the open source Chef 11 costs $3 per node per month, and premium 24x7 support costs $3.75 per node per month.

Private Chef and its hosted variant continue to have some goodies not in the open source version, since Opscode has to make some money somehow to appease its investors. This includes a graphical user interface (completely rewritten for the 11 release) to visualize and navigate those 10,000 nodes under management and an activity reporting dashboard to show historical and current data for nodes under management.

The Private and Hosted Chefs also have an on-demand command execution function called Push that, as the name suggests, allows admins to edit and execute code in real time on systems and to do so on thousands of nodes at the same time if need be.

Or you might use Push to schedule compliance reporting or log polling for systems on portions of a cluster on a rolling basis rather than all at once. Private Chef also has role-based access controls and multi-tenancy to allow multiple system admins to manage nodes from the same Chef server.
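The "rolling basis" idea is simple enough to sketch. The following is not how Push is implemented or invoked, and the helper names `rolling_batches` and `run_rolling` are hypothetical. It only shows the general pattern of working through a cluster one slice at a time instead of hitting every node at once.

```python
def rolling_batches(nodes, batch_size):
    """Yield successive slices of the node list, batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def run_rolling(nodes, batch_size, job):
    """Run job(node) on each node, one batch after another.

    Returns a dict mapping each node name to the job's result.
    """
    results = {}
    for batch in rolling_batches(nodes, batch_size):
        # In a real system the nodes in a batch would run in parallel and
        # the loop would wait for the batch to finish before moving on.
        for node in batch:
            results[node] = job(node)
    return results


# Ten hypothetical nodes, polled four at a time.
cluster = [f"web{i:02d}" for i in range(10)]
batches = list(rolling_batches(cluster, 4))
report = run_rolling(cluster, 4, lambda node: f"logs polled on {node}")
```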

With the Chef 11 launch, Opscode is shifting away from perpetual licensing to subscription pricing that is consistent with the support pricing it now has on the open source variant. Both Private Chef and Hosted Chef cost $6 per node under management per month.
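For a sense of what those per-node prices add up to, here is a quick calculation using the list prices quoted in this story. It assumes flat pricing with no volume discounts, which Opscode did not confirm either way, and the 10,000-node fleet is simply the single-server ceiling mentioned above.

```python
# List prices from the article, in US dollars per node per month.
PRICE_PER_NODE_PER_MONTH = {
    "open source, business hours support": 3.00,
    "open source, 24x7 support": 3.75,
    "Private Chef or Hosted Chef": 6.00,
}


def monthly_cost(nodes, price_per_node):
    """Monthly bill, assuming flat per-node pricing (no discounts)."""
    return nodes * price_per_node


def annual_cost(nodes, price_per_node):
    """Twelve months at the flat monthly rate."""
    return 12 * monthly_cost(nodes, price_per_node)


nodes = 10_000  # one fully loaded Chef 11 server, per the article

monthly = {
    plan: monthly_cost(nodes, price)
    for plan, price in PRICE_PER_NODE_PER_MONTH.items()
}
annual = {
    plan: annual_cost(nodes, price)
    for plan, price in PRICE_PER_NODE_PER_MONTH.items()
}
```

On those assumptions, the commercial editions cost twice as much per node as business-hours support on the open source code.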

Puppet Labs has raised $45.5m in four rounds of funding, and Opscode has raised $33m in three rounds of funding from Ignition Partners, Battery Ventures, and Draper Fisher Jurvetson. Both are getting traction because they are designed specifically for modern hyperscale infrastructure.

"We're seeing enterprises adopt hyperscale infrastructure, but there are some changes that have to take place," explains Jay Wampold, director of marketing at Opscode. "They have to change their tooling, of course, but we are also seeing IT shift its focus from being a back-office function to being a front-office function to deliver services to customers and users."

As for rival Puppet Labs getting a big bag of cash from VMware, Opscode is not worried. "I think it is great for the space and it is a great validation for the next generation of management tools," says Wampold.