Ever wondered how the Wikimedia servers are configured?

Well, wonder no longer! To configure the Wikimedia servers, we use Puppet, a configuration management system, which lets us write code that manages all of our servers like a single large application. Of course, to really know how our servers are configured, you’d need to see our Puppet configuration.

Good news: we’ve just released our Puppet configuration in a public Git repository.

What is and isn’t included

Almost everything is included in the repository. We did spend a few weeks removing private and sensitive material, though. That material now lives in a private repository that is available only to Wikimedia staff and volunteers with root access.

This, of course, means that the Puppet configuration, as released, won’t completely work. The public repository refers to files and manifests that exist only in the private repository. To make the configuration work, you’ll need to fill in the missing information yourself. There isn’t very much in the private repository, though, so that task should be fairly easy.

The point of making this repository public

We have a couple of reasons for making this repository public:

It shares knowledge with the world

It lets us treat operations like a software development project

Both reasons align with our mission, but the first was already largely covered, since we were sharing most of this knowledge via wikitech. The second reason aligns more closely with our mission, as it lets the world be directly involved in our operations efforts.

Labs and community oriented operations

The release of this Puppet repository is the first step in the Wikimedia Test/Dev Labs project. We’ll be going further than just making the repository readable by the world. Part of the Test/Dev Labs project is to create a clone of our production cluster. This clone will run a branch of the Puppet repository.

Developers and operations engineers, both staff and community, will be able to push changes to the test branch of the Puppet repository, which will manage the cloned cluster. They’ll then be able to submit these changes for review against the production branch of the Puppet repository. The staff operations engineers can then code-review the changes and push them out to the production systems.

Just as they can edit the Wikimedia content, the site interface, and the site’s software (MediaWiki), community members will be able to edit the site’s architecture as well.

Accessing the repository

Since this is a public Git repository, you can do an anonymous git clone like so:

Hmm… I was able to clone when I switched to an almost-identical machine, but with newer software. In particular, the machine with git 1.5.6.5 couldn’t clone, but the machine with git 1.7.2.5 could. That might be it, Yuri.

Depending on what version of Puppet you’re using, you could also use extlookup: store the values in host-, domain-, or site-specific CSV files, call extlookup("super-secret-password"), and then simply keep the extlookup data in a separate repo. That way you could still publish a sensible “common.csv” with such amusing fields as:

Great to see you using Puppet, and publishing your recipes! I work on the Puppet packages for Debian, and I hope you find them useful.

I noticed that you have not embraced the best-practices model of using modules and instead have everything in manifests. Modules are a really interesting way of abstracting out your Puppet configuration, and they open the door to great collaboration with other groups who are also working on module development. You can share development on abstract pieces of your infrastructure without having to give any access. It’s a very interesting way to be involved in collaborative development of your infrastructure, in ways that were never possible before.

We did the same a year ago for the Mageia project (for the same kinds of reasons, plus a couple of others like transparency and trust, and to help us recruit sysadmins), and we faced the same issue with passwords.

And so we used extlookup with a default value of x, so anybody could test (in an insecure way). I heard that hiera is also nice for that: you could publish a default file and override it without fiddling with Git.
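
The lookup-with-a-default idea from these comments can be sketched outside Puppet. The Python below only illustrates the concept and is not how extlookup or hiera are actually implemented: the function name `lookup`, the file names, and the key `db_password` are all hypothetical. It searches a list of CSV files in precedence order and falls back to a published default when no private file supplies the key.

```python
import csv
import os
import tempfile


def lookup(key, data_files, default=None):
    """Return the value for key from the first CSV file that defines it.

    data_files is searched in order, so more specific files (for example
    a private, per-site file) should come before the public common file.
    Missing files are skipped, which lets a public checkout work without
    the private data.
    """
    for path in data_files:
        if not os.path.exists(path):
            continue
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                # Each row is "key,value"; ignore blank or malformed rows.
                if len(row) >= 2 and row[0] == key:
                    return row[1]
    return default


def demo():
    """Build a throwaway data directory and resolve keys against it."""
    with tempfile.TemporaryDirectory() as datadir:
        common = os.path.join(datadir, "common.csv")    # published file
        private = os.path.join(datadir, "private.csv")  # kept out of the public repo
        with open(common, "w", newline="") as handle:
            csv.writer(handle).writerows(
                [["db_password", "x"], ["ntp_server", "ntp.example.org"]]
            )
        # Without the private file, the published placeholder is used.
        public_value = lookup("db_password", [private, common], default="x")
        with open(private, "w", newline="") as handle:
            csv.writer(handle).writerow(["db_password", "real-secret"])
        # With the private file present, it takes precedence.
        private_value = lookup("db_password", [private, common], default="x")
        # A key defined nowhere falls back to the default.
        missing_value = lookup("no_such_key", [private, common], default="x")
    return public_value, private_value, missing_value


if __name__ == "__main__":
    print(demo())
```

Putting the private file first in the search order is the design choice that matters here: anyone with only the public data gets a working (if insecure) placeholder, while a site with real secrets overrides it without editing the published file.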

Random idea: why don’t we publish boilerplate versions of the private files, with placeholders for passwords and the like, so that you can replace those placeholders with your own private data and have things Just Work?
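
One way to picture the boilerplate idea from this comment is a template with named placeholders that each site fills in locally. The sketch below uses Python’s standard `string.Template`. The file name `passwords.pp`, the placeholder names `db_user` and `db_password`, and the helper `render_private_file` are hypothetical and do not come from the Wikimedia repository.

```python
from string import Template

# Hypothetical boilerplate for a private file, with placeholders where the
# secrets would go. "$$" is Template's escape for a literal "$", so the
# rendered output contains ordinary Puppet-style variable names.
BOILERPLATE = Template(
    "# passwords.pp (boilerplate)\n"
    "$$db_user = '${db_user}'\n"
    "$$db_password = '${db_password}'\n"
)


def render_private_file(values):
    """Fill in every placeholder; raise KeyError if one is left unset."""
    return BOILERPLATE.substitute(values)


if __name__ == "__main__":
    print(render_private_file({"db_user": "wiki", "db_password": "s3cret"}))
```

Using `substitute` rather than `safe_substitute` means a forgotten placeholder fails loudly instead of silently leaving `${db_password}` in the rendered file. The sketch does no quoting, so a value containing a single quote would need escaping before use.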

