Paul Hammant's Blog:
Googlers Subset their Trunk

Jason Leyba spoke at QCon in San Francisco at the end of last year, and Jez Humble snapped a pic of a pertinent slide (I’ve straightened it out a little):

Sounds a bit unmanageable, right? Not to them. There’s method in the madness, and it’s all optimized for maximum developer throughput while incorporating code reviews, code reuse, and the quickest possible CI.

A single monolithic tree for 2000+ apps?

Though Android is held natively in Git, the bulk of Google’s source is (or was in 2009) in a single trunk in Perforce. That’s all the source for all apps, regardless of differences in their technologies and deployment schedules.

As a developer interested in just your own application, you only want to check out the source files necessary to build and test your binaries. Yet Google has, say, a zigabyte of source at HEAD revision for a root-directory checkout. As a developer, you want to subset that checkout. Hell, you need to, as your C: drive (or NFS mount) is not big enough for a full HEAD checkout. Subsetting means fewer files on your C: drive, fewer source files in your IDE, a consequently quicker build, and is just more manageable all round.

gcheckout

Setting up your workstation for ‘SandwichOrdering’ (contrived app) would mean running gcheckout (a shell script) on the command line and passing it a parameter of SandwichOrdering. It’s going to reach into Perforce and refer to a ‘BUILD’ file for SandwichOrdering. At some level, that’s a list of directories that are important for SandwichOrdering to build. Perhaps it is just com/google/sandwichordering/ and deeper. There’s a globbing aspect to it that allows you to get quite fine-grained.

Of course each directory that’s pulled in could have another BUILD file within it, which allows the consequential directed graph of dependencies to expand. This way there’s little repetition. Say SandwichOrdering required FooPersistence (contrived library). The team for the latter would maintain their own BUILD file, and the SandwichOrdering one would essentially just include it in theirs. Maybe FooPersistence has a client library within the source tree: com/google/persistence/foo/client. Maybe it also has a FooPersistenceServer (com/google/persistence/foo/server) that the SandwichOrdering team doesn’t need to compile against. Both have a transitive dependence on the API, com/google/persistence/foo/api, and include it using the BUILD declarations.
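To make the include mechanism concrete, here is a minimal Python sketch of how a tool might walk such declarations to work out the directories a checkout needs. The dictionary layout and the target names FooPersistenceClient and FooPersistenceApi are my own illustration, not Google’s actual BUILD syntax.

```python
# Hypothetical stand-in for BUILD declarations: each target lists the
# directories it owns and the other targets it depends on. This is an
# illustration only, not the real BUILD file format.
BUILD_FILES = {
    "SandwichOrdering": {
        "dirs": ["com/google/sandwichordering"],
        "deps": ["FooPersistenceClient"],
    },
    "FooPersistenceClient": {
        "dirs": ["com/google/persistence/foo/client"],
        "deps": ["FooPersistenceApi"],
    },
    "FooPersistenceServer": {
        "dirs": ["com/google/persistence/foo/server"],
        "deps": ["FooPersistenceApi"],
    },
    "FooPersistenceApi": {
        "dirs": ["com/google/persistence/foo/api"],
        "deps": [],
    },
}


def directories_for(target, build_files):
    """Return the sorted directories needed to build `target`,
    following dependencies transitively."""
    needed = set()
    seen = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also guards against cycles
        seen.add(name)
        entry = build_files[name]
        needed.update(entry["dirs"])
        stack.extend(entry["deps"])
    return sorted(needed)


if __name__ == "__main__":
    for directory in directories_for("SandwichOrdering", BUILD_FILES):
        print(directory)
```

Asking for SandwichOrdering pulls in the client and API directories but leaves the server directory out, which is the whole point of subsetting.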

The gcheckout technology modified a Perforce “client-spec” to define (and potentially redefine) something that was applicable to the developer in question and for that workstation’s checkout. Perforce has a globbing notation in the client-spec that covers includes and excludes. Gcheckout automated the modifications to the client-spec, and without it developers would have risked copy/paste errors. Gcheckout has not been released as open source.
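Since gcheckout itself is not public, the following is only a guess at the shape of that automation: a small Python function that turns a list of directories into Perforce view-mapping lines, with exclusions written using the leading minus sign. The depot root //depot/trunk, the client name swo and the excluded directory are assumptions made up for the example.

```python
def client_view(depot_root, client_name, includes, excludes=()):
    """Build Perforce client-spec view lines for the given directories.

    Exclusions come after inclusions, because in a Perforce view a later
    mapping overrides an earlier one.
    """
    lines = []
    for path in includes:
        lines.append(f"{depot_root}/{path}/... //{client_name}/{path}/...")
    for path in excludes:
        lines.append(f"-{depot_root}/{path}/... //{client_name}/{path}/...")
    return lines


if __name__ == "__main__":
    # Hypothetical depot root, client name and excluded directory.
    view = client_view(
        "//depot/trunk",
        "swo",
        includes=[
            "com/google/sandwichordering",
            "com/google/persistence/foo/client",
            "com/google/persistence/foo/api",
        ],
        excludes=["com/google/sandwichordering/generated"],
    )
    print("View:")
    for line in view:
        print("\t" + line)
```

Generating the lines from one list of directories is what removes the copy/paste risk: nobody edits the mappings by hand.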

Messing up other teams

If a second project team called SandwichMakers, with their own deployment schedule, shared 5% of the code of SandwichOrdering (say the Sandwich POJOs), then they’d get only those sources globbed into their checkout. If one or other of the teams changed those POJOs then it could impact the other team, and by default nothing would guard against that, since neither team runs the build for both. Of course Google have a solution for that: the checkin technologies compute what the full impact of the checkin would be, and report build failures (including unit test failures) for things that the developer had not themselves directly tested. That’s prior to the actual commit, in case that wasn’t clear, so no build is actually broken in the attempt. Say an attempted checkin showed that there would be a negative impact; the developer would probably run gcheckout again to augment their existing checkout with the impacted project’s source too. They’d then be able to expand their change set (changing more code) and submit it again with no negative impacts the second time.
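The impact computation amounts to walking the dependency graph backwards from whatever changed. A minimal Python sketch of that idea follows; the target name SandwichPojos and the dictionary layout are invented for illustration, and Google’s real checkin tooling is surely more involved.

```python
# Hypothetical dependency declarations: target -> targets it depends on.
DEPS = {
    "SandwichOrdering": ["SandwichPojos"],
    "SandwichMakers": ["SandwichPojos"],
    "SandwichPojos": [],
}


def impacted_targets(changed, deps):
    """Return every target that must be rebuilt and retested when the
    targets in `changed` are modified (the changed targets themselves
    plus everything that depends on them, directly or transitively)."""
    # Invert the graph: target -> targets that depend on it.
    reverse = {name: set() for name in deps}
    for name, dependencies in deps.items():
        for dependency in dependencies:
            reverse.setdefault(dependency, set()).add(name)

    impacted = set()
    stack = list(changed)
    while stack:
        name = stack.pop()
        if name in impacted:
            continue
        impacted.add(name)
        stack.extend(reverse.get(name, ()))
    return sorted(impacted)


if __name__ == "__main__":
    print(impacted_targets(["SandwichPojos"], DEPS))
```

A change to the shared POJOs flags both applications for a pre-commit build, while a change confined to SandwichOrdering flags only itself.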

Ask for Forgiveness (not for permission)

The culture at Google is that anyone can try to make a change to any source file. That could be a project that you’re tracking in your 20% time, or a bugbear you have with something that you use, and it most definitely is not just the source code you’re assigned to work on as part of a team. As the ‘owner’ of a particular directory, your obligation is to review incoming commits fairly, even those from outside of your team, and expediently consume them if they are good. Google’s culture requires an honest appraisal at that moment. You couldn’t reject things based on a “we’re too busy”, or “I want to change that later”. If the contributor from afar had passed all the unit tests in the commit, and made a nice improvement or fixed a bug, there’s a duty to consume it.

Outside Google – Buck

A bunch of Xooglers (ex-Googlers), with plenty of new colleagues at Facebook, have made a technology called Buck. It covers all the recursive features of the Google build system described above. It doesn’t cover the composite checkout / subsetting of the repo it’s facing. Nor does Google’s build system really; it has to work in tandem with gcheckout. What I’d like to see is a buck-checkout tool that allowed for the subsetting of a larger trunk, for Subversion:

buck-checkout p4://perforce.example.com/trunk/apps/SandwichOrdering.buck -d swo_workingcopy
# If only Perforce had a URL design.

That’d make a directory swo_workingcopy and do a checkout of the bits and pieces of SandwichOrdering only. Subversion has a Sparse Directories feature that’d allow you to do this very effectively. Perforce (as mentioned above) could also do it via a client-spec. Other source-control tools, especially the DVCS ones, not so much in their current versions. As an enterprise developer, I quite like this feature, but the Linux kernel does not need it, so Git for one is unlikely to receive the partial checkout features that Subversion and Perforce have. There’s Gitolite of course, which allows some permissioning capabilities for Git, but I’m not sure how far that can go.
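For the Subversion case, a buck-checkout tool could plausibly boil down to an empty-depth checkout followed by deepening just the wanted directories. The Python sketch below only builds the command lines rather than running them. The repository URL is made up, the tool itself remains wishful thinking, and the flags are Subversion’s sparse-directory options as I understand them (--parents on update needs a reasonably recent client).

```python
def sparse_checkout_commands(repo_url, working_copy, directories):
    """Return the svn command lines for a sparse working copy that
    contains only `directories` (paths relative to `repo_url`)."""
    # Start with an empty working copy of the trunk root.
    commands = [
        ["svn", "checkout", "--depth", "empty", repo_url, working_copy]
    ]
    # Then pull in each wanted directory in full, creating any
    # intermediate directories along the way.
    for directory in directories:
        commands.append([
            "svn", "update", "--parents", "--set-depth", "infinity",
            f"{working_copy}/{directory}",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical repository URL; the directories are the contrived
    # ones from earlier in this post.
    for command in sparse_checkout_commands(
        "https://svn.example.com/trunk",
        "swo_workingcopy",
        [
            "com/google/sandwichordering",
            "com/google/persistence/foo/client",
            "com/google/persistence/foo/api",
        ],
    ):
        print(" ".join(command))
```

Returning the commands as argument lists keeps the sketch harmless to run and would let a real tool hand them to subprocess.run unchanged.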

Buck in use in Open-Source land?

The Selenium team is flipping from CrazyFun to Buck presently. We flipped from Maven to CrazyFun about five years ago. Maven is very XML centric, and has been written about a lot. CrazyFun was only really for the Selenium project and was also inspired by the Google build system. Simon Stewart (ex ThoughtWorks, ex Google) was the chief protagonist of CrazyFun, and happily contributes to Buck while at Facebook.

Modern Perforce

Perforce now has a Git Fusion tool that allows developers to use Git and all of its idioms against a Perforce back-end, including the ability to do partial checkouts as described. The Git front-end for Perforce that Google built for themselves isn’t the same thing; the two were independently developed.