What Google Taught Me About Scaling Engineering Teams

Every week, a group of Googlers would plaster the walls of bathroom stalls worldwide with one-page sheets that shared the week’s testing tip. One week, the one-pager might discuss dependency injection and provide a simple example of how to use it in various languages; another week, it might share how to set up a tool for measuring test coverage of your team’s codebase. The “Testing on the Toilet” initiative was a quirky and fun way to teach engineers something new and useful as they were doing their business. 1 It also highlighted one of the key strengths of Google’s engineering culture: efficiently disseminating a consistent and opinionated set of best practices to a large engineering organization.

I joined the Search Quality team at Google right out of college and stayed from mid-2006 to mid-2008, when the company grew from about 8,000 employees to almost 20,000. 23 I worked with two very talented engineers on my first project, and in a short six months, we prototyped, tested, and launched a new feature on google.com to show related searches to many millions of users every day. As the token Noogler on the team, what stood out throughout the experience was how quickly the company could ramp up a new engineer like me to be productive within its environment. Like the Borg, 4 the company had mastered the art of assimilating new engineers.

Without certain key elements of Google’s engineering culture, it would have been extremely difficult for my team to release a feature at that scale and impact on such a short timeline. These elements enabled me to pick up Google’s codebase, tools, and infrastructure in short order. They’re also the same elements that allowed the company to reach the scale that it’s at today with over 50,000 employees. Some ex-Googlers might complain about how slow or bureaucratic the company has become, 5 but there’s no denying that it’s been able to achieve high levels of success at a large scale while still remaining the top-ranked company on Fortune’s list of 100 Best Companies to Work For.

Here are six core principles that I took away from Google’s engineering culture and that you might be able to learn from:

Dedicate engineering resources toward shared tools and abstractions. From the very early days, Google has invested heavily in tools and abstractions like Protocol Buffers, MapReduce, BigTable, and more that are used throughout the engineering organization. The attitude of solving a problem really well once and then getting everyone internally to adopt it has had huge payoffs. Each team spends fewer mental cycles choosing which tools to use, dedicated tools teams can focus on improvements to engineering productivity, and those improvements easily propagate to everyone already using the tools or services. When contrasted with engineering organizations where each team might use vastly disparate tool chains, this philosophy also means that it’s much easier to understand the designs behind many projects once you’ve learned the fundamental building blocks. The downside of this approach is that sometimes you might feel pressured to shoehorn your use case into a particular well-supported tool, even if it wasn’t the best tool for the job.

Invest in reusable training materials to onboard new engineers. One reason I was able to quickly become productive within Google was that the company had invested so many resources into training documents called codelabs. Codelabs covered the core abstractions at the company, explained why they were designed, highlighted relevant snippets of the codebase, and then validated understanding through a few implementation exercises. Without them, it would’ve taken me much longer to learn the multitude of technologies that I needed to know to be effective, and my teammates would have had to spend more time explaining them to me. My positive experience with codelabs at Google strongly shaped my later decision to push for codelabs in the onboarding process at Quora.

Standardize on coding conventions. Each convention on whitespace, capitalization, line length, whether to use smart pointers, etc., might individually seem trivial, but has huge implications when you hit Google’s scale. I’ll be the first to admit that it was annoying when code reviewers nitpicked my code because I indented a line incorrectly or added two characters past the prescribed line length. But because everyone followed the same conventions, browsing the source code was significantly easier. There was little overhead to learn a new team’s conventions when switching teams or working on cross-functional projects. Conventions are one of those things that are easy to ignore when your team is small but that take increasingly more effort to change once the codebase and team get large enough that you actually wish for some consistency. Agree on some set of consistent conventions early if possible, or just use the style guides that Google has open sourced.

Increase code quality through code reviews. Having code reviews required and programmatically enforced for every change slows down iteration speed but optimizes for code quality. New engineers received the feedback they needed to quickly pick up best practices and converge on accepted levels of code quality. The higher overall code quality also meant that new engineers modeling their work on the code around them tended to write cleaner code from the start. Code reviews therefore were instrumental in helping the company maintain high-quality software at large scale.

Having the right data (and lots of it) solves many problems. Peter Norvig, the Director of Research at Google, has frequently talked about the “unreasonable effectiveness of data” at solving otherwise complex problems. 67 The right data can help you understand users, slice through office politics, resolve arguments, and track progress. Developing logging and data infrastructure like Sawzall and MapReduce made it possible for engineers at Google to sift through massive amounts of data.

Automate testing to scale your code. Google has an extremely strong culture of unit testing, for which “Testing on the Toilet” is but one illustrative example. Nearly every code change I worked on was accompanied by a unit test, and code reviewers would rigorously check for them. It made developing a given change slower, but it also meant that hundreds or thousands of engineers could make changes to the same parts of the codebase without sacrificing too much quality or reliability. In the same way that Google invested in shared tools, it also invested heavily in shared testing frameworks and in educating people in best testing practices to make writing tests easier.
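To make the idea concrete, here is a minimal sketch in Python of the kind of tip a “Testing on the Toilet” one-pager might cover: using dependency injection so that a change can ship with a fast unit test. It is an illustration only, not Google code. The RelatedSearches class, its suggest method, and the fake lookup function are all hypothetical names invented for this example.

```python
import unittest


class RelatedSearches:
    """Hypothetical feature class; the lookup backend is injected."""

    def __init__(self, lookup):
        # `lookup` is any callable mapping a query string to a list of
        # candidate strings. Injecting it lets a test swap in a fake.
        self._lookup = lookup

    def suggest(self, query, limit=3):
        normalized = query.strip().lower()
        if not normalized:
            return []
        # Drop the query itself and any duplicates, preserving order.
        seen = {normalized}
        results = []
        for candidate in self._lookup(normalized):
            if candidate not in seen:
                seen.add(candidate)
                results.append(candidate)
        return results[:limit]


class RelatedSearchesTest(unittest.TestCase):
    def test_filters_query_and_duplicates(self):
        # A fake backend stands in for the real (assumed) serving system.
        def fake_lookup(query):
            return ["python", "python tutorial", "python tutorial", "python docs"]

        feature = RelatedSearches(fake_lookup)
        self.assertEqual(
            feature.suggest("  Python "),
            ["python tutorial", "python docs"],
        )

    def test_empty_query_returns_nothing(self):
        feature = RelatedSearches(lambda query: ["anything"])
        self.assertEqual(feature.suggest("   "), [])
```

Saved as a module, the tests can be run with python -m unittest. Because the backend is passed in rather than constructed inside the class, the test needs no network or production data, which is part of what keeps tests like this cheap enough to require on every change.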

When I later helped to build products and teams at Ooyala and Quora, the practices that worked well at Google (as well as the ones that didn’t) strongly informed my thinking on what would make a good engineering culture at those places. Just because specific decisions worked well at Google’s scale, however, wouldn’t necessarily mean that they’d work well for a different organization in a different phase of growth. Every engineering decision involves a set of tradeoffs, but Google’s engineering culture provides one slice of those tradeoffs that you can start from.