How Google Remade the Computer Science Researcher

At Google, John Wilkes is building a software system that will drive tens of thousands of machines, orchestrating a worldwide network of data centers. But on Saturday afternoons, he blows glass. Photo: Ariel Zambelich/Wired

John Wilkes spent a year negotiating his move to Google, and when he finally agreed to join the company, he still didn’t know what he’d be working on.

Wilkes was recruited by a Googler named Bill Coughran — who had joined the web giant after years as a researcher at the famed Bell Labs — and he knew he’d be working alongside people a lot like Coughran. He was joining a team stacked with top computer science researchers, many poached from the leading corporate labs of the pre-internet age, including Bell Labs and Xerox PARC and HP Labs, where Wilkes himself had worked since the early ’80s. But no one would tell him what his Google research would look like.

Wilkes was lured by the prospect of teaming with these big thinkers — and he knew they wouldn’t be at Google unless the work was intellectually satisfying — but he also knew there was an added importance to their research. “One of the things that made Google interesting is that they wouldn’t say what I’d be working on,” Wilkes says, with the impish grin that so often punctuates his view of the world.

At Google, research isn’t what it used to be. When Google recruits the best and brightest computer science researchers, it doesn’t set them up inside some sort of secluded lab where people noodle on ideas that may or may not see the light of day. The web grows too quickly for that. Google pushes these minds to the front lines of the net, where they seek to remake the world’s technology as soon as possible. John Wilkes wasn’t told what he’d be working on because he was joining the team that builds the fundamental hardware and software that underpin Google’s entire online empire — the stuff that Google views as its most vital trade secrets.

Like other seasoned computer science researchers before him — including Jeff Dean and Sanjay Ghemawat — Wilkes joined Google in large part because the company gave him the opportunity to change its immediate future, and perhaps even the future of the web as a whole. Over the past decade, in moving these top minds to the front lines, Google itself has evolved into what you might call a research lab for the internet at large. Time and again, the company pushes the edge of the envelope, finding ways of more quickly and more efficiently juggling the billions of requests that hit its web services with each passing second, and more often than not, the rest of the net follows suit.

Most famously, a sweeping number-crunching platform called MapReduce — developed at Google more than a decade ago — is now mimicked by just about all of the web’s biggest names, thanks to an open source project dubbed Hadoop. But this is just one of many examples, ranging from data centers that snap together like LEGO bricks to world-spanning databases.

The rub is that when these Google minds go to work on such projects, there’s very little give-and-take with the larger research community. This is often the case with corporate research projects, but the secrecy only heightens at Google. The Google research model, Wilkes says, “increases the value of the ideas, and provokes more need to be circumspect, compared to when I was working on ideas that weren’t going to make it into a product.”

Even when you talk to academics who know Wilkes — and others working on the guts of the Googlenet — there comes a point where they stop and say that Google won’t tell them what it’s working on. But Wilkes is pushing to change that — at least in some ways. This can help advance the net as a whole, he says, but also benefit Google.

According to some who know him, Wilkes can be intimidating, if only because his body of work is so impressive, and because he’s a Cambridge University graduate who speaks with the sort of educated English accent Americans are so often intimidated by. But they’ll also tell you that he is every bit the mentor, someone who is kind yet firm and uncompromising in his criticism. “He’s a very hands-on type,” says Andy Konwinski, a University of California at Berkeley graduate student who interned at Google under Wilkes, “which is only positive for a Ph.D. student looking to do research.”

Wilkes clearly thrives on such collaboration — something you can see even on Saturday afternoons, when he blows glass with fellow students at San Francisco State University, not far from Google’s Mountain View, California, headquarters — and he’s intent on expanding the way Google collaborates.

The best evidence of this is that he’ll actually tell you what he’s working on. He’s overseeing the creation of a mind-boggling new system that will orchestrate each and every computing task that runs across Google’s worldwide network of data centers, perhaps the largest single operation on the net. This system is called Omega, and it’s meant to replace an existing tool known as Borg, which has helped drive Google’s empire for nearly a decade.

Over that decade, Borg was the best kept of secrets — outside of Google, you never even heard the name — but Omega is another matter. “I came and worked on Omega,” Wilkes says, “and I chose to do things differently.”

Inside the Hacker Mind: John Wilkes on Google Omega

At Google, John Wilkes is guiding the creation of a sweeping software system called Omega. The successor to a rarely discussed Google contraption called Borg, Omega runs across thousands of computer servers, ensuring that each of them is used to the fullest.

Omega lets Google run many different services — from public services like Google Search and Google Maps and Gmail to private data-crunching tools like Google MapReduce — atop the same collection of machines. Instead of setting up a different server cluster for each service, Google can evenly spread tasks from multiple services across one uber cluster.

Wilkes says the process is like taking a big pile of wooden blocks and packing them neatly into buckets. “If you just throw the blocks in the buckets, you’ll either have a lot of building blocks left over — because they didn’t fit very well — or you’ll have a bunch of buckets that are full and a bunch that are empty, and that’s wasteful,” he says. “But if you place the blocks very carefully, you can have fewer buckets.” For blocks, read computing tasks. For buckets, think servers.
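In computer-science terms, Wilkes is describing the bin-packing problem. The sketch below is a minimal illustration under stated assumptions, not Borg’s or Omega’s actual algorithm: each task needs a single hypothetical resource (say, CPU cores), every server has the same capacity, and the two functions are textbook heuristics. "Next fit" stands in for throwing blocks into buckets; "first fit decreasing" stands in for placing them carefully.

```python
# Toy model of the blocks-and-buckets analogy (classic bin packing).
# Assumptions: one resource per task, identical server capacity.
# These are textbook heuristics, not Google's scheduling code.

def next_fit(demands, capacity):
    """Careless placement: only try the most recently opened server,
    and open a new one whenever the task doesn't fit there."""
    servers = []
    for d in demands:
        if d > capacity:
            raise ValueError(f"task of size {d} exceeds capacity {capacity}")
        if not servers or sum(servers[-1]) + d > capacity:
            servers.append([])
        servers[-1].append(d)
    return servers


def first_fit_decreasing(demands, capacity):
    """Careful placement: handle the biggest tasks first and put each one
    into the first already-open server that still has room."""
    servers = []
    for d in sorted(demands, reverse=True):
        if d > capacity:
            raise ValueError(f"task of size {d} exceeds capacity {capacity}")
        for s in servers:
            if sum(s) + d <= capacity:
                s.append(d)
                break
        else:
            # No open server had room, so open a new one.
            servers.append([d])
    return servers


if __name__ == "__main__":
    tasks = [5, 7, 5, 2, 4, 2, 5, 1, 6]  # made-up task sizes
    print(len(next_fit(tasks, 10)))              # 6 servers
    print(len(first_fit_decreasing(tasks, 10)))  # 4 servers
```

With these made-up numbers, the careless pass needs six servers and the careful pass needs four, which is the minimum possible here because the tasks total 37 units against a per-server capacity of 10.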

Borg does much the same thing, but Omega does more. Today, each engineering team inside Google can tune Borg to its particular needs. But Omega will remove these controls and tune itself. “We just want people to tell us what they want to do,” Wilkes says, “and we’ll do that for them.” Shades of the “self-managing” storage systems Wilkes designed at HP Labs.

Yes, those using the system will have less control, but that’s a good thing, at least for Google as a whole. If you give engineers the power to tune the system for their particular service, you see, this can undermine the performance of other services. Part of the problem is that public services like Gmail behave very differently from internal Google tools like MapReduce, a system that uses thousands of servers to very quickly process large amounts of data. With Omega, Google has separated the way these two very different types of services are handled.

“It turns out that doing both of those effectively with one lump of code is quite hard, so we split them up,” Wilkes says. “We’re looking forward to a world where we can have both a very high response time for batch jobs and a much more carefully orchestrated layout for long-running service jobs.”
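To make the idea of splitting the two kinds of work concrete, here is a toy sketch in which two separate placement routines share one view of the cluster. Everything in it is a hypothetical illustration, not Omega’s code: the `Cell` class, the `batch_schedule` and `service_schedule` functions, and the single-resource model are all assumptions. The batch path grabs the first machine with room, favoring quick decisions; the service path spends more effort, putting each replica of a long-running job on a distinct machine with the most headroom.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical shared cluster state: free capacity per machine
    (one resource only, for simplicity)."""
    free: dict[str, int]

    def commit(self, machine: str, demand: int) -> bool:
        """Claim capacity only if it is still free at commit time."""
        if self.free[machine] >= demand:
            self.free[machine] -= demand
            return True
        return False


def batch_schedule(cell: Cell, demand: int) -> str | None:
    """Batch path: decide quickly by taking the first machine with room."""
    for machine, free in cell.free.items():
        if free >= demand and cell.commit(machine, demand):
            return machine
    return None


def service_schedule(cell: Cell, demand: int, replicas: int) -> list[str] | None:
    """Service path: place each replica on a distinct machine, preferring
    the machines with the most spare capacity."""
    candidates = sorted(
        (m for m, free in cell.free.items() if free >= demand),
        key=lambda m: cell.free[m],
        reverse=True,
    )[:replicas]
    if len(candidates) < replicas:
        return None
    placed = []
    for m in candidates:
        if not cell.commit(m, demand):
            # Only reachable if something else changed the cell between the
            # scan and the commit: undo the partial placement and give up.
            for p in placed:
                cell.free[p] += demand
            return None
        placed.append(m)
    return placed


if __name__ == "__main__":
    cell = Cell(free={"m1": 8, "m2": 4, "m3": 6})
    print(service_schedule(cell, demand=3, replicas=2))  # ['m1', 'm3']
    print(batch_schedule(cell, demand=4))                # m1
    print(batch_schedule(cell, demand=4))                # m2
    print(batch_schedule(cell, demand=4))                # None
```

Keeping the two policies in separate functions means the fast batch logic never has to carry the extra bookkeeping that long-running services need, and vice versa. A real cluster scheduler would also weigh memory, storage, failure domains and much more.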

As it turns out, this same basic notion is championed by an open source project called Mesos, and in a way that belies Google’s reputation as a black box of computer systems research, Wilkes often trades notes with the researchers who build Mesos, helping to push these ideas across the web.

John Wilkes at home with his wife, Marjan, and son, Ian. Photo: Ariel Zambelich/Wired

The Beer That Changed a Life

John Wilkes says he wound up at HP Labs because he was sitting in the right pub at the right time drinking the right beer. This was in the early ’80s, while he was still working on a computer science Ph.D. at Cambridge University in Great Britain. Wilkes was born and raised in Britain, and that’s where he planned to stay — until his former math professor walked into the pub too.

This professor, it turns out, was also a partner in a consulting firm that had just won a contract with the Argentinian Navy to develop some algorithms that could determine where on the River Plate the Navy should put its next grain port. This was before Britain went to war with Argentina over the Falkland Islands, and the professor was slated to do the job himself. But his wife had just had another baby.

“He came by, looking very despondent, and asked what I was drinking. I told him. And he asked if I wanted another,” Wilkes remembers. “He wanted to know where he could find someone to go to Argentina for six weeks and do some programming — and be paid.”

Wilkes leapt at the chance. And then, on his way back to England from Argentina, he took a detour through Northern California, where he interviewed with several big-name research operations. “It was as cheap to fly via San Francisco as it was to fly back nonstop across the Atlantic,” he says. In the end, he took the job at HP Labs, the research arm of the hardware and software outfit that loomed so large over Silicon Valley and beyond.

He spent nearly a quarter century at HP, pushing the boundaries of what is commonly called “systems research.” Basically, this means he explored the way various pieces of hardware and software combine to form a single computing system. He worked on HP’s seminal PA-RISC processor and on operating systems that spanned multiple machines, and he eventually made his mark with storage systems, the hardware and software we use to hold digital data. He spent years designing a massive storage system that was smart enough to essentially manage itself.

“Rather than getting people to do all the knob-twiddling,” Wilkes says, “the aim was to build a system where you could tell it what you want and let the computer sweat the small stuff.”

Much of this research was published, and on the strength of his storage work, Wilkes was named both an HP Fellow and ACM Fellow. “He’s a legend in the field,” says Jeff Hammerbacher, the man who oversaw the creation of the Hadoop infrastructure that underpins Facebook. But that storage research never made its way into an HP product. “HP was a great company to work for, for many years,” Wilkes says. “But towards the end, it became less fun. And then Google came calling.”

Google didn’t come calling because Wilkes was building storage systems. It came calling because he’s someone who re-imagines the way computers work. Google, you see, is now at the forefront of systems research. Its web operation has grown to the point where it’s forced to build a new breed of system — systems far larger than anything that came before.

To do this, Google designs much of the hardware used inside its data centers, rethinking everything from servers and storage gear to networking switches. But on another level, it builds massive software systems that run atop all that hardware, spanning entire data centers. In an effort to improve the efficiency of its operation, Google strives to fashion data centers that behave much like a single machine.

That’s why Google needs people like John Wilkes — and it’s why people like John Wilkes are drawn to Google. “If you’re a data infrastructure scholar,” says Hammerbacher, “it’s where you want to work. It means something when someone like John Wilkes goes to Google.”

Omega is one of those massive Google systems. In fact, it’s a system that will help drive the entire operation. It’s a way of parceling work across tens of thousands of computer servers.

Rather than running each of its online services on a separate computer cluster — Google Search on one group of machines, Gmail on another, Google Maps on a third, etc. — Omega can run multiple services atop one large cluster. The work from each service is divided into tiny pieces, and then Omega sends these tasks wherever it can locate free computing resources, such as processing power or computer memory or storage space.

Wilkes says that Borg — Omega’s predecessor — is so efficient, it has saved the company the cost of building an entire data center, and Omega aims to take this idea even further (see sidebar).

Why Glassblowing Is Like Software Research

Inside Google’s offices in Mountain View, when you ask Wilkes how his glassblowing relates to his computer research, he says it doesn’t. “What I like to tell people is that the process of glassblowing is incredibly intense and focused — and not work,” he explains. “It’s completely separate.”

But when you show up at San Francisco State on a Saturday afternoon and watch Wilkes blow glass, you see why he’s attracted to both.

Standing by the massive oven where the glass is heated, Wilkes acknowledges that glassblowing is like software development in at least one way: You reach a finished product by reworking your creation over and over and over again. Like a coder, a glassblower spends the day changing a little here, adding a little there, connecting this piece to that piece, and massaging what he’s already built. Sometimes, Wilkes says, you have to break things before you find the right way to go.

Then, as a fellow student bends down to help Wilkes put a foot on the drinking glass he’s blowing, you see another connection. Glassblowing isn’t something you can do on your own, and the same goes for big-time software development. In building some of Google’s most important data center technologies, legendary Google engineers Jeff Dean and Sanjay Ghemawat would code at the same machine, and clearly, Wilkes thrives on the same kind of give-and-take.

This has already left its mark on the Omega project. When you first ask Wilkes how much he can tell you about what he’s working on, he grins that impish grin. “Six,” he says, before pausing for effect. “Sorry, I couldn’t resist.” But then he tells you.

It’s not just that Wilkes will name the project. Google is only just beginning to test the system in its data centers, but as far back as 2011, Wilkes described Omega during a gathering of academics, and he and his team regularly trade notes with outside engineers who oversee the Mesos project — an Omega-like open source project that began at the University of California at Berkeley and eventually spread to commercial operations like Twitter and Airbnb.

Wilkes has even gone so far as to share data describing the behavior of systems inside Google’s data centers, hoping to spur additional research outside the company.

To be sure, Google’s relationship with the broader web community is still rather complicated. Some outsiders continue to complain that, unlike a Facebook or a Yahoo or a Twitter, Google doesn’t open source the new-age software it builds for its massive data centers. It shares ideas rather than software code. But share it does. And John Wilkes wants to share more.

“I’m really interested in having outside researchers get a chance to think about Google’s kind of problems,” he says. “I don’t necessarily want them to have the same solutions we have, because we have those already. We want them to have different solutions — in case they’re better.”

Wilkes at the San Francisco State glassblowing studio. Photo: Ariel Zambelich/Wired

Wilkes and his wife, Marjan, in their living room, where he so proudly displays the fruits of his glassblowing. Photo: Ariel Zambelich/Wired