Google recently released a paper on Spanner, their planet enveloping tool for organizing the world’s monetizable information. Reading the Spanner paper I felt it had that chiseled in stone feel that all of Google’s best papers have. An instant classic. Jeff Dean foreshadowed Spanner’s humungousness as early as 2009. Now Spanner seems fully online, just waiting to handle “millions of machines across hundreds of datacenters and trillions of database rows.” Wow.

The Wise have yet to weigh in on Spanner en masse. I look forward to more insightful commentary. There’s a lot to make sense of. What struck me most in the paper was a deeply buried section essentially describing Google’s motivation for shifting away from NoSQL and to NewSQL. The money quote:

We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.

This reads as ironic given Bigtable helped kickstart the NoSQL/eventual consistency/key-value revolution.

We see most of the criticisms leveled against NoSQL turned out to be problems for Google too. Only Google solved the problems in a typically Googlish way, through the fruitful melding of advanced theory and technology. The result: programmers get the real transactions, schemas, and query languages many crave along with the scalability and high availability they require.

The full quote:

Spanner exposes the following set of data features to applications: a data model based on schematized semi-relational tables, a query language, and general purpose transactions. The move towards supporting these features was driven by many factors. The need to support schematized semi-relational tables and synchronous replication is supported by the popularity of Megastore [5].

At least 300 applications within Google use Megastore (despite its relatively low performance) because its data model is simpler to manage than Bigtable’s, and because of its support for synchronous replication across datacenters. (Bigtable only supports eventually-consistent replication across datacenters.) Examples of well-known Google applications that use Megastore are Gmail, Picasa, Calendar, Android Market, and AppEngine.

The need to support a SQLlike query language in Spanner was also clear, given the popularity of Dremel [28] as an interactive data analysis tool. Finally, the lack of cross-row transactions in Bigtable led to frequent complaints; Percolator [32] was in part built to address this failing.

Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings [9, 10, 19]. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two-phase commit over Paxos mitigates the availability problems.

What was the cost? It appears to be latency, but apparently not of the crippling sort, though we don’t have benchmarks. In any case, Google thought dealing with latency was an easier task than programmers hacking around the lack of transactions. I find that just fascinating. It brings to mind so many years of RDBMS vs NoSQL arguments it’s not even funny.

I wonder if Amazon could build their highly available shopping cart application, said to a be a motivator for Dynamo, on top of Spanner?

Is Spanner the Future in the Same Way Bigtable was the Future?

Will this paper spark the same revolution that the original Bigtable paper caused? Maybe not. As it is Open Source energy that drives these projects, and since few organizations need to support transactions on a global scale (yet), whereas quite a few needed to do something roughly Bigtablish, it might be awhile before we see a parallel Open Source development tract.

A complicating factor for an Open Source effort is that Spanner includes the use of GPS and Atomic clock hardware. Software only projects tend to be the most successful. Hopefully we’ll see clouds step it up and start including higher value specialized services. A cloud wide timing plane should be a base feature. But we are still stuck a little bit in the cloud as Internet model instead of the cloud as a highly specialized and productive software container.

Another complicating factor is that as Masters of Disk it’s not surprising Google built Spanner on top of a new Distributed File System called Colossus. Can you compete with Google using disk? If you go down the Spanner path and commit yourself to disk, Google already has many years lead time on you and you’ll never be quite as good. It makes more sense to skip a technological generation and move to RAM/SSD as a competitive edge. Maybe this time Open Source efforts should focus elsewhere, innovating rather than following Google?

Reader Comments (9)

Each application has its own requirements, so what worked for Google here may not be applicable to other organizations.

Also, remember that Google has a big army of programmers, enough to solve any large scale problem they consider important. In this case, they figured out that creating their in house solution would be the best way to solve the DB issues they were having.

It is hard to generalize this experience to other websites. In my particular case, it is simpler to use a standard SQL stack until you have a successful product. When that happens, it is frequently possible to apply standard optimization techniques before using a non-standard DB.

It's reasonably clear, why Google rewired their DB engine to work more like traditional SQL than KV-Stores. Because they have the power to built a new DB engine from "scratch" and they coldheartedly lack innovation in their typical nerdish fashion - "theory, science & code".

They want sync, distribution & low latency all automated and bulletproof security. But reinvent something not-so-new to fix something not-so-old. Waste! The problem is much deeper than just the DB implementation, in reality the TCP and I/O Stack in the Linux kernel are known old problems. Nobody except Google has the equipment & ressources to fix the "Kernel design flaws" entirely in a timely fashion though. It's in the need of a huge rewrite of large parts in the codebase using latest innovations, while purging some masses of old code. Driver compatibility is a problem that should go into a seperate Kernel, because it can slow development of everything else down. Think of a fractal structure like the Cerebrum http://en.wikipedia.org/wiki/File:EmbryonicBrain.svg Click?

You can drop relations and wire by code, or add relations and wire by abstract commands, or go do a little of both like Google Spanner. (Which btw. is an interesting name for Google's DB, reflecting their voyeurish data empire. See the german translation for that word ;)

I know that Google has lots of "fixes/improvements" here and there for the linux kernel they use. And their need for a distributed filesystem grew until they made it, because they never cared to discuss/improve existing solutions to keep their wannabe advantage. Honestly filesystems, kernels, more energy effcient servers, databases, custom network equipment, an ad empire, a social network and all the money of the world can't give you nature like efficiency (which operates at the physical limit of thermodynamics). Because ancient common knowledge, especially in maths/physics is still rewarding mental accord and fit, instead of "rewriting the codebase" using current knowledge. It's plausible, rewriting existing code until it's improved can be exhausting, that's why we re-invent. Nature never re-invents, it only adapts, forks and (re-)connects - keep that in mind dear reader.

The same plague comes into the windmills everytime someone says: MVP - Minimum Viable Product. That's economically correct, technically wrong and ethically unbearable, due to the waste of ressources in foreseeable time.I don't disagree dogmatically though, because I adapt, fork and (re-)connect according to the situation as required.

It's my first comment here on http://highscalability.com forgive my blurry formulations, I hope the reader knows what the moral of the story is. They can, will and have done it many times just the like the entire new fraction of the IT-Industry, they will reform some time. The old fractions and fruits of hour knowledgehunt luckily kept more or less vivid during the deepnesses of our history has brought us into the urge and need for reformatios that are much stronger than those needed by our fairly young technologist generation. It's the human, not the machine or technology that needs the innovation.

In this field the so called "Future" usually last about 3-5 years, so I'll wait for the next one, at some point a phone will have more computing power than a data center, when that happens all these so called "problems" will no longer be an issue.

CPU is no longer the bottleneck, storage is our current bottleneck, and TCP/IP will become our next bottleneck. Like building a super computer, getting a bunch of CPU is easy, having them interconnect efficiently is what is difficult.

Forget complicated distributed file systems, mechanical hard disk simply needs to go, SSD won't last long either, we need to simply stop using 0s and 1s to store data, use something that has more than 3/4/5/6+ states and bam your storage size/speed increase by a few orders of magnitude.

* No such thing as one size fits all database* NoSQL = Not Only SQL, not No SQL; plenty of blazing scale-out SQL, Acid DBs out there like Volt associated w NoSQL movement* When ORCL figures out how useful mem and compression effectively in databases (see x3), well, what's next then?* You either need a "raft of programmers" a la Google, or some $1m+ MPP to crunch SQL for complex analytics queries - columnar and graph look pretty damn good as hardware alternatives* Sure looks like megastore more of a driver for Spanner arch than general purpose global domination