Posted by Unknown Lamer on Tuesday September 18, 2012 @07:04AM
from the all-while-ironing-your-clothes dept.

vu1986 writes with this bit from GigaOm: "Google has made public the details of its Spanner database technology, which allows a database to store data across multiple data centers, millions of machines and trillions of rows. But it's not just larger than the average database, Spanner also allows applications that use the database to dictate where specific data is stored so as to reduce latency when retrieving it. Making this whole concept work is what Google calls its True Time API, which combines an atomic clock and a GPS clock to timestamp data so it can then be synched across as many data centers and machines as needed."

Original paper. The article focuses a lot on the Time API, but external consistency on a global scale seems to be the big deal here. From the paper: "Even though many projects happily use Bigtable, we have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. ... Many applications at Google have chosen to use Megastore (PDF) because of its semi-relational data model and support for synchronous replication, despite its relatively poor write throughput. As a consequence, Spanner has evolved from a Bigtable-like versioned key-value store into a temporal multi-version database. Data is stored in schematized semi-relational tables; data is versioned, and each version is automatically timestamped with its commit time; old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps. Spanner supports general-purpose transactions, and provides a SQL-based query language."

Update: 09/20 17:57 GMT by T: Also in a story at Slash BI.

Sounds like they are trying to elbow into Oracle territory. As for Google ditching it, I doubt they'd be as lackadaisical about people's mission-critical data as they are about a glorified RSS aggregator (iGoogle).

I am sorry, moderators, if you find this fact unpopular, but that is the problem with Google's development. Google pushes out new and innovative development stuff, then a year or so later, if it hasn't skyrocketed, they just kill the project.

As a developer you need to choose tools that you know will last, not something that will be here today and gone next week.

I would mod parent up if I could because I would've said the same thing, in different words and with a capital letter at the beginning of my sentence. I am seriously sick of Google stabbing users in the back.

Though not many people will need huge multi-centre databases, it has cracked some of the big problems. Interestingly, some of these don't appear to affect Google's main business.

Spanner has two features that are difficult to implement in a distributed database: it provides externally consistent reads and writes, and globally-consistent reads across the database at a timestamp.

One of the issues with large distributed data systems was that reads at different nodes could retrieve data in a different (though consistent) state. I have seen this on Google: a search shows a recent news item, then another search doesn't show it, before it finally reaches all nodes and is generally available.
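To make "reads at a timestamp" concrete, here is a toy sketch of a multi-version store in Python. It is not Spanner's implementation: the `VersionedStore` class, its method names, and the integer timestamps are all made up for illustration. The only idea taken from the summary is that every version carries its commit time and that a reader can ask for the data as of some earlier timestamp.

```python
import bisect


class VersionedStore:
    """Toy multi-version key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        # key -> sorted list of commit timestamps, and a parallel list of values
        self._timestamps = {}
        self._values = {}

    def write(self, key, value, commit_ts):
        """Record a new version of key, keeping versions sorted by commit time."""
        ts_list = self._timestamps.setdefault(key, [])
        val_list = self._values.setdefault(key, [])
        idx = bisect.bisect_right(ts_list, commit_ts)
        ts_list.insert(idx, commit_ts)
        val_list.insert(idx, value)

    def read_at(self, key, ts):
        """Return the newest value committed at or before ts, or None if none exists."""
        ts_list = self._timestamps.get(key, [])
        idx = bisect.bisect_right(ts_list, ts)
        if idx == 0:
            return None
        return self._values[key][idx - 1]


store = VersionedStore()
store.write("headline", "old story", commit_ts=10)
store.write("headline", "breaking news", commit_ts=20)

# Two readers that both ask for timestamp 15 see the same version,
# no matter when (or, in a real system, where) they run the read.
snapshot_a = store.read_at("headline", 15)
snapshot_b = store.read_at("headline", 15)
```

In a real distributed system the hard part is agreeing on the timestamps across nodes, which this single-process sketch sidesteps entirely.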

Making this whole concept work is what Google calls its True Time API, which combines an atomic clock and a GPS clock to timestamp data so it can then be synched across as many data centers and machines as needed.

I'm guessing there's a little more to it than reinventing and installing ntp on your DBMS server. That little bit more is the actual interesting part.

GPS/Atomic clock is better than NTP. It's a system to distribute time that will have a 400ns precision (probably a couple microseconds once you reach the actual servers in the data center). If you use NTP or message passing you can't synchronize data centers more accurately than a couple milliseconds (assuming you have paths that are quite stable between them, as transit time can be corrected). So basically GPS/Atomic clock lets you synchronize two systems that are far apart more precisely and without having to make them communicate. Note that atomic clocks protect them from GPS outages, so they can really rely on the timestamps.

Since there are large sections of the Patriot Act that are sealed, and we have no idea what's in them, I'd say "no". The US federal government feels that its goals are holy, and it will achieve them "by any means necessary".

Spanner’s data model is not purely relational, in that rows must have names. More precisely, every table is required to have an ordered set of one or more primary-key columns.

OK, relational keys should not be ordered. But the fact that each table must have a key makes it a relation, at least in principle, so Spanner at first looks like it is in fact more relational than SQL. Am I missing anything?

RDBMSs don't require you to define primary keys; Spanner does, because it evolved from a key-value DB. If you'd read on to the next sentence:

This requirement is where Spanner still looks like a key-value store: the primary keys form the name for a row, and each table defines a mapping from the primary-key columns to the non-primary-key columns.

Yes, they do. The few of them, that is. SQL DBMSs do not, but they are not relational.

This requirement is where Spanner still looks like a key-value store: the primary keys form the name for a row, and each table defines a mapping from the primary-key columns to the non-primary-key columns.

This makes little sense to me, because it describes not a key-value store (unless you consider the ‘value’ to be all non-primary-key columns, which would stretch the definition of a key-value store) but a relational database relation.
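Both readings of that sentence can be sketched side by side. The Python below is only an illustration with invented names (`insert_row`, `as_relation`, and the `albums` table with its `user_id`/`album_id` key are all hypothetical, not Spanner's API): the table is stored as a mapping from a primary-key tuple to the remaining columns, and the same data is then flattened into ordinary rows ordered by key.

```python
def insert_row(table, key, columns):
    """Store the non-key columns under the primary-key tuple; keys must be unique."""
    if key in table:
        raise KeyError(f"duplicate primary key: {key!r}")
    table[key] = dict(columns)


def as_relation(table, key_names):
    """Flatten the key -> columns mapping into rows, ordered by primary key."""
    rows = []
    for key in sorted(table):
        row = dict(zip(key_names, key))  # primary-key columns first
        row.update(table[key])           # then the non-key columns
        rows.append(row)
    return rows


# Key-value view: (user_id, album_id) -> non-primary-key columns.
albums = {}
insert_row(albums, (2, 1), {"title": "Road trip"})
insert_row(albums, (1, 2), {"title": "Holiday"})
insert_row(albums, (1, 1), {"title": "Wedding"})

# Relational view: the same data as rows, sorted by the ordered primary key.
album_rows = as_relation(albums, ("user_id", "album_id"))
```

Seen this way the "value" is indeed the whole bundle of non-key columns, which is the stretch the parent is objecting to.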

It seems it's the GPS clock signals they want to use here. When those are dropped, I guess they'll fall back on their own atomic clocks. It might be a little less accurate, though.

From T*A:

Google’s cluster-management software provides an implementation of the TrueTime API. This implementation keeps uncertainty small (generally less than 10ms) by using multiple modern clock references (GPS and atomic clocks).

Nothing in there about GPS being essential. Just needs 'multiple modern clock references'.
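The point about uncertainty can be sketched as a clock that returns an interval instead of a single instant. This is a toy in Python, not Google's implementation: `ToyTrueTime`, `TTInterval`, the fixed `epsilon`, and the injectable `clock` function are assumptions made for illustration, and the `now`/`after`/`before` method names follow my recollection of the paper rather than anything quoted in this thread.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """Bounds on the true time: it lies somewhere in [earliest, latest]."""
    earliest: float
    latest: float


class ToyTrueTime:
    """Toy interval clock with a fixed uncertainty (hypothetical stand-in)."""

    def __init__(self, epsilon=0.005, clock=time.time):
        # epsilon is the assumed clock uncertainty in seconds; the quote says
        # the real system generally stays under 10ms. Any clock source will do.
        self.epsilon = epsilon
        self._clock = clock

    def now(self):
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True only if t has definitely passed, even allowing for uncertainty."""
        return self.now().earliest > t

    def before(self, t):
        """True only if t has definitely not arrived yet."""
        return self.now().latest < t


# A fixed fake clock keeps the demo deterministic.
tt = ToyTrueTime(epsilon=0.005, clock=lambda: 100.0)
interval = tt.now()
```

Nothing here cares where the time comes from, which matches the parent's reading: better clock references just shrink `epsilon`.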

What happens when governments decide it's time to tamper with or block GPS signals?

To what end? What possible purpose would that serve other than to interfere with critical systems like aircraft and marine navigation? (Yes, I know that aircraft and ships have backup means of navigation but it would still cause significant disruption.)

GPS certainly has other uses, precision timekeeping among them, and disruption of GPS would interfere with surveying, time transfer, and a variety of other functions. Pretty much any modern country uses GPS in some way and would suffer from disruption (it's f