Structuring Data for Strong Consistency

Google Cloud Datastore provides high availability, scalability and durability by
distributing data over many machines and using masterless, synchronous
replication over a wide geographic area. However, this design involves a
tradeoff: the write throughput for any single entity group is limited to about
one commit per second, and there are limitations on queries and transactions
that span multiple entity groups. This page describes these limitations in more
detail and discusses best practices for structuring your data to support strong
consistency while still meeting your application's write throughput
requirements.

Strongly-consistent reads always return current data, and, if performed within a
transaction, will appear to come from a single, consistent snapshot. However,
queries must specify an ancestor filter in order to be strongly-consistent or
participate in a transaction, and transactions can involve at most 25 entity
groups. Eventually-consistent reads do not have those limitations, and are
adequate in many cases. Using eventually-consistent reads can allow you to
distribute your data among a larger number of entity groups, enabling you to
obtain greater write throughput by executing commits in parallel on the
different entity groups. However, you need to understand the characteristics of
eventually-consistent reads in order to determine whether they are suitable for
your application:

The results from these reads might not reflect the latest transactions. This
can occur because these reads do not ensure that the replica they are running
on is up-to-date. Instead, they use whatever data is available on that replica
at the time of query execution. Replication latency is almost always less than
a few seconds.

A committed transaction that spanned multiple entities might appear to have
been applied to some of the entities and not others. Note, though, that a
transaction will never appear to have been partially applied within a single
entity.

The query results can include entities that should not have been included
according to the filter criteria, and might exclude entities that should have
been included. This can occur because indexes might be read at a different
version than the entity itself is read at.

To understand how to structure your data for strong consistency, compare two
different approaches for a simple guestbook application. The first approach
creates a new root entity for each greeting, so every greeting belongs to its
own entity group.
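As a rough illustration of this first approach, the sketch below models key paths in plain Python. It is not the Cloud Datastore client API: the `Greeting` kind, the tuple-based key representation, and the helpers `new_root_greeting`, `entity_group`, and `recent_greetings` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from itertools import count

# Assumption: a key path is modeled as a tuple of (kind, identifier) pairs,
# and the first pair names the root of the entity group.
_ids = count(1)


@dataclass
class Greeting:
    key: tuple
    content: str
    posted_at: int  # simple sequence number standing in for a timestamp


def new_root_greeting(content):
    """Create a greeting with no ancestor, so the greeting is its own root."""
    n = next(_ids)
    return Greeting(key=(("Greeting", n),), content=content, posted_at=n)


def entity_group(key):
    """The entity group is identified by the root (first) element of the key path."""
    return key[0]


def recent_greetings(entities, limit=10):
    """Model of a non-ancestor query: match on kind only, newest first."""
    matches = [e for e in entities if e.key[-1][0] == "Greeting"]
    return sorted(matches, key=lambda e: e.posted_at, reverse=True)[:limit]


greetings = [new_root_greeting(f"hello {i}") for i in range(3)]
groups = {entity_group(g.key) for g in greetings}
latest = recent_greetings(greetings)
```

In this model every greeting lands in a different entity group, so the per-group write limit does not restrict how quickly new greetings can be added.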

However, because listing the greetings in this scheme requires a non-ancestor
query, the replica used to perform the query might not have seen a new greeting
by the time the query is executed. Nonetheless, nearly all writes will be available for
non-ancestor queries within a few seconds of commit. For many applications, a
solution that provides the results of a non-ancestor query in the context of the
current user's own changes will usually be sufficient to make such replication
latencies completely acceptable.

If strong consistency is important to your application, an alternative approach
is to write entities with an ancestor path that identifies the same root entity
across all entities that must be read in a single, strongly-consistent ancestor
query.
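A minimal sketch of this ancestor-path idea, again as a plain Python model and not the Cloud Datastore client API. The `Guestbook` and `Greeting` kinds, the tuple-based keys, and the helpers `guestbook_key`, `greeting_key`, and `has_ancestor` are assumptions made for illustration.

```python
from itertools import count

# Assumption: a key path is a tuple of (kind, identifier) pairs whose first
# pair is the root entity that defines the entity group.
_ids = count(1)


def guestbook_key(name):
    """Key of the root entity shared by all greetings in one guestbook."""
    return (("Guestbook", name),)


def greeting_key(guestbook_name):
    """Key path rooted at the guestbook, placing the greeting in its entity group."""
    return guestbook_key(guestbook_name) + (("Greeting", next(_ids)),)


def has_ancestor(key, ancestor):
    """True when `ancestor` is a prefix of `key`, which is what an ancestor filter matches."""
    return key[:len(ancestor)] == ancestor


keys = [greeting_key("default"), greeting_key("default"), greeting_key("other")]
default_root = guestbook_key("default")

# Model of an ancestor query: only keys under the given root are returned.
in_default = [k for k in keys if has_ancestor(k, default_root)]
entity_groups = {k[0] for k in keys}
```

Here the two greetings written to the `default` guestbook share one entity group, which is what allows them to be read together in one strongly-consistent ancestor query.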

This approach achieves strong consistency by writing to a single entity group
per guestbook, but it also limits changes to the guestbook to no more than
one write per second (the supported limit for entity groups). If your
application is likely to encounter heavier write usage, you might need to
consider other means: for example, you might put recent posts in a memcache
with an expiration and display a mix of recent posts from the memcache and
Cloud Datastore, or you might cache them in a cookie, put some state in the
URL, or something else entirely. The goal is to find a caching solution
that provides the data for the current user for the period of time in which the
user is posting to your application. Remember, if you do a get, an ancestor
query, or any operation within a transaction, you will always see the most
recently written data.
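The caching idea described above, mixing recently written posts from an expiring cache with possibly stale query results, could be sketched roughly as follows. The `ExpiringCache` class is a tiny in-process stand-in written for this example, not a real memcache client, and `merged_recent_posts` and the post dictionary fields (`id`, `posted_at`, `text`) are likewise hypothetical.

```python
import time


class ExpiringCache:
    """In-process stand-in for a memcache-style store with per-item expiration."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # post id -> (expires_at, post)

    def add(self, post_id, post):
        self._items[post_id] = (self._clock() + self._ttl, post)

    def live_items(self):
        """Drop expired entries and return the remaining posts keyed by id."""
        now = self._clock()
        self._items = {k: v for k, v in self._items.items() if v[0] > now}
        return {k: post for k, (_, post) in self._items.items()}


def merged_recent_posts(cache, query_results, limit=10):
    """Combine cached posts with query results, newest first, without duplicates."""
    by_id = {post["id"]: post for post in query_results}
    by_id.update(cache.live_items())  # cached copies fill in posts the query missed
    ordered = sorted(by_id.values(), key=lambda p: p["posted_at"], reverse=True)
    return ordered[:limit]


# Usage with a controllable clock so the expiration is easy to follow.
now = [100.0]
cache = ExpiringCache(ttl_seconds=30, clock=lambda: now[0])
cache.add(3, {"id": 3, "posted_at": 3, "text": "just posted"})

# Results of an eventually-consistent query that has not yet seen post 3.
stale_results = [
    {"id": 1, "posted_at": 1, "text": "first"},
    {"id": 2, "posted_at": 2, "text": "second"},
]

merged = merged_recent_posts(cache, stale_results)
now[0] = 200.0  # well past the 30-second expiration
after_expiry = merged_recent_posts(cache, stale_results)
```

The expiration only needs to cover the window in which the query might lag behind the user's own write; after that, the query results alone are expected to contain the post.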