Multiple data stores and eventual consistency using micro-services

11th March 2018

There are cases where there isn't a single source of truth for the data that a fleet of micro-services consumes. The data is instead spread across multiple storage solutions, each designed to solve a specific problem using a sub-set of it. This can take the form of data mirrored, via syncing, into a specific format (think for example of a graph database), or of a set of specific features that exist only in another store (think of a relational database holding some custom metadata) - all derived from an "M:N" data store (this can be anything from a relational database or a document one to a set of resources exposed as an API, all mixed together).

The question then is how to achieve eventual consistency across all the data stores, keeping in mind that the single source of truth is physically spread across multiple data stores and third-party APIs, each with its own set of shortcomings. Moreover, let's assume that consolidating them into one data store is not an option - in this specific case a strategy within a set of constraints is required in order to achieve an adequate level of consistency, one that can guarantee the correct resolution of end-user use-cases.

First, it becomes clear that the consistency criteria for each data store need to be properly defined: how fresh must the data be? Does access need to be bi-directional - reads and writes - or is it just a form of read-replica within a specific type of data store, used to solve a problem optimally? E.g. moving the relations between users into a graph database in order to easily make connections between them and generate insights that can then be consumed, through a micro-service, by various clients (mobile, web apps, etc.).
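The read-replica example above can be sketched in a few lines. This is a minimal, hypothetical illustration - the "relational" rows, the in-memory adjacency list standing in for a graph database, and the `suggestions` helper are all assumptions, not a real adapter:

```python
from collections import defaultdict

# Hypothetical sketch: mirror user relations from a relational "follows"
# table into an in-memory adjacency list (standing in for a graph
# database) so that connection insights become cheap to compute.

# Rows as they might come out of the relational store.
relation_rows = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "dave"),
    ("dave", "carol"),
]

def build_graph(rows):
    """Sync step: derive an adjacency list from the relational rows."""
    graph = defaultdict(set)
    for src, dst in rows:
        graph[src].add(dst)
    return graph

def suggestions(graph, user):
    """Insight step: users two hops away that `user` doesn't follow yet."""
    direct = graph[user]
    candidates = set()
    for friend in direct:
        candidates |= graph[friend]
    return candidates - direct - {user}

graph = build_graph(relation_rows)
print(sorted(suggestions(graph, "alice")))  # carol is reachable in two hops
```

A micro-service wrapping this derived view can serve insight queries without ever touching the relational store - the price being that the view is only as fresh as the last sync.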

The simplest case is read-only mode with minimal or no metadata. It also matters whether the metadata is encapsulated or leaks into the business logic of other micro-services. If we're talking about the former, then the rules for how fresh the data must be are dictated by its consumers.
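One way to make "freshness dictated by the consumers" concrete is to let each consumer declare the staleness it tolerates and resync against the strictest contract. A minimal sketch, where the consumer names and thresholds are purely illustrative:

```python
import time

# Hypothetical sketch: each consumer declares the maximum staleness (in
# seconds) it tolerates; the replica is resynced whenever the strictest
# contract would otherwise be violated.

freshness_contracts = {
    "mobile-feed": 300,
    "analytics": 3600,
    "search-index": 60,
}

def needs_resync(last_synced_at, now=None):
    """True once the strictest consumer contract is about to be broken."""
    now = now if now is not None else time.time()
    strictest = min(freshness_contracts.values())
    return (now - last_synced_at) >= strictest

# A replica synced 90 seconds ago already violates the 60-second contract.
print(needs_resync(last_synced_at=1000.0, now=1090.0))  # True
```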

The next case is bi-directional consistency, i.e. a client updates something in our data store and we need to sync that change back to all the stores over which the single source of truth is spread. This becomes quite problematic, because we now have a dependency tree with side-effects leaking at each leaf. The criteria dictated by the consumers will thus require syncing the data at the lowest acceptable interval, otherwise we break the consistency contract we have with our clients. It gets even more complex with metadata that is shared between multiple micro-services.
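The write-back fan-out can be sketched as follows. All store names and adapters here are stand-ins (assumptions, not real clients); the point is only that failures must be collected for retry rather than silently breaking the contract:

```python
# Hypothetical sketch: a client write must be propagated back to every
# upstream store that holds a slice of the source of truth. Failed syncs
# are collected so they can be retried instead of being silently dropped.

def sync_relational(change):      # stand-in for a real store adapter
    return True

def sync_graph(change):           # stand-in for a real store adapter
    return True

def sync_third_party_api(change): # stand-in; pretend the API is down
    return False

UPSTREAM_STORES = {
    "relational": sync_relational,
    "graph": sync_graph,
    "third-party-api": sync_third_party_api,
}

def propagate(change):
    """Fan the change out; return the stores that still need a retry."""
    pending = []
    for name, sync in UPSTREAM_STORES.items():
        if not sync(change):
            pending.append(name)
    return pending

print(propagate({"user": "alice", "field": "email"}))  # ['third-party-api']
```

The `pending` list is exactly the side-effect leak described above: each leaf of the dependency tree that fails pushes the retry burden back onto the caller.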

What about metadata? It is now a full-fledged cross-cutting concern that becomes dissonant with itself, as it needs to be synced in multiple places and via different protocols (think of having similar metadata in a graph database and a relational one). Optimally, since the metadata cuts across multiple micro-services, it should reside in one data store - but let's say this is not an option.

This quickly turns into an aching architectural issue - the solution resides not in a shared data store but in a shared mechanism for keeping track of these changes. One solution could emerge in the form of queues with ack/retry capability backed by persistence (something à la RabbitMQ, Kafka, etc.). It is now self-evident that this adds another complication: how do we deal with persisted events that have become inconsistent because of how long they have resided in the queue? A simple example would be a set of unacked events that after some time "T" become invalid because the record/object they reference doesn't exist anymore. As a result, this requires either another mechanism to drop those events from the queues, or yet another high-level abstraction leak, as the clients will have to deal with out-of-date data themselves.
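The "drop stale events" mechanism mentioned above can be sketched generically, independent of the broker. The age limit, event shape, and the `existing_records` lookup are all hypothetical:

```python
# Hypothetical sketch: before processing, a consumer drops queued events
# that are older than "T" or that reference a record which no longer
# exists, instead of leaking out-of-date data to clients.

MAX_EVENT_AGE = 3600  # "T": events older than this are considered invalid

# Stand-in for an existence check against the authoritative store.
existing_records = {"user:1", "user:3"}

def filter_events(events, now):
    """Split events into those still safe to process and those to drop."""
    valid, dropped = [], []
    for event in events:
        too_old = (now - event["created_at"]) > MAX_EVENT_AGE
        dangling = event["record_id"] not in existing_records
        (dropped if too_old or dangling else valid).append(event)
    return valid, dropped

events = [
    {"record_id": "user:1", "created_at": 9000},  # fresh, record exists
    {"record_id": "user:2", "created_at": 9000},  # record deleted -> drop
    {"record_id": "user:3", "created_at": 1000},  # older than T -> drop
]
valid, dropped = filter_events(events, now=10000)
print(len(valid), len(dropped))  # 1 2
```

With a real broker the same check would run just before acking; dropped events would be acked (or dead-lettered) so they stop occupying the queue.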

Eventual consistency in a complex system - a single source of truth spread across data stores, with the added complication of cross-cutting metadata within a fleet of micro-services - is quite an interesting problem to solve. In its most basic form we are dealing with a graph of (inter)dependencies that should govern both the orchestration of the micro-services and the syncing of the various data stores.
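If the dependencies between stores are declared explicitly, a topological order of that graph yields a safe sync order: a derived store is only refreshed after everything it derives from. A minimal sketch using the standard library (the store names are illustrative):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical sketch: each store lists the stores it is derived from.
dependencies = {
    "graph-db": {"relational-db"},
    "search-index": {"relational-db", "third-party-api"},
    "relational-db": set(),
    "third-party-api": set(),
}

# A topological order guarantees sources are synced before derived views.
sync_order = list(TopologicalSorter(dependencies).static_order())
print(sync_order)  # sources first, then the stores derived from them
```

The same ordering could also drive micro-service orchestration, e.g. pausing consumers of a derived store while its upstream is being refreshed.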

Finally, the solution would be emergent from that graph of (inter)dependencies, in the form of an attached mechanism which abstracts the spread-out single source of truth into just one that stays within the confines of its consumers, thus delivering the expected business value on all fronts.