Is Elasticsearch SET/GET Eventually Consistent?

Elasticsearch does much of the heavy lifting of horizontal scalability for us: managing failures, nodes, and shards. I started working with it a few days ago on a new project, and I wanted to know whether the SET/GET operation is eventually consistent or not. My first thought was: it's a NoSQL store, there are replicas, so it should be eventually consistent. But then I read some documentation that led me to an interesting insight: if you are the client that wrote the data, a read from a replica may actually be consistent for you rather than merely eventually consistent. First, allow me to summarize some of the concepts I have learned, and then I will say what I think about SET/GET eventual consistency. (I also ran a local cluster test to confirm it.)

cluster.name

Nodes with the same cluster.name belong to the same cluster.

The cluster reorganizes itself as we add or remove data, meaning it manages moving data between nodes if needed.

Master Node

One node is elected as the master node. It is in charge of cluster-wide changes: creating and deleting indexes, and adding or removing nodes from the cluster. It does not need to be involved in searching; it is just a manager.

Any Node

You can talk to any node, including the master, for searching and indexing. Every node knows where the data resides, so the node that receives our request (the entry point node) communicates with the nodes that hold the data to get and set it, and then returns the results to us.

Index

A logical namespace that points to one or more shards. It's like a database in a relational database system. An index groups together one or more shards.

1 index -> multiple shards # => one index can have one or more shards; it's like a database.
shard # => documents are stored in shards. Each shard is a single instance of Lucene, a complete search engine in its own right.
application -> index -> shard # => applications talk to shards via indexes, which are logical namespaces that point to shards.
cluster grows # => shards are moved between nodes.
primary shard # => each document is stored on a **single** primary shard.
replica shard # => a copy of a primary shard; it protects against hardware failure of the primary and serves read requests (search/get).
number of shards # => you can have multiple primary shards for an index.
who handles what # => read / search requests are handled by either a primary or a replica shard; the more copies, the higher the read throughput.
concurrency # => on conflict, if two processes read 50, each increases it by one and stores the result, we can end up with 51 and not 52. Elasticsearch uses optimistic concurrency control (versioning) to detect this.
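
The concurrency item above can be illustrated with a small sketch. This is a toy in-memory store, not the Elasticsearch client: the names `VersionedStore`, `VersionConflict` and `increment` are made up for this example, and the only idea taken from Elasticsearch is that a write carrying a stale version is rejected instead of silently overwriting.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale version number."""


class VersionedStore:
    """Toy in-memory document store that tracks a version per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, value)

    def get(self, doc_id):
        return self._docs[doc_id]  # (version, value)

    def put(self, doc_id, value, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{current_version}")
        self._docs[doc_id] = (current_version + 1, value)
        return current_version + 1


def increment(store, doc_id):
    """Read-modify-write that retries when another writer got in first."""
    while True:
        version, value = store.get(doc_id)
        try:
            return store.put(doc_id, value + 1, expected_version=version)
        except VersionConflict:
            continue  # re-read the fresh value and try again


store = VersionedStore()
store.put("counter", 50)

# Two processes read the same state: version 1, value 50.
v_a, val_a = store.get("counter")
v_b, val_b = store.get("counter")

# The first writer wins and stores 51.
store.put("counter", val_a + 1, expected_version=v_a)

# The second writer still holds version 1, so its write is rejected.
try:
    store.put("counter", val_b + 1, expected_version=v_b)
    lost_update = True
except VersionConflict:
    lost_update = False
    increment(store, "counter")  # re-read 51, store 52

final_version, final_value = store.get("counter")
print(final_value)
```

Without the version check, the second write would have stored 51 again and one increment would have been lost.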

Distributed Document Store

When you index a document it is stored on a single primary shard.

shard = hash(routing) % number_of_primary_shards

This explains why the number of primary shards can be set only when an index is created and never changed: if the number of primary shards ever changed in the future, all previous routing values would be invalid and documents would never be found.
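
A minimal sketch of the routing formula above, to show why a changed shard count breaks lookups. `zlib.crc32` is only a stand-in hash chosen for this example (it is not the hash Elasticsearch uses), and `shard_for` and the document ids are hypothetical.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards"""
    # Stand-in hash for this sketch only; Elasticsearch uses its own hash.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(100)]

# Where each document was written when the index had 5 primary shards.
written_to = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}

# Where a lookup would go if the shard count were later changed to 6.
looked_up_in = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

misrouted = [d for d in doc_ids if written_to[d] != looked_up_in[d]]
print(f"{len(misrouted)} of {len(doc_ids)} documents "
      "would be looked up on the wrong shard")
```

The same routing value always maps to the same shard as long as the divisor stays fixed; once it changes, many documents hash to a shard that never stored them.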

Create, index, and delete requests are write operations, which must be successfully completed on the **primary shard before** they can be copied to any associated replica shards. The client gets an OK only if the operation finished successfully on the primary shard.

parameters/configuration

replication # => sync: wait for a successful response from the replicas. async: return success as soon as the primary has finished. Avoid async.
quorum # => by default, the primary shard requires a quorum (a majority of shard copies) to be **available** before attempting a write.
read miss # => while a document is being indexed, it may already be on the primary but not yet copied to a replica; the replica will report that the document does not exist, while the primary would return it successfully. In that sense reads are not consistent but eventually consistent.
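
To make the quorum setting concrete, here is a small sketch of the arithmetic. It assumes the formula given in the documentation for the Elasticsearch versions that had this setting, `int((primary + number_of_replicas) / 2) + 1`, and that the check is only enforced when `number_of_replicas` is greater than 1 (otherwise a single-node cluster could never write). The function names are made up for this example.

```python
def write_quorum(number_of_replicas: int) -> int:
    """Shard copies that must be available before a write is attempted."""
    primary = 1  # each shard has exactly one primary copy
    return (primary + number_of_replicas) // 2 + 1


def can_write(active_copies: int, number_of_replicas: int) -> bool:
    # Assumption: with 0 or 1 replicas the quorum check is skipped,
    # so an active primary alone is enough.
    if number_of_replicas <= 1:
        return active_copies >= 1
    return active_copies >= write_quorum(number_of_replicas)


for replicas in (1, 2, 3):
    print(replicas, "replicas -> quorum of", write_quorum(replicas))
```

Note that the quorum only says how many copies must be available before the write starts; it says nothing about which copy a later read will hit.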

Now - Is Elasticsearch SET/GET Read Eventually Consistent?

Elasticsearch reads are eventually consistent, but they can also be consistent :). The realtime flag applies per shard, so if a replica shard has not received the data yet, a realtime read from it still won't return the most recent data; at most we get whatever is in that shard's own transaction log.

realtime: true + replication: sync ==> read consistent for the same client # => because replication: sync means the primary waits for the written data to be replicated to all replicas.

It is possible that, while a document is being indexed, the document will already be present on the primary shard but not yet copied to the replica shards. In this case, a replica might report that the document doesn’t exist, while the primary would have returned the document successfully. Once the indexing request has returned success to the user, the document will be available on the primary and all replica shards.

So it's possible for the document to be on the primary only and not on the replicas. That makes sense: we managed to set the document on the primary and the replicas didn't get it yet. But in that case, as the section above also says, the client would not have received an OK response yet.

The translog is also used to provide real-time CRUD. When you try to retrieve, update, or delete a document by ID, it first checks the translog for any recent changes before trying to retrieve the document from the relevant segment. This means that it always has access to the latest known version of the document, in real-time.
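
The lookup order described above can be sketched with a toy shard. This is a heavy simplification for illustration only: `ToyShard` is a made-up class, and here `refresh` simply moves everything from the translog into the searchable data, which glosses over how refresh and flush really work.

```python
class ToyShard:
    """Toy model of one shard copy: a translog plus refreshed segments."""

    def __init__(self):
        self.translog = {}  # recent changes, not yet in a segment
        self.segments = {}  # data already written to segments

    def index(self, doc_id, doc):
        self.translog[doc_id] = doc

    def refresh(self):
        # Simplification: move recent changes into the segments.
        self.segments.update(self.translog)
        self.translog.clear()

    def get(self, doc_id, realtime=True):
        # A realtime GET checks the translog before the segments.
        if realtime and doc_id in self.translog:
            return self.translog[doc_id]
        return self.segments.get(doc_id)


shard = ToyShard()
shard.index("1", {"title": "hello"})

before_realtime = shard.get("1", realtime=True)       # found in the translog
before_non_realtime = shard.get("1", realtime=False)  # not in a segment yet

shard.refresh()
after_non_realtime = shard.get("1", realtime=False)   # now in a segment
```

The important detail for the consistency question is that each shard copy has its own translog, so a realtime GET can only see what that particular copy has already received.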

To the client that waits until the data is replicated, reads are consistent, because with sync replication a success result is returned to the client only after the data was replicated. Together with the realtime flag, this ensures that even if the operation is only in the transaction log, it will be returned to the client. But if I'm client2, which did not do the write, I might read in the middle of the operation, when it has finished on the primary and was not replicated to the replicas yet; in that case it is eventually consistent. Of course, I encourage you to tell me if you think this is not the case :)
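
Here is a toy simulation of that argument, under the assumption of sync replication. `ToyReplicationGroup` is a made-up class that splits a write into its two phases so that a second client can read in between; it is not how you would talk to a real cluster.

```python
class ToyReplicationGroup:
    """One primary copy and its replicas, each modeled as a plain dict."""

    def __init__(self, number_of_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(number_of_replicas)]

    def write_to_primary(self, doc_id, doc):
        self.primary[doc_id] = doc

    def replicate(self, doc_id):
        for replica in self.replicas:
            replica[doc_id] = self.primary[doc_id]

    def read(self, doc_id, copy):
        """Read from one copy: 'primary' or a replica index."""
        source = self.primary if copy == "primary" else self.replicas[copy]
        return source.get(doc_id)


group = ToyReplicationGroup()

# Phase 1: client1's write has reached the primary only.
group.write_to_primary("1", "hello")

# client2 reads while the write is still in flight.
client2_from_replica = group.read("1", copy=0)          # read miss
client2_from_primary = group.read("1", copy="primary")  # already there

# Phase 2: replication finishes; only now does client1 get its OK.
group.replicate("1")

# After the OK, client1 sees the document whichever copy serves the read.
client1_reads = [group.read("1", copy=c) for c in ("primary", 0, 1)]
print(client2_from_replica, client2_from_primary, client1_reads)
```

client1 never observes the gap because it does not read until it has its OK; client2 has no such ordering guarantee.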

BOOK: If you are interested in a more developer-oriented discussion of Elasticsearch, and not just the admin side, the best book I have found for it is: "ElasticSearch Essentials"

