Apache HBase

NOTE: This blog post describes how Apache HBase does concurrency control. This assumes knowledge of the HBase write path, which you can read more about in this other blog post.

Introduction

Apache HBase provides a consistent and understandable data model to the user while still offering high performance. In this blog, we’ll first discuss the guarantees of the HBase data model and how they differ from those of a traditional database. Next, we’ll motivate the need for concurrency control by studying concurrent writes and then introduce a simple concurrency control solution. Finally, we’ll study read/write concurrency control and discuss an efficient solution called Multiversion Concurrency Control.

Why Concurrency Control?

In order to understand HBase concurrency control, we first need to understand why HBase needs concurrency control at all; in other words, what properties does HBase guarantee about the data that require concurrency control?

The answer is that HBase guarantees ACID semantics per-row. ACID is an acronym for:

• Atomicity: All parts of a transaction complete or none complete
• Consistency: Only valid data is written to the database
• Isolation: Parallel transactions do not impact each other’s execution
• Durability: Once a transaction is committed, it remains

If you have experience with traditional relational databases, these terms may be familiar to you. Traditional relational databases typically provide ACID semantics across all the data in the database; for performance reasons, HBase only provides ACID semantics on a per-row basis. If you are not familiar with these terms, don’t worry. Instead of dwelling on the precise definitions, let’s look at a couple of examples.

Writes and Write-Write Synchronization

Image 1. Two writes to the same row

From the previously cited Write Path Blog Post, we know that HBase will perform the following steps for each write:

(1) Write to Write-Ahead-Log (WAL)
(2) Update MemStore: write each data cell [the (row, column) pair] to the memstore

List 1. Simple list of write steps

That is, we write to the WAL for disaster recovery purposes and then update an in-memory copy (MemStore) of the data.

Now, assume we have no concurrency control over the writes and consider the following order of events:

Image 2. One possible order of events for two writes

At the end, we are left with the following state:

Image 3. Inconsistent result in absence of write-write synchronization

The row ends up pairing “Restaurant” with “Engineer”, which is a role I’ve never held. In ACID terms, we have not provided Isolation for the writes, as the two writes became intermixed.

We clearly need some concurrency control. The simplest solution is to provide exclusive locks per row in order to provide isolation for writes that update the same row. So, our new list of steps for writes is as follows (the new steps are (0) and (3)):

(0) Obtain Row Lock
(1) Write to Write-Ahead-Log (WAL)
(2) Update MemStore: write each cell to the memstore
(3) Release Row Lock

List 2. List of write steps with write-write synchronization
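The steps in List 2 can be sketched in a few lines of Java. This is only a toy model under stated assumptions, not HBase's implementation: `RowLockedStore`, `RowLockDemo`, the in-memory "WAL" list, and the column names `company` and `role` are all hypothetical names chosen for illustration. Two threads repeatedly write the two rows from the example, and the row is inspected only after both writers have finished.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy stand-in for the write path. Hypothetical names, not HBase classes.
class RowLockedStore {
    // One exclusive lock per row key, created on first use.
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    // Stand-in for the WAL: an append-only list of log records.
    private final List<String> wal = new ArrayList<>();
    // Stand-in for the MemStore: row -> (column -> value).
    private final Map<String, Map<String, String>> memStore = new ConcurrentHashMap<>();

    void put(String row, Map<String, String> cells) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();                                      // (0) obtain row lock
        try {
            synchronized (wal) {                          // (1) write to the WAL
                wal.add(row + " " + cells);
            }
            Map<String, String> columns =
                memStore.computeIfAbsent(row, r -> new ConcurrentHashMap<>());
            for (Map.Entry<String, String> cell : cells.entrySet()) {
                columns.put(cell.getKey(), cell.getValue()); // (2) update MemStore, cell by cell
            }
        } finally {
            lock.unlock();                                // (3) release row lock
        }
    }

    // Copies the current cells of a row. Takes no row lock.
    Map<String, String> readRow(String row) {
        Map<String, String> columns = memStore.get(row);
        if (columns == null) {
            return new HashMap<>();
        }
        return new HashMap<>(columns);
    }
}

class RowLockDemo {
    // Builds the two cells of one write; the column names are assumptions.
    static Map<String, String> cells(String company, String role) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("company", company);
        m.put("role", role);
        return m;
    }

    // Runs two writers against the same row and returns the row once both are done.
    static Map<String, String> runConcurrentWrites() {
        RowLockedStore store = new RowLockedStore();
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) store.put("row1", cells("Cloudera", "Engineer"));
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 1000; i++) store.put("row1", cells("Restaurant", "Waiter"));
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.readRow("row1");
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentWrites());
    }
}
```

Because each `put` holds the row lock across both cell updates, the final row is always one of the two complete writes, never a mix. Note that `readRow` takes no lock, so a read issued while a write is in progress could still observe a half-written row in this sketch.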

Read-Write Synchronization

So far, we’ve added row locks to writes in order to guarantee ACID semantics. Do we need to add any concurrency control for reads? Let’s consider another order of events for our example above (note that this order follows the rules in List 2):

Image 4. One possible order of operations for two writes and a read

Assume no concurrency control for reads and that we request a read concurrently with the two writes. Assume the read is executed directly before “Waiter” is written to the MemStore; this read action is represented by a red line above. In that case, we will again read the inconsistent row:

Image 5. Inconsistent result in absence of read-write synchronization

Therefore, we need some concurrency control to deal with read-write synchronization. The simplest solution would be to have the reads obtain and release the row locks in the same manner as the writes. This would resolve the ACID violation, but the downside is that our reads and writes would both contend for the row locks, slowing each other down.

Instead, HBase uses a form of Multiversion Concurrency Control (MVCC) to avoid requiring the reads to obtain row locks. Multiversion Concurrency Control works in HBase as follows:

For writes:
(w1) After acquiring the RowLock, each write operation is immediately assigned a write number.
(w2) Each data cell in the write stores its write number.
(w3) A write operation completes by declaring it is finished with the write number.

For reads:
(r1) Each read operation is first assigned a read timestamp, called a read point.
(r2) The read point is assigned to be the highest integer x such that all writes with write number <= x have been completed.
(r3) A read r for a certain (row, column) combination returns the data cell with the matching (row, column) whose write number is the largest value that is less than or equal to the read point of r.

List 3. Multiversion Concurrency Control steps

Let’s look at the operations in Image 4 again, this time using Multiversion Concurrency Control:

Image 6. Write steps with Multiversion Concurrency Control

Notice the new steps introduced for Multiversion Concurrency Control. Each write is assigned a write number (step w1), each data cell is written to the memstore with its write number (step w2, e.g. “Cloudera [wn=1]”) and each write completes by finishing its write number (step w3).

Now, let’s consider the read in Image 4, i.e. a read that begins after step “Restaurant [wn=2]” but before the step “Waiter [wn=2]”. From rules r1 and r2, its read point will be assigned to 1, because the write with write number 1 has finished while the write with write number 2 has not. From r3, it will read the values with write number of 1, leaving us with:

Image 7. Consistent answer with Multiversion Concurrency Control

A consistent response without requiring locking the row for the reads!

Let’s put this all together by listing the steps for a write with Multiversion Concurrency Control (the new steps required for read-write synchronization are (0a) and (2a)):

(0) Obtain Row Lock
(0a) Acquire New Write Number
(1) Write to Write-Ahead-Log (WAL)
(2) Update MemStore: write each cell to the memstore
(2a) Finish Write Number
(3) Release Row Lock
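To make rules w1 to w3 and r1 to r3 concrete, here is a minimal single-threaded sketch in Java that replays the interleaving from Image 4. It is a toy model under stated assumptions, not HBase's implementation: the class names `MvccRow` and `MvccDemo`, the method names, and the column names `company` and `role` are hypothetical, and the row lock, the WAL, and all thread safety are deliberately left out so that only the write-number and read-point bookkeeping remains.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model of List 3 for a single row. Hypothetical names, not HBase classes.
class MvccRow {
    // One versioned cell: the value plus the write number that produced it.
    private static final class Version {
        final long writeNumber;
        final String value;

        Version(long writeNumber, String value) {
            this.writeNumber = writeNumber;
            this.value = value;
        }
    }

    private final Map<String, List<Version>> cells = new HashMap<>();
    // Write numbers that have been assigned but not yet finished.
    private final TreeSet<Long> pending = new TreeSet<>();
    private long nextWriteNumber = 1;

    // (w1) Assign the next write number to a new write.
    long beginWrite() {
        long wn = nextWriteNumber++;
        pending.add(wn);
        return wn;
    }

    // (w2) Store a cell together with its write number.
    void writeCell(long wn, String column, String value) {
        cells.computeIfAbsent(column, c -> new ArrayList<>()).add(new Version(wn, value));
    }

    // (w3) Declare the write finished.
    void finishWrite(long wn) {
        pending.remove(wn);
    }

    // (r1, r2) Highest x such that every write with write number <= x has finished.
    long readPoint() {
        if (pending.isEmpty()) {
            return nextWriteNumber - 1;
        }
        return pending.first() - 1;
    }

    // (r3) Newest version of the column whose write number is <= the read point.
    String read(String column, long readPoint) {
        Version best = null;
        for (Version v : cells.getOrDefault(column, Collections.emptyList())) {
            if (v.writeNumber <= readPoint && (best == null || v.writeNumber >= best.writeNumber)) {
                best = v;
            }
        }
        return best == null ? null : best.value;
    }
}

class MvccDemo {
    // Replays Image 4: the read happens between "Restaurant" and "Waiter".
    static String readBetweenCellsOfSecondWrite() {
        MvccRow row = new MvccRow();
        long w1 = row.beginWrite();
        row.writeCell(w1, "company", "Cloudera");
        row.writeCell(w1, "role", "Engineer");
        row.finishWrite(w1);

        long w2 = row.beginWrite();
        row.writeCell(w2, "company", "Restaurant");
        long readPoint = row.readPoint();            // write 2 is still pending
        String seen = row.read("company", readPoint) + "/" + row.read("role", readPoint);
        row.writeCell(w2, "role", "Waiter");
        row.finishWrite(w2);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readBetweenCellsOfSecondWrite());
    }
}
```

The read skips the "Restaurant" cell even though it is already in the map, because its write number is above the read point. Once the second write finishes its write number, a new read point covers it and later reads see both of its cells.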

Conclusion

In this blog we first defined HBase’s row-level ACID guarantees. We then demonstrated the need for concurrency control by studying concurrent writes and introduced a row-level locking solution. Finally, we investigated read-write concurrency control and presented an efficient mechanism called Multiversion Concurrency Control (MVCC).

This blog post is accurate as of HBase 0.92. HBase 0.94 has various optimizations, e.g. HBASE-5541, that will be described in a future blog post.

Nice explanation.
I am currently using HBase 0.94. As you said, if there is no locking, the row can end up as Restaurant->Engineer. HBase 0.94 uses operation-wise locking, so there is again a chance of Restaurant->Engineer occurring.
As far as I understand from your blog, the entire row is locked until all of its values are fully inserted.
In HBase 0.94, since the locking is at the operation level, the same problem may occur, right?

Consider this program,
Put p = new Put(/*rowkey*/);
p.add(...., "cloudera");
Put p1 = new Put(/*rowkey*/);
p1.add(..., "Restaurant");
p1.add(...., "Waiter");
table.put(p1);
p.add(....., "Engineer");
table.put(p);
We get the lock for p at the following step:
Put p=new Put(/*rowkey*/)
Because we hold the lock on the row key, no one else is allowed to get the lock.
But in the case of HBase 0.94, it's not happening like that. Is there anything wrong in my understanding? Can you help me with this?