Details

The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. There is a fundamental change to the group-commit behaviour: it batches transactions in HBase code before pushing them down to the WAL.

Description

This allows for better throughput when there are hot rows. I have seen this change improve a single row's update rate from 400 increments/sec/server to 4000 increments/sec/server.

Activity

dhruba borthakur
added a comment - 29/Sep/11 07:09 The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows.
Introduced method HLog.appendNoSync() that returns a txid. The increment method then releases the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only after all the transactions up to the one identified by that txid have been successfully synced to HDFS.
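The txid-based group-commit pattern described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual HLog code; the names (pending, unflushedEntries, syncedTillHere) and the in-memory "durable" list standing in for the HDFS sync are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the appendNoSync()/sync(txid) pattern; not the
// real HLog implementation.
class WalSketch {
    private final Object updateLock = new Object();
    private long unflushedEntries = 0;  // txid generator
    private long syncedTillHere = 0;    // highest txid known to be durable
    private final List<String> pending = new ArrayList<>();
    private final List<String> durable = new ArrayList<>();

    // Append an edit under the update lock but do NOT sync; return a txid.
    // The caller can now release its row lock before paying the sync cost.
    long appendNoSync(String edit) {
        synchronized (updateLock) {
            pending.add(edit);
            return ++unflushedEntries;
        }
    }

    // Return only when every transaction up to txid is durable. One
    // caller's flush covers all earlier pending edits (group commit).
    synchronized void sync(long txid) {
        if (txid <= syncedTillHere) {
            return;  // another caller's sync already covered this txid
        }
        List<String> batch;
        synchronized (updateLock) {
            batch = new ArrayList<>(pending);
            pending.clear();
            syncedTillHere = unflushedEntries;
        }
        durable.addAll(batch);  // stand-in for writer.append() + HDFS sync
    }

    int durableCount() { return durable.size(); }
}
```

The key property is that a single sync(txid) call makes every earlier appendNoSync() edit durable, so concurrent incrementers on a hot row share one sync instead of each paying for their own.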

Ted Yu
added a comment - 29/Sep/11 07:35 Nice feature.
+ System.err.println("Version Mismatch between client and server" +
+     "... command aborted.");
I think LOG should be used.
Incrementer thread should be set as Daemon thread.
For HLog.unflushedEntries, since it is used to generate txid, I wonder if there is a better name for it.
+ // method is pretty heavyweight as far a locking is concerned. The
Should be "as far as ..."
+ // write out all accumulated Entries to hdfs.
+ for (Entry e : pending) {
+   writer.append(e);
+ }
I think exception handling should be added here. What if writer.append() raises an exception? Would the remaining Entries be lost?

dhruba borthakur
added a comment - 29/Sep/11 07:57 Addressed Ted Yu's review comments. The code that does
for (Entry e : pending) {
  writer.append(e);
}
does not catch exceptions; instead it throws an exception to the caller if any of the edits fail to make it to HDFS. In fact, the HBase regionserver exits if an HDFS write/sync fails; this is expected behaviour.
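The failure behaviour described here can be sketched as below. The Appender interface and BatchFlusher class are hypothetical stand-ins for the WAL writer, used only to show that the loop does not swallow exceptions: any append failure propagates unchanged to the caller of the flush, which in HBase leads to regionserver abort.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the WAL writer interface.
interface Appender {
    void append(String entry) throws IOException;
}

class BatchFlusher {
    // Write out all accumulated entries; any IOException bubbles up
    // to the caller rather than being caught here.
    static void flush(List<String> pending, Appender writer) throws IOException {
        for (String e : pending) {
            writer.append(e);
        }
    }
}
```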

Ted Yu
added a comment - 29/Sep/11 08:13 Wow, this is the quickest turnaround for code review I have ever seen.
Normally you can wait for other people's comments before making the next patch.
I still see a System.err.println() call.
For my last comment, thanks for reminding me of the current behavior. What I meant was that your change introduced some kind of buffering which would delay the bubbling of IOException.
I guess this is okay. We should document this in release notes.
Can I ask my favorite question? Test suite.
Also, using https://reviews.apache.org/ would be convenient.

Ted Yu
added a comment - 29/Sep/11 14:37 I got the following from test suite run:
testAppend(org.apache.hadoop.hbase.regionserver.wal.TestHLog) Time elapsed: 0.037 sec <<< ERROR!
java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testAppend(TestHLog.java:567)
Please use my script on HBASE-4480 to reproduce the above.
I can provide test output if needed.

dhruba borthakur
added a comment - 29/Sep/11 20:23 All unit tests pass now (except TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same tests fail on trunk, so these failures do not seem to be related to this patch.
The one reference to System.err.println() is a printUsage() message that is needed only if you want to run the unit test as a standalone command-line utility.
There is a single test, TestIncrement, that creates 100 threads and ensures that all the concurrent increments match the final expected result.
There is a benchmark, TestHLogBench, that measures the performance of the appendNoSync call.
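The property a test like TestIncrement verifies can be illustrated with a miniature version: many threads hammer one hot counter, and the final value must equal the total number of increments regardless of interleaving. The class name, thread counts, and the AtomicLong standing in for the hot row's cell are all assumptions for this sketch, not taken from the actual test.

```java
import java.util.concurrent.atomic.AtomicLong;

// Miniature of the concurrent-increment property: N threads each perform
// K increments against one shared counter; the final value must be N*K.
class ConcurrentIncrementDemo {
    static long run(int threads, int incrementsPerThread) throws InterruptedException {
        AtomicLong hotRow = new AtomicLong();  // stand-in for the hot row's cell
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    hotRow.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();  // wait for all increments to finish
        }
        return hotRow.get();
    }
}
```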

jiraposter@reviews.apache.org
added a comment - 29/Sep/11 20:27
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2116/
-----------------------------------------------------------
Review request for hbase and Ted Yu.
Summary
-------
The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then releases the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only after all the transactions up to the one identified by that txid have been successfully synced to HDFS.
There is a single test, TestIncrement, that creates 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark, TestHLogBench, that measures the performance of the appendNoSync call.
This addresses bug HBASE-4487.
https://issues.apache.org/jira/browse/HBASE-4487
Diffs
/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401
/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401
/src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401
/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401
Diff: https://reviews.apache.org/r/2116/diff
Testing
-------
All unit tests pass now (except TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same tests fail on trunk, so these failures do not seem to be related to this patch.
Thanks,
Dhruba

jiraposter@reviews.apache.org
added a comment - 29/Sep/11 22:59
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2116/#review2180
-----------------------------------------------------------
Ship it!
Patch v6 passed all unit tests.
Nice job Dhruba.
Ted

jiraposter@reviews.apache.org
added a comment - 29/Sep/11 23:31
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2116/#review2186
-----------------------------------------------------------
Looks good. I like the benchmarking tool.
This is like a group commit in HLog. We have group committing going on inside sync, down in dfsclient. Does this group commit not 'group' enough?
With this change, it's no longer possible to sync each increment. We ok w/ that? We should at least release note this difference in increment behaviors and, erring on the conservative side, I'd mark this an 'incompatible change' rather than an 'improvement' just so it's more likely folks will know of this changed increment behavior.
Michael

jiraposter@reviews.apache.org
added a comment - 30/Sep/11 01:29
On 2011-09-29 22:31:37, Michael Stack wrote:
> Looks good. I like the bench marking tool.
>
> This is like a group commit in HLog. We have a group committing going on inside in sync down in dfsclient. Does this group commit not 'group' enough.
>
> With this change, its no longer possible to sync each increment. We ok w/ that? We should at least release note this difference in increment behaviors and erring on the conservative side, I'd mark this an 'incompatible change' rather than an 'improvement' just so its more likely folks will know of this changed increment behavior.
HRegion has:
final Configuration conf;
Can we pass a new config parameter through conf which enables this new feature?
Ted

Lars Hofhansl
added a comment - 30/Sep/11 03:42 How does that relate to the logic in HLog.syncer, which explicitly syncs without holding the updateLock?
If I read the patch correctly, it adds new sync calls that are performed with the updateLock held (locking out all updates while the sync operation is in progress).

jiraposter@reviews.apache.org
added a comment - 30/Sep/11 06:02
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2116/#review2206
-----------------------------------------------------------
/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
< https://reviews.apache.org/r/2116/#comment5140 >
Am not sure. Just to clarify
Is this check not needed here then?
ramkrishna

dhruba borthakur
added a comment - 30/Sep/11 06:25 > This is like a group commit in HLog. We have a group committing going on inside in sync down in dfsclient.
The DFSClient write code path is pretty heavyweight (as it stands now), as shown in DFSClient.writeChunk(). It acquires a synchronized section, creates packets, stuffs CRCs into them, acquires locks for the outgoing queue, and queues the packet. And all this is done while holding the HLog.updateLock, which means it blocks out any other writes to the HLog for this time. Instead, this patch queues the write-data completely inside the HBase code, and the HLog.updateLock is released much earlier.
> If I read the patch correctly it adds new sync calls that are performed with the updateLock held
There are no additional code paths introduced by this patch where an HLog.sync occurs within the HLog.updateLock. The only pre-existing place where a sync call occurs within the HLog.updateLock is in HLog.cleanupCurrentWriter. This code path is invoked only during a logroll, and during a logroll all currently executing transactions are quiesced anyway, so I see no impact to concurrency in this code path. Please let me know if you disagree.
> if (this.closed) : Is this check not needed here then?
I think this code is still needed, this check was not introduced by this patch. Any reasons why you think it might not be needed?
> Can we pass new config parameter through conf which enables this new feature ?
I can surely make this configurable, but I am doubtful that anybody would care to run with this feature switched off (since it is a performance fix with potentially no downsides for any use case). But if you feel strongly about it, I will make it configurable. However, most of the common code in HLog will not be inside this configurable parameter; only the change in HRegion.increment() can be switched on/off via this new config. Please let me know.
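The lock-hold-time argument above can be made concrete with a sketch. None of this is the real HLog or DFSClient code; the class and method names are illustrative, and an in-memory list stands in for the heavyweight write path. The point is the contrast: the old shape runs the expensive write under updateLock, the new shape holds updateLock only for a cheap enqueue and defers the expensive work to a flush that runs without the lock.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative contrast of lock-hold time; not the actual HLog code.
class LockScopeSketch {
    private final Object updateLock = new Object();
    private final List<String> pending = new ArrayList<>();
    private final List<String> written = new ArrayList<>();
    private long txid = 0;

    // Old shape: the heavyweight write (packet creation, CRCs, queueing)
    // happens inside the lock, stalling every other appender meanwhile.
    long appendHoldingLock(String edit) {
        synchronized (updateLock) {
            expensiveWrite(edit);
            return ++txid;
        }
    }

    // New shape: only a cheap in-memory enqueue happens under the lock.
    long appendNoSync(String edit) {
        synchronized (updateLock) {
            pending.add(edit);
            return ++txid;
        }
    }

    // The expensive work runs later, without updateLock held.
    void flushOutsideLock() {
        List<String> batch;
        synchronized (updateLock) {
            batch = new ArrayList<>(pending);
            pending.clear();
        }
        for (String e : batch) {
            expensiveWrite(e);  // no updateLock held here
        }
    }

    private void expensiveWrite(String edit) { written.add(edit); }
    int writtenCount() { return written.size(); }
}
```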

dhruba borthakur
added a comment - 30/Sep/11 17:46 Hi Ramakrishna, from my understanding, HLog.syncer() used to acquire the updateLock, check this.closed, then release the updateLock and proceed with the sync. So, essentially, the check provided no protection against the concurrent execution of the body of syncer() versus some other thread setting this.closed to true.
But I agree with you, that even prior to this patch, there is a rare possibility that a region server shutdown (via HLog.close) can race with a HLog.sync() that is in progress and cause exceptions. If I can recreate that race in a test case, I will submit it as part of another JIRA, does that sound reasonable?

dhruba borthakur
added a comment - 30/Sep/11 17:50 > I'd mark this an 'incompatible change' rather than an 'improvement' just so it's more likely folks
I could not find the place where to mark it as "incompatible". Can you please tell me where to mark it as such?

Andrew Purtell
added a comment - 30/Sep/11 17:56 I could not find the place where to mark it as "incompatible". Can you please tell me where to mark it as such?
This is a committer requirement Dhruba, don't worry about it, unless you disagree about the characterization. There is a checkbox to mark when resolving the issue and where the changelog line goes in CHANGES.txt is different.

Andrew Purtell
added a comment - 30/Sep/11 18:02 My comment lines up with Stack's on the code review, basically: This is a new code path for increments, and it means that one cannot guarantee that any individual increment has been committed to the WAL. For the increment case this walks back some representations we make about HBase for the sake of performance. I have mixed feelings about that. This should be configurable. Happy to agree the default is "enabled".
I believe there is another JIRA somewhere about making different config examples for users depending on their risk tolerance or interest in performance. Enabling this and deferred flushing etc. makes sense in a "performance" profile; conversely turned off in a "strict" profile.

dhruba borthakur
added a comment - 30/Sep/11 18:11 > one cannot guarantee that any individual increment has been committed to the WAL.
Even prior to this patch, individual increments were batch-synced by HDFS, so there was no guarantee that each increment operation resulted in its own sync.
From my understanding, the contract for HBase clients is that an increment operation is deemed complete only after it has been synced to the transaction log. This contract was valid prior to this patch, and this patch does not change it. The server does not return the RPC response to the client until the increment operation has been synced to the WAL.
Also, the increment operation does not follow the RWCC rules, thereby exposing uncommitted increment operations to readers. This patch does not change that behaviour either.
Is there something I am missing?
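
The appendNoSync/sync(txid) split discussed above can be illustrated with a toy sketch. This is not HBase's actual HLog implementation; the field names (unflushedEntries, syncedTillHere) and the single-lock structure are illustrative only. The point it shows is that one underlying sync call can cover many outstanding edits, so a caller that releases its row lock before waiting on its txid still gets the durability contract.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy sketch of a txid-based group-commit WAL (illustrative, not HBase's HLog).
class MiniWal {
    private final ReentrantLock lock = new ReentrantLock();
    private long unflushedEntries = 0; // last txid handed out by appendNoSync
    private long syncedTillHere = 0;   // last txid known to be durable

    // Append an edit without syncing; the returned txid is what the caller
    // passes to sync() after releasing its row lock.
    long appendNoSync(byte[] edit) {
        lock.lock();
        try {
            // (buffer the edit here)
            return ++unflushedEntries;
        } finally {
            lock.unlock();
        }
    }

    // Returns only once every transaction up to txid is durable. A single
    // underlying sync covers all outstanding edits: the group-commit win.
    void sync(long txid) {
        lock.lock();
        try {
            if (syncedTillHere >= txid) {
                return; // an earlier caller's sync already covered this txid
            }
            // (hflush the buffered edits to HDFS here); on success:
            syncedTillHere = unflushedEntries;
        } finally {
            lock.unlock();
        }
    }
}
```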

Todd Lipcon
added a comment - 30/Sep/11 18:14 Looking over the patch, I think I agree with Dhruba - dropping the row locks before syncing would only affect ACID guarantees if we were using row locks for visibility. But we use RWCC/MVCC for visibility. So which guarantee are we losing?

Andrew Purtell
added a comment - 30/Sep/11 18:25 Even prior to this patch, individual increments were batched-synced-by-hdfs, so there was no guarantee that each increment operation resulted in a sync.
Point taken.

Lars Hofhansl
added a comment - 30/Sep/11 18:37 @ramkrishna and @dhruba. I agree with ramkrishna that the if (this.closed) check is still needed.
True, there already was a race, and that was the point of HBASE-4387 .
We want to sync without the updateLock held, while at the same time making sure we don't sync unless we need to (i.e. the HLog is not closed); and if we fail without the lock (which will be rare), we retry with the lock.
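
The lock-free-sync-with-locked-retry pattern described above might be sketched as follows. This is a hedged illustration, not HBase's actual HLog code: the class name, the simulated one-shot failure, and the syncCount counter are all invented for the example. The shape it shows is the one Lars describes: try the sync on the fast path with no lock held, and only on the rare failure re-check closed and retry under the lock.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of "sync without updateLock, retry under it on failure".
class RetryingSyncer {
    private final Object updateLock = new Object();
    volatile boolean closed = false;
    // Simulated transient failure: the first attempt fails exactly once.
    private final AtomicBoolean failOnce = new AtomicBoolean(true);
    int syncCount = 0; // successful syncs, for the example's sake

    void sync() throws IOException {
        if (closed) {
            throw new IOException("log is closed");
        }
        try {
            doSync(); // fast path: no lock held
        } catch (IOException e) {
            synchronized (updateLock) { // rare failure: retry under the lock
                if (closed) {
                    throw new IOException("log is closed");
                }
                doSync();
            }
        }
    }

    private void doSync() throws IOException {
        if (failOnce.compareAndSet(true, false)) {
            throw new IOException("simulated transient sync failure");
        }
        syncCount++;
    }
}
```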

stack
added a comment - 30/Sep/11 20:28 +1 on v7 (IIUC, Ram and Lars's close issue is addressed by the call to sync if there are edits, and then the close). From the back and forth above, my takeaway is that this patch makes the grouping fatter but doesn't change current semantics – we are as 'broke' as we were before this patch. I no longer think this is an incompatible change. We should release-note the fatter grouping of increments on commit.

dhruba borthakur
added a comment - 02/Oct/11 06:46 Hi Lars, this is a performance fix, so it is unlikely to be a candidate for 0.92. How about we make frequent releases (say 0.94, coming out sooner rather than later) rather than porting these to existing release candidates?

ShiXing
added a comment - 13/Jun/12 09:56 @Ted Yu or @dhruba borthakur
I have a question about the code in the trunk:
try {
  Integer lid = getLock(lockid, row, true);
  this.updatesLock.readLock().lock();
  try {
    xxxx
  } finally {
    this.updatesLock.readLock().unlock();
    releaseRowLock(lid);
  }
  if (writeToWAL) {
    this.log.sync(txid); // sync the transaction log outside the rowlock
  }
} finally {
  closeRegionOperation();
}
What will happen if this.log.sync(txid) fails?
As far as I know, the RS will tell the client that this op failed, and the client will retry, so the MemStore and the HLog on HDFS are inconsistent.
Am I right? Or does the RS exit if an HDFS write/sync fails, as Dhruba said?