Details

Don't do compaction on the current delta if it has a file that matches the bucket file pattern but is not compactable

Description

Hive HCatalog streaming also creates a file like bucket_n_flush_length in each delta directory, where "n" is the bucket number. compactor.CompactorMR thinks this file also needs to be compacted, but of course it cannot be, so compactor.CompactorMR does not continue with the compaction.

Did a test: after removing the bucket_n_flush_length file, the "alter table partition compact" statement finished successfully. If that file is not deleted, nothing gets compacted.
This is probably a high-severity bug. Both 0.13 and 0.14 have this issue.
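
For illustration only, a minimal Java sketch of the filtering the compactor needs when listing a delta directory (the class and method names are made up and this is not Hive's actual CompactorMR code): a file ending in _flush_length looks like a bucket file by prefix but is a streaming side file, not something to compact.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Minimal sketch, not Hive's actual code: when listing a delta directory,
// treat bucket_n_flush_length as a side file rather than a bucket to compact.
class BucketFileFilterSketch {
  private static final Pattern BUCKET = Pattern.compile("bucket_\\d+");
  private static final String SIDE_FILE_SUFFIX = "_flush_length";

  static List<String> compactableBuckets(List<String> fileNames) {
    List<String> buckets = new ArrayList<>();
    for (String name : fileNames) {
      if (name.endsWith(SIDE_FILE_SUFFIX)) {
        continue;                       // streaming side file, skip it
      }
      if (BUCKET.matcher(name).matches()) {
        buckets.add(name);              // real bucket file, eligible for compaction
      }
    }
    return buckets;
  }
}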

Brock Noland
added a comment - 26/Jan/15 23:11 Looks like this was committed but I am seeing:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-common: Compilation failure: Compilation failure:
[ERROR] /Users/noland/workspaces/hive-apache/hive/common/src/java/org/apache/hadoop/hive/common/ValidTxnListImpl.java:[23,8] org.apache.hadoop.hive.common.ValidTxnListImpl is not abstract and does not override abstract method getInvalidTransactions() in org.apache.hadoop.hive.common.ValidTxnList
[ERROR] /Users/noland/workspaces/hive-apache/hive/common/src/java/org/apache/hadoop/hive/common/ValidTxnListImpl.java:[46,3] method does not override or implement a method from a supertype
[ERROR] /Users/noland/workspaces/hive-apache/hive/common/src/java/org/apache/hadoop/hive/common/ValidTxnListImpl.java:[54,3] method does not override or implement a method from a supertype
[ERROR] /Users/noland/workspaces/hive-apache/hive/common/src/java/org/apache/hadoop/hive/common/ValidTxnListImpl.java:[121,3] method does not override or implement a method from a supertype
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :hive-common

Alan Gates
added a comment - 24/Jan/15 01:39 Final version of the patch. Moved ValidCompactorTxnList per Owen's request. Also made small changes to StreamingIntegrationTester to make it work properly in cases where you want it to go slowly.

Owen O'Malley
added a comment - 21/Jan/15 00:53 After a little more thought, I'm worried that someone will accidentally create a ValidCompactorTxnList and get confused by the different behavior. I think it would make sense to move it into the compactor package to minimize the chance that someone accidentally uses it by mistake.

Alan Gates
added a comment - 12/Jan/15 23:56 Owen O'Malley pointed out that I need to change the implementation of ValidCompactorTxnList.isTxnValid to return false for aborted transactions so that aborted records aren't carried forward in compacted files.

Alan Gates
added a comment - 10/Jan/15 00:03 This patch takes a new approach. Rather than changing AcidUtils.getAcidState (as the previous 2 attempts did), this patch provides a new implementation of ValidTxnList whose isTxnRangeValid only returns ALL or NONE, and returns NONE if there is any open transaction <= the max transaction in the range (even if it is below the range). This new implementation is used only by the compactor, so that its understanding of which files it should compact differs from which files a reader views as available for reading.
I've also added tests to TestCompactor to test compaction during streaming and compaction after a streamer has aborted and died without cleaning up.
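
As a rough illustration of the behaviour described above (a sketch under assumptions, not Hive's actual ValidCompactorTxnList; the class and field names are invented), the compactor-side list answers ALL only when nothing at or below the top of the requested range is still open:

// Sketch of a compactor-only transaction list: isTxnRangeValid returns ALL or NONE.
class CompactorTxnListSketch {
  enum RangeResponse { ALL, NONE }

  private final long highWatermark;   // highest allocated/committed transaction id
  private final long[] openTxns;      // ids of transactions that are still open

  CompactorTxnListSketch(long highWatermark, long[] openTxns) {
    this.highWatermark = highWatermark;
    this.openTxns = openTxns.clone();
  }

  // ALL only when every transaction up to maxTxnId has committed; any open
  // transaction <= maxTxnId blocks the whole range, even if it is below minTxnId.
  RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
    if (maxTxnId > highWatermark) {
      return RangeResponse.NONE;
    }
    for (long open : openTxns) {
      if (open <= maxTxnId) {
        return RangeResponse.NONE;
      }
    }
    return RangeResponse.ALL;
  }
}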

Alan Gates
added a comment - 09/Jan/15 00:32 The issue is that since the writer died with an unclosed batch it left the orc file in a state where it cannot be read without the length file. So removing the length file means any reader will fail when reading it.
The proper solution is for the compactor to stop at that partition until it has determined that all transactions in that file have committed or aborted. Then it should compact it, using the length file to know how much of the data is valid but not treating the length file itself as input. I'll work on the fix.

2015-01-06 16:42:35,014 INFO [IPC Server handler 5 on 33406] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1419291043936_1639_m_000002_0: Error: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:532)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:369)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:311)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:464)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1232)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:510)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:489)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

The same java.lang.IndexOutOfBoundsException, with an identical stack trace, was also reported by the AsyncDispatcher event handler (TaskAttemptImpl) for attempt_1419291043936_1639_m_000002_0, and by both TaskAttemptListenerImpl and TaskAttemptImpl for the retry attempts attempt_1419291043936_1639_m_000002_1, _2 and _3 at 16:42:40, 16:42:44 and 16:42:50.

Jihong Liu
added a comment - 05/Jan/15 04:43 Did a test. Generally the new version works as expected. But for the following case, the compaction will always fail:
1. For some reason the writer exits without closing a batch, so the "length" file is still there. This can happen, for example, when the program is killed or the Hive server restarts.
2. The program is restarted, so a new writer and a new batch are created and continue writing into the same partition. The data goes to a new delta.
3. Now we manually delete that "length" file in the previous delta and then run compaction, but it fails. Even if we exit the program completely, so that there is no open batch and no "length" file, compaction never succeeds for this partition.
However, the current Hive 0.14.0 works fine for the above case.

Alan Gates
added a comment - 22/Dec/14 18:43 A new version of the patch which properly handles not putting any deltas in the list once we see a delta with a flush length file. Unfortunately, Owen O'Malley, who needs to review this, is out for a couple of weeks. Jihong Liu, please take a look at this and test it in your environment.

Alan Gates
added a comment - 10/Dec/14 17:11 Right, makes sense. I need to think about whether it makes more sense to change AcidUtils.getAcidState to catch this as well or whether your approach of post processing it in the compactor makes more sense.

Jihong Liu
added a comment - 10/Dec/14 06:00 Alan,
Your idea is very good. But there is an issue here – we should only do this "compactable" test for the most recent delta, not for all deltas. The following example shows why:
Assume there are two deltas:
1. delta_00011_00020 – this delta has an open transaction batch
2. delta_00021_00030 – this delta has no open transaction batch; all batches are closed.
Here the first delta has an open transaction batch and the second does not, and the second delta is the most recent one. This case is possible, especially when multiple threads write to the same partition. If we ignore the first one, the compaction will succeed and create a base such as base_00030. The cleaner will then delete both deltas, since their transaction ids are less than or equal to the base transaction id, and the data in the first delta (the one we ignored) will be lost. This is why we should only test the most recent delta; all other deltas go into the list automatically. In this case the compaction will fail, since the "flush_length" file is there, and it will only succeed once all transaction batches are closed. Although that is not perfect, at least no data is lost. Since the delta files and transaction ids chosen for a compaction are not saved anywhere, this is probably the only solution for now.
In my removeNotCompactableDeltas() method we first sort the deltas and then only check the last one. But the name "removeNotCompactableDeltas" is not good and easily causes confusion; it would be clearer to name it "removeLastDeltaIfNotCompactable".
Thanks

Alan Gates
added a comment - 09/Dec/14 23:15 A new version of the patch that moves Jihong's code into AcidUtils.getAcidState so that delta directories with flush length files are not put into the list of files to compact.
Jihong Liu, could you test this on your end to make sure it addresses your issues? I'll also do some long-running tests to see that it allows compaction while streaming is ongoing.

Jihong Liu
added a comment - 09/Dec/14 20:25 I see. Basically there are two solutions. One is that when we get the delta list, we don't include the current delta if it has an open transaction, i.e. we update AcidUtils.getAcidState() directly. The other is what I posted here: we first get the delta list, and then when doing the compaction we don't compact the last delta if it has an open transaction. The first solution is better, as long as changing getAcidState() doesn't affect other existing code, since it is a public static method.
By the way, we should only do that for the current delta (the delta with the largest transaction id), not for all deltas which have open transactions. If I am correct, the base file will be named based on the largest transaction id in the deltas. So if the latest delta is closed but an earlier delta has an open transaction, we should not do anything and should simply let the compaction fail. Otherwise the base will be named by the last transaction id and all earlier deltas will be removed, which would cause data loss. This is my understanding; please correct me if it is not correct. Thanks

Alan Gates
added a comment - 09/Dec/14 17:59 Rather than go remove these directories from the list of deltas I think it makes more sense to change Directory.getAcidState to not include these deltas. We obviously can't do that in all cases, as readers need to see these deltas. But we can change it to see that this is the compactor and therefore those should be excluded. I'll post a patch with this change.
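
A rough sketch of that proposal, under assumptions (illustrative class and method names, and plain java.io instead of the Hadoop FileSystem API that Hive actually uses): when the caller is the compactor, any delta directory that still contains a *_flush_length side file is left out of the list. The comments above refine this further to checking only the most recent delta.

import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative only; the real change would live in AcidUtils.getAcidState.
class CompactorDeltaFilterSketch {
  // Keep only delta directories without a *_flush_length side file.
  static List<File> deltasSafeToCompact(List<File> deltaDirs) {
    List<File> result = new ArrayList<>();
    for (File delta : deltaDirs) {
      if (!hasFlushLengthFile(delta)) {
        result.add(delta);
      }
    }
    return result;
  }

  private static boolean hasFlushLengthFile(File deltaDir) {
    File[] files = deltaDir.listFiles();
    if (files == null) {
      return false;
    }
    for (File f : files) {
      if (f.getName().endsWith("_flush_length")) {
        return true;                  // an open streaming batch may still be writing
      }
    }
    return false;
  }
}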

Jihong Liu
added a comment - 07/Dec/14 20:00 I am confused about the QA test. The error does not look related to the HIVE-8966 patch. First, was this patch really included in the build? Also, this patch is for 0.14.1, not for trunk.

Jihong Liu
added a comment - 07/Dec/14 06:04 Alan,
I created a wrong patch about an hour ago and removed it shortly afterwards, but QA had already run the above test against it. Please ignore that result and look at the currently attached patch. I think it really solves the issue.

Jihong Liu
added a comment - 07/Dec/14 05:34 Hi Alan, I have created a new patch. It works fine. The patch is pasted in that JIRA, and I also added a comment about the logic. Please have a look. Thanks and have a good day. Jihong
From: Alan Gates (JIRA) <jira@apache.org>
To: jhliu08@yahoo.com
Sent: Friday, December 5, 2014 7:41 AM
Subject: [jira] [Commented] ( HIVE-8966 ) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235645#comment-14235645 ]
Alan Gates commented on HIVE-8966 :
----------------------------------
Jihong, thanks for doing the testing on this.
We could change this to not compact the current delta file, or we could change the cleaner to not remove the delta file that was still open during compaction. I'll try to look at this in the next couple of days. We need to get this fixed for 0.14.1.

Jihong Liu
added a comment - 07/Dec/14 04:43 By the way, Hive may need another cleaning process which automatically removes the bucket_n_flush_length file once the connection that created it has actually been closed. A program may fail to close a transaction batch for many reasons, for example a network disconnect, a server shutdown, the application being killed, etc. So if the connection which created a batch has been closed, that bucket_n_flush_length file needs to be removed. Otherwise that delta and the deltas after it can never be compacted unless we remove the file manually.

Jihong Liu
added a comment - 07/Dec/14 04:42 Solution:
If the last delta has any file that matches the bucket file pattern but is actually not a bucket file, don't compact this delta. When a transaction batch is not closed, the delta contains a file like bucket_n_flush_length, which is not a bucket file. In fact, for whatever reason, if the last delta has a file with the bucket file pattern that is not compactable, we should ignore this delta: after compaction the delta would be removed, so if the whole delta cannot be compacted, leave it as it is. In the above scenario the second delta will therefore not be compacted, and the cleaner will not remove it because it has a higher transaction id than the newly created compaction file (base or delta).
The reason we only do the above for the last delta is the case where two or more transaction batches are created and the last one is closed first. If the last delta then gets compacted, the transaction id in the base will be high, so all deltas will be removed by the cleaner and data could be lost. In that case at least one delta in the list of deltas for compaction contains the bucket_n_flush_length file; since we do not ignore it, the compaction automatically fails, nothing happens, and no data is lost. Compaction can then only be done after all transaction batches are closed. Although that is not ideal, at least no data is lost.
The patch is attached. It adds one method that tests whether the last delta needs to be removed from the delta list, and runs that method before processing the delta list. After applying this patch no data is lost, and we can do either major or minor compaction while continuing to load data at the same time.
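
A minimal sketch of that check, under assumptions (the Delta type and method name here are invented; the name follows the removeLastDeltaIfNotCompactable renaming suggested elsewhere in this thread): sort the deltas by transaction id and drop only the most recent one if it still carries a *_flush_length side file, so that an older delta with an open batch makes the whole compaction fail instead of silently losing data.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of "only test the most recent delta"; not Hive's actual code.
class LastDeltaCheckSketch {
  static class Delta {
    final long maxTxnId;
    final List<String> fileNames;
    Delta(long maxTxnId, List<String> fileNames) {
      this.maxTxnId = maxTxnId;
      this.fileNames = fileNames;
    }
  }

  static List<Delta> removeLastDeltaIfNotCompactable(List<Delta> deltas) {
    List<Delta> sorted = new ArrayList<>(deltas);
    sorted.sort(Comparator.comparingLong(d -> d.maxTxnId));
    if (!sorted.isEmpty()) {
      Delta last = sorted.get(sorted.size() - 1);
      boolean stillOpen =
          last.fileNames.stream().anyMatch(n -> n.endsWith("_flush_length"));
      if (stillOpen) {
        sorted.remove(sorted.size() - 1);   // skip only the most recent delta
      }
    }
    return sorted;                          // earlier deltas are always kept
  }
}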

Jihong Liu
added a comment - 07/Dec/14 04:41 The data loss scenario:
Assume that when compaction starts there are two deltas, delta_00011_00020 and delta_00021_00030, where the transaction batch in the first one is closed and the second one still has a transaction batch open. After compaction finishes, the status in the compaction queue becomes “ready_for_clean” and the clean process is triggered. The cleaner removes every delta whose transaction ids are less than or equal to the newly created base's, provided there is no lock on it. In the meantime we are still loading data into the second delta. When loading finishes and the transaction batch is closed, the cleaner detects no lock on it and deletes it, so the new data added after compaction is lost.

Alan Gates
added a comment - 05/Dec/14 15:40 Jihong, thanks for doing the testing on this.
We could change this to not compact the current delta file, or we could change the cleaner to not remove the delta file that was still open during compaction. I'll try to look at this in the next couple of days. We need to get this fixed for 0.14.1.

Jihong Liu
added a comment - 04/Dec/14 23:11 I think we may have to withdraw this patch for now. It looks like Hive currently must not support doing compaction and loading at the same time for a partition.
Without this patch, if loading for a partition is not completely finished, compaction always fails, so nothing happens. After applying this patch, compaction goes through and finishes. However, we may lose data! I did a test; data can be lost if we run compaction while loading is not yet finished.
But if we keep the current behavior, it is a limitation for Hive: if we stream-load into a partition for a long period, performance suffers because we cannot compact it.
To completely solve this issue, my initial thinking is that delta files with open transactions should not be compacted. Currently they must be included, which is probably the reason for the data loss. But the other, closed delta files should be compactable, so we can do compaction and loading at the same time.

Alan Gates
added a comment - 26/Nov/14 22:54 Ok, that makes sense. Your current delta has the file because it's still open and being written to. It also explains why my tests don't see it, as they don't run long enough; the streaming is always done by the time the compactor kicks in. Why don't you post a patch to this JIRA with the change for 1, and I can get that committed.
Gunther Hagleitner, I'd like to put this in 0.14.1 as well as trunk if you're ok with it, since it blocks compaction for users using the streaming interface.

Jihong Liu
added a comment - 26/Nov/14 22:43 That flush_length file is only in the most recent delta. By the way, for streaming loading, a transaction batch is probably always open since data keeps coming. Is it possible to do compaction in the streaming loading environment? Thanks

Alan Gates
added a comment - 26/Nov/14 22:19 1 might be the right thing to do. 2 breaks backward compatibility. Before we do that though I'd like to understand why you still see the flush length files hanging around. In my tests I don't see this issue because the flush length file is properly cleaned up. I want to make sure that its existence doesn't mean something else is wrong.
Do you see the flush length files in all delta directories or only the most recent?

2. Don't use the bucket file pattern to name the "flush_length" file, i.e. update the following code in org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.java:
static Path getSideFile(Path main)   // Path here is org.apache.hadoop.fs.Path