[ https://issues.apache.org/jira/browse/PHOENIX-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061051#comment-16061051
]
chenglei edited comment on PHOENIX-3964 at 6/23/17 3:02 PM:
------------------------------------------------------------
[~ankit@apache.org], thank you for the review.
bq. IMO, index regions will come up on all regionserver before data region during the race.
I just meant that when we write index updates to the index regions on other RegionServers, we
cannot guarantee that those index regions are already online.
bq. No, it will kill the region server because of KillServerOnFailurePolicy. And, We will
lose the cached edits and not sure we will get a chance to replay them again.
The method name {{IndexWriter.writeAndKillYourselfOnFailure}} is very misleading; maybe we
could rename it to {{IndexWriter.writeAndHandleExceptionOnFailure}}. Actually, the default
IndexFailurePolicy used by IndexWriter is PhoenixIndexFailurePolicy, not KillServerOnFailurePolicy.
PhoenixIndexFailurePolicy does not kill the RegionServer if it handles the exception successfully,
and even if it did kill the RegionServer in the worst case, the region's {{recovered.edits}}
folder would not be deleted, so we would surely get a chance to replay those edits again.
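To see why the name oversells what happens, here is a simplified sketch of the shape of
{{writeAndKillYourselfOnFailure}} (paraphrased from memory, not the exact Phoenix source):
on failure it only delegates to whichever IndexFailurePolicy was configured.
{code:java}
// Simplified sketch of IndexWriter.writeAndKillYourselfOnFailure, paraphrased
// rather than copied from the Phoenix source.
public void writeAndKillYourselfOnFailure(Multimap<HTableInterfaceReference, Mutation> toWrite,
        boolean allowLocalUpdates) throws IOException {
    try {
        write(toWrite, allowLocalUpdates);
    } catch (Exception e) {
        // The default policy is PhoenixIndexFailurePolicy, which disables the
        // index or marks it for rebuild rather than aborting the server; only
        // KillServerOnFailurePolicy actually kills the RegionServer.
        this.failurePolicy.handleFailure(toWrite, e);
    }
}
{code}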
bq. I'm now becoming sceptical about this change, cached WAL replay after postOpen could overwrite
the new index writes, if there are new overlapped writes coming to the data table(which eventually
also written to index table) because now the region is open and available.(May result in data
loss)
Replaying the cached WAL edits in postOpen would not overwrite the new index writes, because
when we built the index update cells in Indexer.preBatchMutate, we already set the timestamps
of those cells in PhoenixIndexCodec. So even if there are new overlapped writes now, the
timestamps guarantee that the replay will not cause data overwrite or loss.
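To illustrate the timestamp point with plain HBase client code (a standalone sketch with
hypothetical table, row, and column names; this is not code from the patch): an edit replayed
later with an older explicit timestamp cannot shadow a newer write, because HBase orders cell
versions by timestamp, not by arrival order.
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampReplayDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical row/column names, for illustration only.
        byte[] row = Bytes.toBytes("idx-row");
        byte[] cf = Bytes.toBytes("0");
        byte[] cq = Bytes.toBytes("_0");
        try (Connection conn = ConnectionFactory.createConnection();
             Table indexTable = conn.getTable(TableName.valueOf("MY_INDEX"))) {
            // A new overlapped write arrives first, carrying a later timestamp.
            Put newer = new Put(row);
            newer.addColumn(cf, cq, 2000L, Bytes.toBytes("new"));
            indexTable.put(newer);

            // The cached WAL edit is replayed afterwards, but keeps the earlier
            // timestamp that was set when the index update was originally built.
            Put replayed = new Put(row);
            replayed.addColumn(cf, cq, 1000L, Bytes.toBytes("old"));
            indexTable.put(replayed);

            // Reads still return "new": the replay overwrote nothing.
            Result r = indexTable.get(new Get(row));
            System.out.println(Bytes.toString(r.getValue(cf, cq))); // prints "new"
        }
    }
}
{code}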
> Index.preWALRestore should handle index write failure
> -----------------------------------------------------
>
> Key: PHOENIX-3964
> URL: https://issues.apache.org/jira/browse/PHOENIX-3964
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.10.0
> Reporter: chenglei
> Attachments: PHOENIX-3964_v1.patch
>
>
> When I restarted my HBase cluster at one point, I noticed that some regions were stuck in the
FAILED_OPEN state and the RegionServers had error logs like the following:
> {code:java}
> 2017-06-20 12:31:30,493 ERROR [RS_OPEN_REGION-rsync:60020-3] handler.OpenRegionHandler:
Failed open of region=BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,0100134e-7ddf-4d13-ab77-6f0d683e5fad_0,1487594358223.57a7be72f9beaeb4285529ac6236f4e5.,
starting to roll back the global memstore size.
> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: Failed to write
to multiple index tables
> at org.apache.phoenix.hbase.index.write.recovery.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:221)
> at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:185)
> at org.apache.phoenix.hbase.index.write.RecoveryIndexWriter.write(RecoveryIndexWriter.java:75)
> at org.apache.phoenix.hbase.index.Indexer.preWALRestore(Indexer.java:554)
> at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$58.call(RegionCoprocessorHost.java:1312)
> at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1517)
> at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1592)
> at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1549)
> at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preWALRestore(RegionCoprocessorHost.java:1308)
> at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3338)
> at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3220)
> at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:823)
> at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:716)
> at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:687)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4596)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4566)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4538)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4494)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4445)
> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:465)
> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:139)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> When I looked at the code of the Indexer.preWALRestore method, I found that the RecoveryIndexWriter.write
method is used to write the indexUpdates at line 543 below:
>
> {code:java}
>
> 526   public void preWALRestore(ObserverContext<RegionCoprocessorEnvironment> env, HRegionInfo info,
> 527       HLogKey logKey, WALEdit logEdit) throws IOException {
> 528     if (this.disabled) {
> 529       super.preWALRestore(env, info, logKey, logEdit);
> 530       return;
> 531     }
> 532     // TODO check the regions in transition. If the server on which the region lives is this one,
> 533     // then we should retry that write later in postOpen.
> 534     // we might be able to get even smarter here and pre-split the edits that are server-local
> 535     // into their own recovered.edits file. This then lets us do a straightforward recovery of each
> 536     // region (and more efficiently as we aren't writing quite as hectically from this one place).
> 537
> 538     /*
> 539      * Basically, we let the index regions recover for a little while longer before retrying in the
> 540      * hopes they come up before the primary table finishes.
> 541      */
> 542     Collection<Pair<Mutation, byte[]>> indexUpdates = extractIndexUpdate(logEdit);
> 543     recoveryWriter.write(indexUpdates, true);
> 544   }
> {code}
> However, the RecoveryIndexWriter.write method is as follows: it rethrows the exception for
every failed table except non-existing ones, so RecoveryIndexWriter's failurePolicy (which is
StoreFailuresInCachePolicy by default) never gets an opportunity to run, and consequently
Indexer.failedIndexEdits, which should be filled by StoreFailuresInCachePolicy, is always empty.
> {code:java}
>     public void write(Collection<Pair<Mutation, byte[]>> toWrite, boolean allowLocalUpdates) throws IOException {
>         try {
>             write(resolveTableReferences(toWrite), allowLocalUpdates);
>         } catch (MultiIndexWriteFailureException e) {
>             for (HTableInterfaceReference table : e.getFailedTables()) {
>                 if (!admin.tableExists(table.getTableName())) {
>                     LOG.warn("Failure due to non existing table: " + table.getTableName());
>                     nonExistingTablesList.add(table);
>                 } else {
>                     throw e;
>                 }
>             }
>         }
>     }
> {code}
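> To make the consequence concrete, the rough shape of what StoreFailuresInCachePolicy.handleFailure
is supposed to do (a paraphrase, not the exact Phoenix source) is to stash the failed edits in the
per-region cache behind failedIndexEdits — a code path that is never reached here, because the
exception is rethrown before the policy ever sees it:
> {code:java}
> // Rough paraphrase of StoreFailuresInCachePolicy.handleFailure, not the exact
> // Phoenix source: cache the edits for the failed tables so postOpen can retry them.
> public void handleFailure(Multimap<HTableInterfaceReference, Mutation> attempted, Exception cause)
>         throws IOException {
>     if (!(cause instanceof MultiIndexWriteFailureException)) {
>         // not something we can cache - let the delegate policy handle it
>         delegate.handleFailure(attempted, cause);
>         return;
>     }
>     for (HTableInterfaceReference table : ((MultiIndexWriteFailureException) cause).getFailedTables()) {
>         cache.addEdits(this.region, table, attempted.get(table));
>     }
> }
> {code}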
> So the Indexer.postOpen method seems useless, because the updates Multimap obtained from
Indexer.failedIndexEdits at line 500 below is always empty.
> {code:java}
> 499   public void postOpen(final ObserverContext<RegionCoprocessorEnvironment> c) {
> 500     Multimap<HTableInterfaceReference, Mutation> updates = failedIndexEdits.getEdits(c.getEnvironment().getRegion());
> 501
> 502     if (this.disabled) {
> 503       super.postOpen(c);
> 504       return;
> 505     }
> 506
> 507     //if we have no pending edits to complete, then we are done
> 508     if (updates == null || updates.size() == 0) {
> 509       return;
> 510     }
> 511
> 512     LOG.info("Found some outstanding index updates that didn't succeed during"
> 513         + " WAL replay - attempting to replay now.");
> 514
> 515     // do the usual writer stuff, killing the server again, if we can't manage to make the index
> 516     // writes succeed again
> 517     try {
> 518       writer.writeAndKillYourselfOnFailure(updates, true);
> 519     } catch (IOException e) {
> 520       LOG.error("During WAL replay of outstanding index updates, "
> 521           + "Exception is thrown instead of killing server during index writing", e);
> 522     }
> 523   }
> {code}
> So I think that in the Indexer.preWALRestore method, we should use the recoveryWriter.writeAndKillYourselfOnFailure
method to write the indexUpdates and handle index write failures, rather than just calling the
RecoveryIndexWriter.write method.
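> A minimal sketch of the direction of the fix (the attached PHOENIX-3964_v1.patch is
authoritative; this only illustrates the idea):
> {code:java}
> public void preWALRestore(ObserverContext<RegionCoprocessorEnvironment> env, HRegionInfo info,
>         HLogKey logKey, WALEdit logEdit) throws IOException {
>     if (this.disabled) {
>         super.preWALRestore(env, info, logKey, logEdit);
>         return;
>     }
>     Collection<Pair<Mutation, byte[]>> indexUpdates = extractIndexUpdate(logEdit);
>     // Route failures through the writer's failure policy (StoreFailuresInCachePolicy
>     // for the recovery writer), so failed edits land in failedIndexEdits and are
>     // retried in postOpen, instead of the raw exception failing the region open.
>     recoveryWriter.writeAndKillYourselfOnFailure(indexUpdates, true);
> }
> {code}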
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)