[ https://issues.apache.org/jira/browse/HBASE-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623057#action_12623057 ]
Jim Kellerman commented on HBASE-810:
-------------------------------------
This is very ugly. The locking as depicted in HBASE-316 is essentially correct.
If we wanted to be more responsive during a split, we could use a tryLock in HRegion.getScanner(...).
This would allow us to scan (if we get the lock) or let the split proceed (and continue from the
last scanned row if we don't).
Maybe getScanner should do a synchronized(splitLock) as well.
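A minimal sketch of the tryLock idea, assuming a ReentrantReadWriteLock guards the region. RegionLockSketch and its method names are hypothetical and are not taken from HRegion; the real getScanner signature and locking are not shown here.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a region; not the real HRegion class. */
public class RegionLockSketch {
    // Assumption: scanners take the read lock, a split takes the write lock.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Non-blocking attempt to start a scan. Returns true (and holds a read
     * lock) if no other thread holds the write lock; returns false at once
     * if a split currently holds it, so the caller is never parked.
     */
    public boolean tryBeginScan() {
        return lock.readLock().tryLock();
    }

    /** Releases the read lock taken by a successful tryBeginScan(). */
    public void endScan() {
        lock.readLock().unlock();
    }

    /** Blocks until all scans have ended, then holds the write lock. */
    public void beginSplit() {
        lock.writeLock().lock();
    }

    /** Releases the write lock taken by beginSplit(). */
    public void endSplit() {
        lock.writeLock().unlock();
    }
}
```

A caller that gets false would remember its last scanned row, wait for the split to complete, and reopen the scanner on the appropriate child region.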
Or, maybe splits should be more like cache flushes in that they only acquire a write lock
at the end, when they are ready to move new HStores into place? No, that won't work for splits
because splits require the master to reassign the children, whereas flushes and compactions
continue to be served from the same HRegionServer and the row range for HRegion is the same.
It appears that if a region is splitting, any outstanding scanners either need to finish scanning
the region (blocking the split) or need to be notified that a split is going to happen so that
they can wait and recalibrate.
> Prevent temporary deadlocks when, during a scan with write operations, the region splits
> ----------------------------------------------------------------------------------------
>
> Key: HBASE-810
> URL: https://issues.apache.org/jira/browse/HBASE-810
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.2.1, 0.3.0
>
>
> HBASE-804 was not about the right problem; this one is. Anyone who iterates through the
> results of a scanner and writes data back into the row at each iteration will hit an
> UnknownScannerException (USE) if a split occurs. See the stack trace in the referenced JIRA. Timeline:
> The split occurs, acquires a write lock, and waits for scanners to finish.
> The scanner in the custom code iterates and writes data until the write is blocked by the lock.
> Deadlock.
> The scanner times out, so the region splits, but the USE is thrown when next() is called.
> Inside a Map, the task will simply be retried when the first one fails. Elsewhere, it
> becomes more complicated.
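The timeline above can be reproduced in a toy model. This sketch assumes a single fair ReentrantReadWriteLock per region, with scanners and writes taking the read lock and the split taking the write lock; that is an assumption made for illustration, not a description of the actual HRegion code. SplitStallSketch and its method are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the reported timeline; none of these names come from HBase. */
public class SplitStallSketch {

    /** Returns true if the write timed out while queued behind the pending split. */
    public static boolean writeStallsBehindSplit() {
        // Assumption: fair mode, so new readers queue behind a waiting writer.
        final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock(true);
        final AtomicBoolean writeTimedOut = new AtomicBoolean(false);

        // An open scanner holds a read lock (held here by the calling thread).
        regionLock.readLock().lock();
        Thread split = new Thread(() -> {
            // The split requests the write lock and waits for the scanner.
            regionLock.writeLock().lock();
            regionLock.writeLock().unlock();
        });
        try {
            split.start();
            // Wait until the split is actually queued on the lock.
            while (!regionLock.hasQueuedThread(split)) {
                Thread.sleep(1);
            }
            // The write issued between next() calls runs on another thread and
            // needs a read lock; it queues behind the waiting split.
            Thread write = new Thread(() -> {
                try {
                    boolean acquired =
                        regionLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                    if (acquired) {
                        regionLock.readLock().unlock();
                    }
                    writeTimedOut.set(!acquired);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            write.start();
            write.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // The scanner goes away (in the report, by timing out), so the split can proceed.
            regionLock.readLock().unlock();
        }
        try {
            split.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writeTimedOut.get();
    }
}
```

In this model the 200 ms timeout only stands in for the stall. In the reported case nothing moves until the scanner itself times out, which is why the client later sees the USE on next().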