a) A first decrements numResponsePending from 2 to 1.
b) A increments validResponses from 0 to 1.
c) B then decrements numResponsePending from 1 to 0.
d) A checks numResponsePending before B checks validResponses, and A finds that numResponsePending is now 0, so A calls back with an exception. The right action would be for B to check validResponses and call back with OK.
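
The interleaving in a) to d) is only possible if the counter updates and the completion check are separate steps. As a minimal sketch (a hypothetical ResponseTracker class, not the real ReadLastConfirmedOp; the requiredValid threshold and all method names are assumptions), doing both inside one synchronized method means that whichever response arrives last sees the final counters and the callback fires exactly once:

```java
// Hypothetical stand-in for a quorum-style read; not BookKeeper code.
class ResponseTracker {
    private final int requiredValid;   // assumed success threshold
    private int numResponsesPending;
    private int validResponses = 0;
    private boolean completed = false;
    private boolean succeeded = false;
    private int callbackCount = 0;

    ResponseTracker(int numRequests, int requiredValid) {
        this.numResponsesPending = numRequests;
        this.requiredValid = requiredValid;
    }

    // Counter updates and the completion check happen under one lock,
    // so no other response can slip in between them.
    synchronized void onResponse(boolean valid) {
        if (completed) {
            return; // never call back twice
        }
        numResponsesPending--;
        if (valid) {
            validResponses++;
        }
        if (validResponses >= requiredValid) {
            complete(true);
        } else if (numResponsesPending == 0) {
            complete(false);
        }
    }

    private void complete(boolean ok) {
        completed = true;
        succeeded = ok;
        callbackCount++; // stands in for invoking the user callback
    }

    synchronized int callbackCount() { return callbackCount; }
    synchronized boolean succeeded() { return succeeded; }

    // Delivers `responders` valid responses from separate threads.
    static ResponseTracker simulate(int responders, int requiredValid) {
        ResponseTracker t = new ResponseTracker(responders, requiredValid);
        Thread[] threads = new Thread[responders];
        for (int i = 0; i < responders; i++) {
            threads[i] = new Thread(() -> t.onResponse(true));
            threads[i].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return t;
    }
}
```

With two responders and requiredValid set to 2, simulate(2, 2) ends with a single successful callback regardless of which thread runs first.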

3) If a LedgerHandle is opened by openLedgerNoRecovery, lastAddConfirmed will be set to -1, so all read requests will fail, since the requested entry id > lastAddConfirmed.

So I suggest that if a LedgerHandle is opened by openLedgerNoRecovery, the LedgerHandle is put in unsafeRead mode: close/write operations will fail, and read operations should not check the condition entry_id > lastAddConfirmed.

Activity

Ivan Kelly
added a comment - 25/Oct/11 16:44 1) Yikes, that's a big oversight. There is actually a test for it, BookieReadWriteTest#testReadFromOpenLedger, but the @Test annotation is missing from it, so it never gets run. Also, the actual checking code seems to be wrong, as it tries to read from lh, not lhOpen (line 861). Could you break the fix for this problem out into a single patch, along with the fix for the test, and I'll commit that as BOOKKEEPER-91.
2) This is unrelated to 1), so it should be in a separate JIRA. Also, I'm unsure the race you describe can occur. ReadLastConfirmedOp#readEntryComplete is already synchronized.
3) Actually this could go into BOOKKEEPER-91. However, I think a better solution may be to do a ReadLastConfirmedOp in the else part of LedgerOpenOp#processResult.
    if (!unsafe) {
        // Recovery open: recover the ledger before handing out the handle.
        lh.recover(new GenericCallback<Void>() {
            @Override
            public void operationComplete(int rc, Void result) {
                if (rc != BKException.Code.OK) {
                    cb.openComplete(BKException.Code.LedgerRecoveryException, null, LedgerOpenOp.this.ctx);
                } else {
                    cb.openComplete(BKException.Code.OK, lh, LedgerOpenOp.this.ctx);
                }
            }
        });
    } else {
        // Non-recovery open: only learn how far the writer has confirmed.
        lh.asyncReadLastConfirmed(new ReadLastConfirmedCallback() {
            @Override
            public void readLastConfirmedComplete(int rc, long lastConfirmed, Object ctx) {
                lh.lastAddConfirmed = lh.lastAddPushed = lastConfirmed;
                cb.openComplete(rc, lh, LedgerOpenOp.this.ctx);
            }
        }, null);
    }
This way, a non-recovery ledger handle will be able to read entries up to the point at which it was opened and no further. I think this should be the correct behaviour, as otherwise it could be possible for the handle to read an entry which hasn't been confirmed to the writer. If the entry hasn't been confirmed to the writer and the writer closes at that point, the reader will have read more than the writer wrote successfully, which I don't think affects correctness, but is a little ugly.
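
The effect on reads can be illustrated with a small sketch. BoundedLedgerView and canRead are hypothetical names used only for illustration, not BookKeeper classes; the sketch assumes the handle records the last confirmed entry id once at open time and rejects any read range that goes past it:

```java
// Hypothetical illustration only; not part of the BookKeeper client.
class BoundedLedgerView {
    // Value obtained once at open time, e.g. from a read-last-confirmed call.
    private final long lastAddConfirmed;

    BoundedLedgerView(long lastConfirmedAtOpen) {
        this.lastAddConfirmed = lastConfirmedAtOpen;
    }

    // A read is allowed only if the whole range was confirmed at open time.
    boolean canRead(long firstEntry, long lastEntry) {
        return firstEntry >= 0
                && firstEntry <= lastEntry
                && lastEntry <= lastAddConfirmed;
    }
}
```

With lastAddConfirmed left at -1, as in point 3) of the description, canRead(0, 0) is false, which is why every read fails; once the value is set to, say, 5, entries 0 to 5 become readable and entry 6 does not.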

Sijie Guo
added a comment - 26/Oct/11 03:13 Thanks for Ivan's suggestions.
Fixes:
1) Avoid two callbacks in readLastConfirmedOp.
2) Use readLastConfirmedOp to set lastAddConfirmed when opening a ledger without recovery, so that all entries that can be read have been confirmed to the writer.
3) Add unsafeRead to LedgerHandle to prevent close/write on it.

Ivan Kelly
added a comment - 26/Oct/11 11:08 I see you created BOOKKEEPER-94 for the test change. That change should actually be part of this JIRA. It's part 1) (the two callback changes) which should be in the other JIRA, as it's unrelated, whereas 2) & 3) and the fix to testing are all the same thing.
Regarding 2) & 3), these changes look good. However, I'd rename the unsafeRead flag to readOnly. Also, add a logging line before the addComplete in asyncAddEntry saying that the client tried to write on a read-only ledger handle.
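
A rough sketch of what that could look like follows. ReadOnlyHandleSketch, its AddCallback interface and the OK / ILLEGAL_OP codes are hypothetical placeholders, not the actual LedgerHandle API or BKException codes; the only point is the order of operations, which is to log first and then fail the add through the callback:

```java
import java.util.logging.Logger;

// Hypothetical placeholder for a ledger handle; not the real LedgerHandle.
class ReadOnlyHandleSketch {
    static final int OK = 0;          // assumed result code
    static final int ILLEGAL_OP = -1; // assumed result code

    // Assumed callback shape, for illustration only.
    interface AddCallback {
        void addComplete(int rc, long entryId);
    }

    private static final Logger LOG =
            Logger.getLogger(ReadOnlyHandleSketch.class.getName());

    private final boolean readOnly;
    private long lastAddPushed = -1;

    ReadOnlyHandleSketch(boolean readOnly) {
        this.readOnly = readOnly;
    }

    void asyncAddEntry(byte[] data, AddCallback cb) {
        if (readOnly) {
            // Log before completing, so the failed write is visible in the logs.
            LOG.warning("Tried to add entry on a read-only ledger handle");
            cb.addComplete(ILLEGAL_OP, -1);
            return;
        }
        // Writable handle: pretend the entry was stored and return its id.
        cb.addComplete(OK, ++lastAddPushed);
    }
}
```

A test for the read-only case then only needs to call asyncAddEntry on a handle opened without recovery and check that the callback receives an error code.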

Ivan Kelly
added a comment - 26/Oct/11 11:12 My previous comment was incomplete. The changes should also be tested. The whole reason the bug exists is a lack of testing in the first place. The easiest thing is to simply extend BookieReadWriteTest for this case, to ensure that add fails on lhOpen and that the ledger metadata isn't closed after lhOpen is closed.
I'm still confused by the callback issue in readLastConfirmedOp. The only scenario where the callback can be called twice is one where it receives more responses than it has made requests. This discussion should continue on BOOKKEEPER-94.