[ https://issues.apache.org/jira/browse/CASSANDRA-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-8589:
----------------------------------------
Fix Version/s: (was: 3.1)
               2.2.x
               2.1.x
> Reconciliation in presence of tombstone might yield stale data
> --------------------------------------------------------------
>
> Key: CASSANDRA-8589
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8589
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Fix For: 2.1.x, 2.2.x
>
>
> Consider 3 replicas A, B, C (so RF=3) and consider that we do the following sequence of
actions at {{QUORUM}}, where I indicate the replicas acknowledging each operation (and let's
assume that a replica that doesn't ack is a replica that doesn't get the update):
> {noformat}
> CREATE TABLE test (k text, t int, v int, PRIMARY KEY (k, t))
> INSERT INTO test(k, t, v) VALUES ('k', 0, 0); // acked by A, B and C
> INSERT INTO test(k, t, v) VALUES ('k', 1, 1); // acked by A, B and C
> INSERT INTO test(k, t, v) VALUES ('k', 2, 2); // acked by A, B and C
> DELETE FROM test WHERE k='k' AND t=1; // acked by A and C
> UPDATE test SET v = 3 WHERE k='k' AND t=2; // acked by B and C
> SELECT * FROM test WHERE k='k' LIMIT 2; // answered by A and B
> {noformat}
> Every operation has achieved quorum, but on the last read, A will respond {{0->0,
tombstone 1, 2->2}} and B will respond {{0->0, 1->1}}. As a consequence we'll answer
{{0->0, 2->2}} which is incorrect (we should respond {{0->0, 2->3}}).
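> The scenario above can be reproduced with a small toy model. This is only a sketch, not Cassandra code: the timestamps, the dict-based replica layout and the helper names {{local_read}} and {{reconcile}} are assumptions made for illustration, and a tombstone is modelled as a {{None}} value.

```python
# Toy model: each replica maps clustering key t -> (write timestamp, value).
# A value of None stands for a tombstone. Timestamps are hypothetical:
# the three INSERTs are 1, 2, 3, the DELETE is 4 and the UPDATE is 5.
TOMBSTONE = None

replica_a = {0: (1, 0), 1: (4, TOMBSTONE), 2: (3, 2)}  # missed the UPDATE
replica_b = {0: (1, 0), 1: (2, 1), 2: (5, 3)}          # missed the DELETE


def local_read(replica, limit):
    """Return cells in clustering order until `limit` live rows are collected.

    Tombstones are returned but do not count toward the limit.
    """
    out, live = {}, 0
    for t in sorted(replica):
        if live == limit:
            break
        out[t] = replica[t]
        if replica[t][1] is not TOMBSTONE:
            live += 1
    return out


def reconcile(responses, limit):
    """Merge responses, keeping the newest cell per key, then apply the limit."""
    merged = {}
    for resp in responses:
        for t, cell in resp.items():
            if t not in merged or cell[0] > merged[t][0]:
                merged[t] = cell
    live = [(t, v) for t, (_, v) in sorted(merged.items()) if v is not TOMBSTONE]
    return live[:limit]


# What the coordinator sees when each replica honors LIMIT 2 locally.
limited = reconcile([local_read(replica_a, 2), local_read(replica_b, 2)], 2)

# What it would see if both replicas returned everything they have.
full = reconcile([replica_a, replica_b], 2)

print(limited)
print(full)
```

In this model B stops after {{1->1}}, so its newer value for {{t=2}} never reaches the coordinator, and A's older {{2->2}} is returned unchallenged.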
> Put another way, if we have a limit, every replica honors that limit, but since tombstones
can "suppress" results from other nodes, we may have some cells for which we actually don't
get a quorum of responses (even though we globally have a quorum of replica responses).
> In practice, this probably occurs rather rarely, so the "simpler" fix is probably
to do something similar to the "short reads protection": detect when this could have happened
(based on how replica responses are reconciled) and do an additional request in that case.
That detection will have potential false positives, but I suspect we can be precise enough
that those false positives will be very rare (we should nonetheless track how often this
code gets triggered, and if we see that it's more often than we think, we could proactively
bump user limits internally to reduce those occurrences).
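> One possible shape for such a detection is sketched below. It is a hypothetical heuristic, not the actual fix: the function name {{suspect_replicas}} and the response layout (clustering key mapped to a timestamp/value pair, with {{None}} as tombstone) are assumptions. A replica is flagged when it stopped because it reached the limit, yet its last returned row sorts before the last row of the merged result, meaning it never had a chance to contribute to the rows past that point.

```python
def suspect_replicas(responses, limit):
    """Return names of replicas that may hold unseen data for the merged result.

    `responses` maps a replica name to {clustering key: (timestamp, value)},
    where a value of None is a tombstone. Hypothetical heuristic: it can
    report false positives, since a flagged replica may have nothing newer.
    """
    # Merge all responses, keeping the newest cell for each clustering key.
    merged = {}
    for resp in responses.values():
        for t, (ts, v) in resp.items():
            if t not in merged or ts > merged[t][0]:
                merged[t] = (ts, v)

    # Live rows that would be returned to the client after applying the limit.
    live_keys = [t for t in sorted(merged) if merged[t][1] is not None][:limit]
    if not live_keys:
        return []
    last_returned = live_keys[-1]

    suspects = []
    for name, resp in responses.items():
        live_count = sum(1 for (_, v) in resp.values() if v is not None)
        # The replica hit its limit (so it may have more rows) and stopped
        # before the last row we are about to return.
        if live_count >= limit and max(resp) < last_returned:
            suspects.append(name)
    return suspects


# Responses from the example: A saw the DELETE, B did not.
example = {
    "A": {0: (1, 0), 1: (4, None), 2: (3, 2)},
    "B": {0: (1, 0), 1: (2, 1)},
}
print(suspect_replicas(example, 2))
```

For the example, only B is flagged, so a single follow-up request to B for rows after {{t=1}} would surface {{2->3}}. When no tombstone shadows a replica's rows, nothing is flagged and no extra request is made.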