NPE during HH delivery when gossip turned off on target

Details

Description

probably not important bug

ERROR [OptionalTasks:1] 2011-12-27 21:44:25,342 AbstractCassandraDaemon.java (line 138) Fatal exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at org.cliffc.high_scale_lib.NonBlockingHashMap.hash(NonBlockingHashMap.java:113)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:553)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:348)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfAbsent(NonBlockingHashMap.java:319)
at org.cliffc.high_scale_lib.NonBlockingHashSet.add(NonBlockingHashSet.java:32)
at org.apache.cassandra.db.HintedHandOffManager.scheduleHintDelivery(HintedHandOffManager.java:371)
at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:356)
at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:84)
at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:119)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

Brandon Williams
added a comment - 28/Dec/11 18:25 Can you confirm the version is 1.0.6? This only seems possible with CASSANDRA-3554 , because otherwise hint delivery only happens when a node comes up.

ERROR [OptionalTasks:1] 2011-12-29 10:59:06,482 AbstractCassandraDaemon.java (line 138) Fatal exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at org.cliffc.high_scale_lib.NonBlockingHashMap.hash(NonBlockingHashMap.java:113)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:553)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:348)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfAbsent(NonBlockingHashMap.java:319)
at org.cliffc.high_scale_lib.NonBlockingHashSet.add(NonBlockingHashSet.java:32)
at org.apache.cassandra.db.HintedHandOffManager.scheduleHintDelivery(HintedHandOffManager.java:371)
at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:356)
at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:84)
at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:119)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

Radim Kolar
added a comment - 02/Jan/12 15:10 I suspect that this is happening when node is trying to deliver hints created for himself (to is null after all)
because i have some stuck hints, cant list them via CLI to look at keys because node will OOM.
INFO [CompactionExecutor:7] 2012-01-02 15:48:13,084 CompactionTask.java (line 221) Compacted to [/usr/local/cassandra/data/system/HintsColumnFamily-hc-1011-Data.db,] . 168,568,056 to 163,770,679 (~97% of original) bytes for 1 keys at 29.619551MB/s. Time: 5,273ms.
Its 2 node cluster and no hints were delivered to other node:
INFO [HintedHandoff:1] 2012-01-02 15:41:29,035 HintedHandOffManager.java (line 267) Started hinted handoff for token: 99070591730234615865843651857942052864
INFO [HintedHandoff:1] 2012-01-02 15:41:29,217 HintedHandOffManager.java (line 334) Finished hinted handoff of 0 rows to

ERROR [OptionalTasks:1] 2012-02-07 14:41:57,710 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at org.cliffc.high_scale_lib.NonBlockingHashMap.hash(NonBlockingHashMap.java:113)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:553)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:348)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfAbsent(NonBlockingHashMap.java:319)
at org.cliffc.high_scale_lib.NonBlockingHashSet.add(NonBlockingHashSet.java:32)
at org.apache.cassandra.db.HintedHandOffManager.scheduleHintDelivery(HintedHandOffManager.java:410)
at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:395)
at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:84)
at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:119)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

Adverse Effects:

The 10min repeating scheduleAllDeliveries() introduced in CASSANDRA-3554 will no longer fire (since the exception was uncaught), so the node has now regressed to pre-CASSANDRA-3554 behaviour

Even after the node is restarted, the exception will be thrown again next time the schedule is run.

This will continue until the row tombstone for the dead node's hints is garbage-collected (10 days)

Fix:

don't try to schedule delivery of hints for tokens which are no longer valid endpoints (see attached).

Sam Overton
added a comment - 07/Feb/12 15:08 I've been able to reproduce this in 1.07 and in trunk as follows:
create a cluster of 2 nodes
stop one of the nodes
insert some data at RF=1, CL=ANY - this causes hints to be stored
nodetool removetoken on the token of the dead node
some time up to 10 minutes later the exception is logged:
ERROR [OptionalTasks:1] 2012-02-07 14:41:57,710 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread [OptionalTasks:1,5,main]
java.lang.NullPointerException
at org.cliffc.high_scale_lib.NonBlockingHashMap.hash(NonBlockingHashMap.java:113)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:553)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:348)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfAbsent(NonBlockingHashMap.java:319)
at org.cliffc.high_scale_lib.NonBlockingHashSet.add(NonBlockingHashSet.java:32)
at org.apache.cassandra.db.HintedHandOffManager.scheduleHintDelivery(HintedHandOffManager.java:410)
at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:395)
at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:84)
at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:119)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Adverse Effects:
The 10min repeating scheduleAllDeliveries() introduced in CASSANDRA-3554 will no longer fire (since the exception was uncaught), so the node has now regressed to pre- CASSANDRA-3554 behaviour
Even after the node is restarted, the exception will be thrown again next time the schedule is run.
This will continue until the row tombstone for the dead node's hints is garbage-collected (10 days)
Fix:
don't try to schedule delivery of hints for tokens which are no longer valid endpoints (see attached).

Edward Capriolo
added a comment - 15/Feb/12 16:26 Q: don't try to schedule delivery of hints for tokens which are no longer valid endpoints (see attached).
Does this mean the hints are just lost for good? Does anyone thing an attempt should be made to determine where the hints should go and possibly re-write them?

It doesn't make a lot of sense. a) we have no way to quickly find such hints, and b) if you removetoken the node, the data from existing replicas will be copied to restore the RF, so the hint isn't necessary (unless you wrote at ANY, in which case you've already lived dangerously.)

Brandon Williams
added a comment - 15/Feb/12 17:00 It doesn't make a lot of sense. a) we have no way to quickly find such hints, and b) if you removetoken the node, the data from existing replicas will be copied to restore the RF, so the hint isn't necessary (unless you wrote at ANY, in which case you've already lived dangerously.)

When a token is removed, hints intended for that endpoint are removed (see StorageService.excise(Token token, InetAddress endpoint)) so yes, they are lost for good.

The removetoken process involves streaming from replicas to the new endpoint, so it should be up to date assuming writes were at CL > ANY. I can't think of a case where we would gain anything by delivering the hints for the removed endpoint (except where writes were at CL.ANY).

Sam Overton
added a comment - 15/Feb/12 17:03 When a token is removed, hints intended for that endpoint are removed (see StorageService.excise(Token token, InetAddress endpoint)) so yes, they are lost for good.
The removetoken process involves streaming from replicas to the new endpoint, so it should be up to date assuming writes were at CL > ANY. I can't think of a case where we would gain anything by delivering the hints for the removed endpoint (except where writes were at CL.ANY).

It seems wrong to abandon hints. Arguments like such as "(unless you wrote at ANY, in which case you've already lived dangerously.)" are a slippery slope, and it says something about the contract of ANY.

According to Cassandra.thrift.

ANY Ensure that the write has been written once somewhere, including possibly being hinted in a non-target node.

It does not say :

ANY Ensure that the write has been written once somewhere, including possibly being hinted in a non-target node, which probably wont get lost, unless .....

Maybe there is some other harder to contrive scenario out there RF3, write ONE, two hints and one node failure then a move also causes an issue with lost hints.

It is an edge case, but I think it is important. Since writes are idempotent I would rather a hint gets delivered causing an extra write then it gets lost.

1.0's made HH way more reliable, I would like to see Cassandra push that high standard and not have caveats associated around how ANY works.

Edward Capriolo
added a comment - 15/Feb/12 17:31 - edited It seems wrong to abandon hints. Arguments like such as "(unless you wrote at ANY, in which case you've already lived dangerously.)" are a slippery slope, and it says something about the contract of ANY.
According to Cassandra.thrift.
ANY Ensure that the write has been written once somewhere, including possibly being hinted in a non-target node.
It does not say :
ANY Ensure that the write has been written once somewhere, including possibly being hinted in a non-target node, which probably wont get lost, unless .....
Maybe there is some other harder to contrive scenario out there RF3, write ONE, two hints and one node failure then a move also causes an issue with lost hints.
It is an edge case, but I think it is important. Since writes are idempotent I would rather a hint gets delivered causing an extra write then it gets lost.
1.0's made HH way more reliable, I would like to see Cassandra push that high standard and not have caveats associated around how ANY works.

If a node is removed while it's up (decommission), then removing hints should be a no-op, since they should be handed off before the stream-to-new-owners. I'd be okay with a minor tweak to perform a check before decommission to refuse to continue if this is not the case.

But if a node is removed while dead (removeToken) then the hints for the old node are worthless, since we're streaming from a different copy, that by definition didn't need the hints for the dead one.

Jonathan Ellis
added a comment - 23/Feb/12 20:59 If a node is removed while it's up (decommission), then removing hints should be a no-op, since they should be handed off before the stream-to-new-owners. I'd be okay with a minor tweak to perform a check before decommission to refuse to continue if this is not the case.
But if a node is removed while dead (removeToken) then the hints for the old node are worthless, since we're streaming from a different copy, that by definition didn't need the hints for the dead one.