After choosing Ec2Snitch you can't migrate off without a full cluster restart

Description

Once you choose the Ec2Snitch, gossip messages will trigger this exception if you try to move to another snitch (for example, the property file snitch):

ERROR [pool-2-thread-11] 2011-08-30 16:38:06,935 Cassandra.java (line 3041) Internal error processing get_slice
java.lang.NullPointerException
at org.apache.cassandra.locator.Ec2Snitch.getDatacenter(Ec2Snitch.java:84)
at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:122)
at org.apache.cassandra.service.DatacenterReadCallback.assureSufficientLiveNodes(DatacenterReadCallback.java:77)
at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:516)
at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:480)
at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:109)
at org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:263)
at org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:345)
at org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:306)
at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:3033)
at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Issue Links

is related to

CASSANDRA-3186: nodetool should not NPE when rack/dc info is not yet available

Resolved

Brandon Williams
added a comment - 16/Sep/11 03:04
"Defaulting AbstractEndpointSnitch's gossiperStarting to populate ApplicationState.DC and ApplicationState.RACK will then help any snitch that relies on the gossip info for getDC and getRack."
Yes, but setting DC to 'foo' and rack to 'bar' just creates a new DC and rack and breaks the replication policy and consistency guarantees.

Jackson Chung
added a comment - 16/Sep/11 00:32 - edited
"I don't see how making your dc/rack names your external IP address is going to solve anything."
Well, the NPE was on
return Gossiper.instance.getEndpointStateForEndpoint(endpoint).getApplicationState(ApplicationState.DC).value;
The given endpoint is not the local address; it's the address of one of the "other" nodes. If those "other" nodes are not using the Ec2Snitch, which would have populated "ApplicationState.DC" and "ApplicationState.RACK" with values, then getApplicationState(ApplicationState.DC) (and getApplicationState(ApplicationState.RACK), for that matter) is going to return null. Hence you get an NPE from that line on .value.
Defaulting AbstractEndpointSnitch's gossiperStarting to populate ApplicationState.DC and ApplicationState.RACK will then help any snitch that relies on the gossip info for getDC and getRack.
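
To make the failure mode described in this comment concrete, here is a minimal, self-contained sketch. It does not use the real Cassandra classes: GossipLookupSketch, AppState, Value, join and publish are hypothetical stand-ins for the gossip state map, and the "UNKNOWN" fallback is only an illustration. As noted elsewhere in this thread, handing back a made-up DC name avoids the NPE but does not by itself keep replication correct.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for gossip state; NOT the real Cassandra Gossiper API.
public class GossipLookupSketch {
    public enum AppState { DC, RACK }

    // Plays the role of the object whose .value field is dereferenced.
    public static final class Value {
        public final String value;
        Value(String value) { this.value = value; }
    }

    // endpoint address -> application states that endpoint has gossiped
    private static final Map<String, Map<AppState, Value>> STATES = new HashMap<>();

    // A node is known to gossip but has not published any DC/RACK state yet.
    public static void join(String endpoint) {
        STATES.putIfAbsent(endpoint, new EnumMap<AppState, Value>(AppState.class));
    }

    // What a gossip-publishing snitch would do for its own node.
    public static void publish(String endpoint, AppState key, String value) {
        STATES.computeIfAbsent(endpoint, e -> new EnumMap<AppState, Value>(AppState.class))
              .put(key, new Value(value));
    }

    // Same shape as the failing line: no null check before .value.
    public static String datacenterUnchecked(String endpoint) {
        return STATES.get(endpoint).get(AppState.DC).value;
    }

    // Null-safe variant: returns the fallback when the peer published no DC.
    public static String datacenterOrDefault(String endpoint, String fallback) {
        Map<AppState, Value> state = STATES.get(endpoint);
        if (state == null) {
            return fallback;
        }
        Value dc = state.get(AppState.DC);
        return dc == null ? fallback : dc.value;
    }

    public static void main(String[] args) {
        publish("10.0.0.1", AppState.DC, "us-east"); // peer still publishing via gossip
        join("10.0.0.2");                            // peer whose snitch publishes nothing

        System.out.println(datacenterUnchecked("10.0.0.1"));
        System.out.println(datacenterOrDefault("10.0.0.2", "UNKNOWN"));
        try {
            datacenterUnchecked("10.0.0.2");
        } catch (NullPointerException e) {
            System.out.println("NPE for 10.0.0.2, as in the stack trace");
        }
    }
}
```

The addresses and the "us-east" name are illustrative values, not taken from the ticket.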

Brandon Williams
added a comment - 31/Aug/11 22:16
I'm not sure there's a good solution here. We could make PFEPS (the property file snitch) inject the local node's dc/rack info into gossip, similar to what I suggested in CASSANDRA-1974, but you'd still have to name things with the Ec2Snitch conventions for things not to break, and it would be very PFEPS-specific; other snitches are out of the question.
Ultimately I'm inclined to say you need to choose your snitch like you choose your partitioner: very carefully.
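
The naming constraint mentioned above can be illustrated with a small sketch. It assumes a simplified view in which a datacenter-aware strategy just groups nodes by the DC name each one reports. DcNamingSketch and nodesPerDatacenter are hypothetical helpers, and the addresses and DC names are illustrative values rather than anything from the ticket.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not Cassandra's replication strategy code.
public class DcNamingSketch {

    // Counts how many nodes report each datacenter name.
    public static Map<String, Integer> nodesPerDatacenter(Map<String, String> endpointToDc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String dc : endpointToDc.values()) {
            counts.merge(dc, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> cluster = new LinkedHashMap<>();
        cluster.put("10.0.0.1", "us-east"); // name as the original snitch reported it
        cluster.put("10.0.0.2", "us-east");
        cluster.put("10.0.0.3", "DC1");     // migrated node using a different, made-up name

        // The renamed node shows up as its own one-node datacenter.
        System.out.println(nodesPerDatacenter(cluster)); // prints {DC1=1, us-east=2}
    }
}
```

Under this simplified model, any per-datacenter replica count configured for "us-east" no longer covers the third node, which is the kind of breakage the comment warns about when names do not match.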