Monday, June 26, 2017



DSE Cassandra Node Failed To Start Post OS Upgrade 


One of our production Cassandra cluster nodes refused to start after OS patching, failing with the following error.

ERROR [main] 2017-06-25 18:09:40,906  CassandraDaemon.java:709 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.io.EOFException
        at org.apache.cassandra.hints.HintsDescriptor.readFromFile(HintsDescriptor.java:142) ~[cassandra-all-3.0.8.1293.jar:3.0.8.1293]
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_66]
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[na:1.8.0_66]
        at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_66]
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_66]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_66]
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_66]
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_66]
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_66]
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_66]
        at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:65) ~[cassandra-all-3.0.8.1293.jar:3.0.8.1293]
        at org.apache.cassandra.hints.HintsService.<init>(HintsService.java:88) ~[cassandra-all-3.0.8.1293.jar:3.0.8.1293]
        at org.apache.cassandra.hints.HintsService.<clinit>(HintsService.java:63) ~[cassandra-all-3.0.8.1293.jar:3.0.8.1293]
        at org.apache.cassandra.service.StorageProxy.<clinit>(StorageProxy.jav


After some searching, we found that this error is a known major bug, tracked in the following JIRA issue.

CASSANDRA-12728: Handling partially written hint files

Cause – 
A corrupted (partially written) hint file sent Cassandra into a startup failure loop. This can happen in any of the following situations.

1. The node was rebooted before the Cassandra service was shut down cleanly.
2. The service went down abruptly while writing a hint file.
3. The node rebooted due to a power failure.

Since the cause of the issue was a corrupted hint file, we needed to clean up the hints on the node and then try to restart it.
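
Because the node is down, the cleanup has to be done on disk rather than through a running Cassandra process. Here is a minimal sketch of one way to do it, not necessarily the exact steps we ran: it moves the hint files out of the hints directory into a backup directory instead of deleting them. The function name quarantine_hints, the backup path, and the /var/lib/cassandra/hints location are assumptions for illustration; check hints_directory in your cassandra.yaml for the real path, and run it only while the Cassandra service is stopped.

```python
import shutil
from pathlib import Path


def quarantine_hints(hints_dir, backup_dir, patterns=("*.hints", "*.crc32")):
    """Move hint files (and their checksum files) out of hints_dir.

    Files are moved, not deleted, so they can still be inspected later.
    Returns the sorted list of file names that were moved.
    """
    hints_dir = Path(hints_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for pattern in patterns:
        for hint_file in hints_dir.glob(pattern):
            shutil.move(str(hint_file), str(backup_dir / hint_file.name))
            moved.append(hint_file.name)
    return sorted(moved)


# Example call (assumed paths, adjust to your installation):
# quarantine_hints("/var/lib/cassandra/hints", "/var/tmp/hints_backup")
```

Moving every hint file is the blunt option; hints that are set aside this way are never replayed to the other nodes, which is one more reason to run a repair afterwards.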

After that, the node started fine. Because the node had been down, it is also important to run a repair on it to make sure its data is consistent.

Hope that helps. 

1 comment:

  1. How did you clean up the hints for a particular node that is down?
