Cassandra nodetool rebuild process failed with Stream error
We added a new DC to the cluster. After adding all the nodes, we ran nodetool rebuild and got exceptions like java.io.IOException: Checksum didn't match. Has anyone faced this exception, and is there a solution? Thanks in advance.
We are running COSS v4.1.3 on Ubuntu 20.04.
Error while reading partition DecoratedKey(-1908928440232805475, 000800fd592c2fa372470000014d00000a362a362a382a332a362a00) from stream on ks='xxxxxxxxxxx' and table='xxxxxxxxxxx'.
ERROR [Stream-Deserializer-/10.x.x.x:7002-ab4567ee] 2024-08-11 15:46:29,110 StreamSession.java:696 - [Stream #a3506520-17ca-11ef-8dcb-c97277257478] Streaming error occurred on session with peer 10.x.x.x:7002
java.io.IOException: Checksum didn't match (expected: 1541040037, actual: -1225830714)
at org.apache.cassandra.db.streaming.CompressedInputStream.maybeValidateChecksum(CompressedInputStream.java:206)
at org.apache.cassandra.db.streaming.CompressedInputStream.loadNextChunk(CompressedInputStream.java:157)
at org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:121)
at org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:90)
at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68)
at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:62)
at org.apache.cassandra.io.util.TrackedDataInputPlus.readFully(TrackedDataInputPlus.java:118)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:433)
at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:408)
at org.apache.cassandra.db.streaming.CassandraStreamReader$StreamDeserializer.newPartition(CassandraStreamReader.java:211)
at org.apache.cassandra.db.streaming.CassandraStreamReader.writePartition(CassandraStreamReader.java:184)
at org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:96)
at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:50)
at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:36)
at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
at org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:59)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:750)
DEBUG [Stream-Deserializer-/10.x.x.x:7002-ab4567ee] 2024-08-11 15:46:29,111 StreamSession.java:551 - [Stream #a3506520-17ca-11ef-8dcb-c97277257478] Changing session state from STREAMING to FAILED
DEBUG [Stream-Deserializer-/10.x.x.x:7002-ab4567ee] 2024-08-11 15:46:29,111 StreamSession.java:551 - [Stream #a3506520-17ca-11ef-8dcb-c97277257478] Changing session state from FAILED to FAILED
When SSTables are streamed to a new node, the data is split into chunks, which are compressed to reduce the size of the payload.
The receiving node validates the checksum of each incoming chunk. In this case, the checksum it computed didn't match the expected one, which led to the exception:
java.io.IOException: Checksum didn't match (expected: 1541040037, actual: -1225830714)
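To illustrate the idea, here is a minimal Java sketch of per-chunk checksum validation. It is not Cassandra's actual CompressedInputStream code: it assumes a plain CRC32 (java.util.zip.CRC32) computed over each chunk, and the ChunkChecksumDemo class with its checksum and validate helpers is hypothetical. It shows that flipping a single bit in a chunk is enough for the receiver's recomputed checksum to differ from the one the sender computed (CRC32 always detects single-bit errors).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChunkChecksumDemo {

    // Hypothetical helper: CRC32 over the chunk, narrowed to a signed 32-bit int,
    // so a checksum can print as a negative number.
    static int checksum(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        return (int) crc.getValue();
    }

    // Hypothetical receiver-side check: recompute the checksum of the received
    // chunk and compare it with the checksum the sender computed.
    static void validate(byte[] received, int expected) throws IOException {
        int actual = checksum(received);
        if (actual != expected) {
            throw new IOException(String.format(
                    "Checksum didn't match (expected: %d, actual: %d)", expected, actual));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] chunk = "example compressed chunk".getBytes(StandardCharsets.UTF_8);
        int expected = checksum(chunk); // computed on the sending side

        validate(chunk.clone(), expected); // intact chunk: no exception
        System.out.println("intact chunk: OK");

        byte[] corrupted = chunk.clone();
        corrupted[3] ^= 0x01; // simulate a single bit flipped in transit
        try {
            validate(corrupted, expected);
            System.out.println("corrupted chunk: OK");
        } catch (IOException e) {
            System.out.println("corrupted chunk: " + e.getMessage());
        }
    }
}
```

Because the checksum is compared as a signed 32-bit int, a value can print as negative, which is consistent with the "actual: -1225830714" in the log above.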
My best guess is that the stream was corrupted in transit, which can happen if, for example, the network was disrupted and a packet was dropped.
I would suggest running a quick network diagnostic to see whether the node's network interface is reporting errors. In practice, though, the disruption may have been a transient, one-off event, which would make it difficult to detect.
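On Linux (the question mentions Ubuntu 20.04), per-interface error and drop counters are exposed in /proc/net/dev, and ip -s link shows similar counters. The Java sketch below is a hypothetical helper (NetDevErrors) that prints the receive and transmit "errs" and "drop" columns for each interface, assuming the standard /proc/net/dev layout with two header lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class NetDevErrors {

    // Error/drop counters for one interface, taken from one /proc/net/dev line.
    static final class IfaceStats {
        final String name;
        final long rxErrs, rxDrop, txErrs, txDrop;

        IfaceStats(String name, long rxErrs, long rxDrop, long txErrs, long txDrop) {
            this.name = name;
            this.rxErrs = rxErrs;
            this.rxDrop = rxDrop;
            this.txErrs = txErrs;
            this.txDrop = txDrop;
        }
    }

    // Parses a data line such as "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0".
    // Receive columns: bytes packets errs drop fifo frame compressed multicast.
    // Transmit columns: bytes packets errs drop fifo colls carrier compressed.
    static IfaceStats parseLine(String line) {
        int colon = line.indexOf(':');
        String name = line.substring(0, colon).trim();
        String[] f = line.substring(colon + 1).trim().split("\\s+");
        return new IfaceStats(name,
                Long.parseLong(f[2]), Long.parseLong(f[3]),    // receive errs, drop
                Long.parseLong(f[10]), Long.parseLong(f[11])); // transmit errs, drop
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/net/dev"));
        // The first two lines of /proc/net/dev are column headers.
        for (String line : lines.subList(2, lines.size())) {
            IfaceStats s = parseLine(line);
            System.out.printf("%-12s rx_errs=%d rx_drop=%d tx_errs=%d tx_drop=%d%n",
                    s.name, s.rxErrs, s.rxDrop, s.txErrs, s.txDrop);
        }
    }
}
```

These counters are cumulative since boot, so comparing two runs taken some time apart, on both the sending and the receiving node, says more than a single snapshot.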
Additionally, check the logs on the sending node to see if it reported any issues with the stream, particularly when reading the SSTables, as those could provide clues as to why the stream failed.
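If the sending node's log is large, a small filter can help. This hypothetical Java helper (StreamLogScan) prints every ERROR or WARN entry, plus any entry that mentions the stream session ID, together with the stack-trace lines that follow it. It assumes the default package-install log path /var/log/cassandra/system.log and that the sender logs the same session ID as the receiver (a3506520-17ca-11ef-8dcb-c97277257478 in the log above); both can be overridden as command-line arguments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class StreamLogScan {

    private static final String[] LEVELS = {"ERROR", "WARN", "INFO", "DEBUG", "TRACE"};

    // A new log entry starts with its level; other lines continue the previous entry.
    static boolean startsEntry(String line) {
        for (String level : LEVELS) {
            if (line.startsWith(level)) return true;
        }
        return false;
    }

    static List<String> scan(BufferedReader reader, String streamId) throws IOException {
        List<String> hits = new ArrayList<>();
        boolean keep = false; // whether the current log entry is relevant
        String line;
        while ((line = reader.readLine()) != null) {
            if (startsEntry(line)) {
                keep = line.contains(streamId) || line.startsWith("ERROR") || line.startsWith("WARN");
            }
            // Continuation lines (e.g. stack-trace frames) follow their entry's decision.
            if (keep) hits.add(line);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults; pass a different log path and session ID as arguments.
        String path = args.length > 0 ? args[0] : "/var/log/cassandra/system.log";
        String streamId = args.length > 1 ? args[1] : "a3506520-17ca-11ef-8dcb-c97277257478";
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            for (String hit : scan(reader, streamId)) {
                System.out.println(hit);
            }
        }
    }
}
```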
In any case, try running nodetool rebuild again; it should resume from where it left off. Cheers!
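If transient stream failures are expected, rerunning the rebuild can be scripted. The Java sketch below (RebuildRetry, hypothetical) runs nodetool rebuild -- <source-dc> up to three times with a pause between attempts. It assumes nodetool is on the PATH, that it exits with a non-zero status when the rebuild fails, and that the placeholder existing_dc is replaced with the name of the data center you are streaming from; the attempt count and the 60-second delay are arbitrary choices.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RebuildRetry {

    // "--" separates nodetool options from the source DC name.
    static List<String> rebuildCommand(String sourceDc) {
        return Arrays.asList("nodetool", "rebuild", "--", sourceDc);
    }

    // Runs the command with its output shown on this console; returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical placeholder: pass the name of an existing DC as the first argument.
        String sourceDc = args.length > 0 ? args[0] : "existing_dc";
        int maxAttempts = 3;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int exit = run(rebuildCommand(sourceDc));
            if (exit == 0) {
                System.out.println("rebuild finished on attempt " + attempt);
                return;
            }
            System.out.println("rebuild exited with status " + exit + " on attempt " + attempt);
            if (attempt < maxAttempts) {
                Thread.sleep(60_000L); // pause before retrying
            }
        }
        System.out.println("rebuild still failing after " + maxAttempts + " attempts");
        System.exit(1);
    }
}
```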