Go Version: 1.20
GoCQL PoolConfig: Round Robin Host Policy
Cassandra Version: 4.1.3 – OpenJDK 11
We’re in the process of upgrading from Cassandra 3.11.16 to Cassandra 4.1.3. During performance testing we observed that the 4.1.3 cluster did not perform as well as the 3.11.16 cluster. Both DB clusters (3.11.16 and 4.1.3) have 2 seed nodes and 4 worker nodes. Whenever our application instances execute queries against the 4.1.3 cluster under heavy load (more than 5000 TPS), only one or two Cassandra nodes receive the requests. Those nodes become very slow to respond, and eventually all queries from our application time out. Our dashboard showed the following during testing:
- Almost 99% of all inbound and outbound network traffic was seen only on the EC2 instances running the affected nodes (1 DB process per EC2 instance)
- Very high CPU utilization seen on the affected nodes
- Increased garbage collection activity on the affected nodes
None of the issues described above occurred when our applications (same code) were connected to the 3.11.16 cluster. It looks like the GoCQL driver distributes requests evenly across all nodes of the 3.11.16 cluster, but not when connected to the 4.1.3 cluster. Both DB clusters have exactly the same schema and data. Has anyone experienced similar issues with applications using the GoCQL driver to connect to Cassandra 4.1?
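To make the expectation concrete: with a round-robin policy over six nodes, each node should receive roughly one sixth of the requests. The sketch below is not GoCQL code and not our application code. It is a plain-Go model of round-robin selection, using hypothetical host names, plus a hypothetical `shares` helper that turns per-host request counts into fractions of the total, so the result can be compared with the ~99% we see on one or two nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// roundRobin is a simplified model of round-robin host selection.
// It is NOT GoCQL's implementation; it only illustrates the
// distribution we would expect from such a policy.
type roundRobin struct {
	hosts []string
	next  int
}

// pick returns the next host in rotation.
func (r *roundRobin) pick() string {
	h := r.hosts[r.next%len(r.hosts)]
	r.next++
	return h
}

// shares converts per-host request counts into fractions of the total.
// It returns an empty map when there are no requests at all.
func shares(counts map[string]int) map[string]float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	out := make(map[string]float64, len(counts))
	if total == 0 {
		return out
	}
	for h, c := range counts {
		out[h] = float64(c) / float64(total)
	}
	return out
}

func main() {
	// Hypothetical host names: 2 seed nodes + 4 worker nodes, as in our clusters.
	rr := &roundRobin{hosts: []string{
		"seed-1", "seed-2", "node-1", "node-2", "node-3", "node-4",
	}}

	// Tally which host each simulated request would go to.
	counts := make(map[string]int)
	for i := 0; i < 6000; i++ {
		counts[rr.pick()]++
	}

	// Print the per-host share in a stable (sorted) order.
	s := shares(counts)
	names := make([]string, 0, len(s))
	for h := range s {
		names = append(names, h)
	}
	sort.Strings(names)
	for _, h := range names {
		fmt.Printf("%s: %.1f%%\n", h, s[h]*100)
	}
}
```

In this model every host ends up with the same share. That matches what we observe against the 3.11.16 cluster and is the opposite of what we observe against 4.1.3.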