I am running into read timeouts that I would not expect.
- 3 nodes on commodity hardware on my home network, replication factor 2
- one simple keyspace and table:
CREATE KEYSPACE IF NOT EXISTS uzzstore
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '2' };

CREATE TABLE IF NOT EXISTS uzzstore.chunks (
  id blob PRIMARY KEY,
  size bigint
);
- very simple query:
select * from chunks where id in (... up to 50 ids ...)
- the id is a 32-byte hash
- a single client using the DataStax Java driver, which synchronously runs the query above; think of it as scanning another data source and looking each id up in Cassandra; this is the big workload that runs for hours (a simplified sketch of the loop is shown after this list)
- about 600M records in the chunks table
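
For reference, the client loop is roughly the following (simplified sketch with the DataStax Java driver 4.x; the contact point, datacenter name, and the nextBatchOfIds helper are placeholders, not my real code):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.util.List;

public class ChunkLookup {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042)) // placeholder contact point
                .withLocalDatacenter("datacenter1")                        // placeholder DC name
                .build()) {

            // Prepared once, reused for every batch of up to 50 ids.
            PreparedStatement ps = session.prepare(
                    "SELECT id, size FROM uzzstore.chunks WHERE id IN :ids");

            List<ByteBuffer> ids;
            while (!(ids = nextBatchOfIds()).isEmpty()) {
                BoundStatement bound = ps.bind().setList("ids", ids, ByteBuffer.class);
                ResultSet rs = session.execute(bound); // synchronous call that occasionally times out
                for (Row row : rs) {
                    ByteBuffer id = row.getByteBuffer("id");
                    long size = row.getLong("size");
                    // ... compare against the other data source ...
                }
            }
        }
    }

    // Placeholder for the scan of the other data source; returns up to 50 ids per call.
    private static List<ByteBuffer> nextBatchOfIds() {
        return List.of();
    }
}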
What I see is that sometimes I get a “Query timed out after PT2S”, which is a bit unexpected to me given such a simple use case (i.e. a simple select on the primary key and no concurrency at all). I cannot observe any node being particularly loaded or capped in memory.
In debug log, I see messages like:
SELECT * FROM uzzstore.chunks WHERE id = 0xa9d6f7de939aa2ff41a88011717c41c1d369beb314b10ab62f3f09c1cf840864 LIMIT 4935 ALLOW FILTERING>, time 547 msec - slow timeout 500 msec/cross-node
Is this normal? Is there anything I should optimize to prevent it? Am I doing anything wrong?
Of course my point is not to increase the timeout unless there are good reasons to do so, but to understand how to scale the system to support this use case.
PS: it looks like Cassandra turns select ... where id in ... into select ... where id = ... allow filtering. Should I use the latter directly in my code? Does ALLOW FILTERING impact performance?
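
For comparison, this is roughly what I would run if I switched to individual equality queries, using executeAsync to keep the 50 lookups of a batch in flight (sketch only; the class and helper names are made up, and I am not adding ALLOW FILTERING myself anywhere):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.AsyncResultSet;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.stream.Collectors;

class IndividualLookup {
    // One equality query per id, run concurrently instead of a single IN query.
    static void lookup(CqlSession session, List<ByteBuffer> ids) {
        PreparedStatement ps = session.prepare(
                "SELECT id, size FROM uzzstore.chunks WHERE id = :id");

        List<CompletionStage<AsyncResultSet>> futures = ids.stream()
                .map(id -> session.executeAsync(ps.bind().setByteBuffer("id", id)))
                .collect(Collectors.toList());

        // Wait for the whole batch before moving on to the next one.
        CompletableFuture.allOf(futures.stream()
                .map(CompletionStage::toCompletableFuture)
                .toArray(CompletableFuture[]::new)).join();
    }
}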