We are running Strimzi with Kafka in an OpenShift cluster.
We have multiple topics, all with the same retention.ms
setting: 259200000, which is 72 hours.
We observed that the free disk space on the Kafka volumes has been decreasing over time.
To check where the space is used, we exec'ed into one of the Kafka pods and ran:
du -sh /var/lib/kafka/data/kafka-log0/* | sort -hr
This produced the following output:
bash-4.4$ du -sh /var/lib/kafka/data/kafka-log0/* | sort -hr
149G /var/lib/kafka/data/kafka-log0/mytopic1-0
29G /var/lib/kafka/data/kafka-log0/mytopic2-0
3.6G /var/lib/kafka/data/kafka-log0/mytopic3-0
681M /var/lib/kafka/data/kafka-log0/mytopic4-security-0
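To cross-check these sizes from Kafka's side rather than the filesystem, the per-partition log sizes can also be queried from inside a broker pod. A minimal sketch, assuming the usual script location in the Strimzi image and a plaintext bootstrap listener at kafka-kafka-bootstrap:9092 (adjust to your cluster and listener setup):

# report per-broker log directory sizes for the listed topics
/opt/kafka/bin/kafka-log-dirs.sh --bootstrap-server kafka-kafka-bootstrap:9092 --describe --topic-list mytopic1,mytopic2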
Checking /var/lib/kafka/data/kafka-log0/mytopic1-0
revealed data as far back as the 28th of March, i.e. more than a month old,
even though retention is set to 72 hours.
total 155550304
-rw-rw-r--. 1 1000760000 1000760000 6328 Mar 28 10:17 00000000001211051718.index
-rw-rw-r--. 1 1000760000 1000760000 12339043 Mar 28 10:17 00000000001211051718.log
-rw-rw-r--. 1 1000760000 1000760000 1164 Mar 28 10:17 00000000001211051718.timeindex
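Worth noting: time-based retention is decided from the record timestamps stored in each segment (its largest timestamp), not from the file modification times shown above. To see which timestamps Kafka has indexed for the old segment, something like the following should work inside the pod; this is only a sketch that reuses the segment shown above:

# dump the time index of the old segment to see the record timestamps Kafka tracks for it
/opt/kafka/bin/kafka-dump-log.sh --files /var/lib/kafka/data/kafka-log0/mytopic1-0/00000000001211051718.timeindex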
For the other topics, which have the same retention settings, the data in the folders is in line with the retention settings.
We also checked the oldest files earlier this morning, and they have not changed since. So it seems that removal of old segments is not taking place for this specific topic.
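One more place we intend to look is the broker log output, since the brokers normally log when segments are marked for deletion due to retention. A rough sketch, assuming the Strimzi pod naming <cluster>-kafka-N and a container named kafka (the exact log wording may differ between Kafka versions):

# search the broker log for retention/deletion activity on the problem partition
oc logs kafka-kafka-0 -c kafka | grep -i 'mytopic1-0' | grep -i 'delet'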
Kubernetes: OpenShift
Kafka image: kafka:0.35.1-kafka-3.4.0
Kafka version: 3.4.0
Strimzi version: 0.35.1
The KafkaTopic that has the issue:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  annotations:
  labels:
    app.kubernetes.io/instance: kafka
    strimzi.io/cluster: kafka
  name: mytopic1
  namespace: kafka-ns
  resourceVersion: "162114882"
  uid: 266caad6-c73d-4a23-9913-6c9a64f505ca
spec:
  config:
    retention.ms: 259200000
    segment.bytes: 1073741824
  partitions: 1
  replicas: 3
status:
  conditions:
    - lastTransitionTime: "2023-10-27T09:03:58.698836682Z"
      status: "True"
      type: Ready
  observedGeneration: 2
  topicName: mytopic1
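To rule out the setting not having reached the broker, or being overridden there, the effective topic configuration can also be read back directly from Kafka. A sketch, using the same assumed bootstrap address as above:

# show the effective topic config (including defaults and overrides) as the broker sees it
/opt/kafka/bin/kafka-configs.sh --bootstrap-server kafka-kafka-bootstrap:9092 --entity-type topics --entity-name mytopic1 --describe --all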
Another Kafka topic, without issues:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  labels:
    app.kubernetes.io/instance: kafka
    strimzi.io/cluster: kafka
  name: mytopic2
  namespace: kafka-ns
  resourceVersion: "162114874"
  uid: 871b5fab-7996-4674-a0b4-d3476cbe9c6c
spec:
  config:
    retention.ms: 259200000
    segment.bytes: 1073741824
  partitions: 1
  replicas: 3
status:
  conditions:
    - lastTransitionTime: "2023-10-27T09:03:58.539808920Z"
      status: "True"
      type: Ready
  observedGeneration: 2
  topicName: mytopic2
Any ideas on where to check to try to get to the root cause?