We have one service that logs so heavily that its log file gets rotated too quickly for Fluent Bit to pick up all the logs consistently. At least, that is what we assume is happening. The rotation happens every few minutes during activity spikes.
Anyway, my understanding is that the kubelet on the node manages log rotation, and that containerLogMaxSize defaults to 10Mi. So increasing that to something like 50Mi should help, I would think. I can find docs on how to do that manually by logging into a node and all that. What I can’t find is how to configure it so that new nodes get the change too. Ideally we would like to do this via Terraform.
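For a rough sense of what the bigger limit would buy us, here is a back-of-the-envelope sketch. The log rate is an assumption (I picked one that fills a 10Mi file in about 3 minutes, to match "every few minutes"), and the helper names are made up for illustration. It also assumes the service writes at a steady rate during a spike, which is probably not quite true.

```python
def parse_mebibytes(size: str) -> int:
    """Convert a quantity like '10Mi' to bytes. Only the 'Mi' suffix is handled."""
    if not size.endswith("Mi"):
        raise ValueError(f"unsupported size suffix in {size!r}")
    return int(size[:-2]) * 1024 * 1024


def seconds_between_rotations(max_size: str, bytes_per_second: float) -> float:
    """Time to fill one log file of max_size at a constant write rate."""
    return parse_mebibytes(max_size) / bytes_per_second


# Assumed spike rate: fills a 10Mi file in 180 seconds (not a measured value).
rate = parse_mebibytes("10Mi") / 180

for limit in ("10Mi", "50Mi"):
    minutes = seconds_between_rotations(limit, rate) / 60
    print(f"{limit}: one rotation roughly every {minutes:.0f} min")
```

Under that assumed rate, going from 10Mi to 50Mi stretches the gap between rotations from about 3 minutes to about 15, so Fluent Bit would have five times as long to catch up on each file.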
If we can’t modify it through a setting on the Terraform resources, it seems like we could put some code in the node’s launch template to modify it. But I am not sure we can even touch that on an EKS cluster.
Maybe this just isn’t something EKS lets you do?