We run a lot of PostgreSQL clusters managed by Patroni.
Almost all of them behave as expected almost all of the time.
On one of the standby nodes, however, WAL files are never removed, so pg_wal keeps growing until the underlying filesystem runs out of space.
This instance is set up essentially the same way as one of the other clusters, which works as expected (Patroni 4.0.1 vs. Patroni 4.0.2; PostgreSQL 14.13 on Oracle Linux 9.4).
I have already found various documents and web pages describing possible causes of this behaviour, for example:
https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/
How to remove old WAL file in postgresql?
However, none of the causes described there seems to apply.
There is an inactive replication slot on the standby instance, but the working cluster has one as well.
We tried restarting ("bouncing") the instance, but pg_wal was not cleaned up afterwards.
In the past, the only fix was to reinit the cluster member, but that should only be an emergency workaround.
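One way to test whether the inactive slot is what pins the WAL is to compare its restart_lsn (from `SELECT slot_name, active, restart_lsn FROM pg_replication_slots;` on the standby) with the file names in pg_wal. The following is a minimal sketch, not a definitive tool: it assumes the default 16 MB wal_segment_size and the usual 24-hex-digit segment naming (timeline, log, segment), and the helper names are hypothetical.

```python
import re

# Assumption: default wal_segment_size of 16 MB; adjust if the cluster uses another size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 for 16 MB segments

# Plain WAL segment files are exactly 24 upper-case hex digits; this skips
# *.history, *.partial, *.backup files and the archive_status directory.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def parse_lsn(lsn: str) -> int:
    """Convert an LSN in text form such as '16/B374D848' to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_for_lsn(lsn: str, timeline: int = 1) -> str:
    """Hypothetical helper: name of the segment file containing the given LSN."""
    segno = parse_lsn(lsn) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def segno_of(name: str) -> int:
    """Segment number encoded in a WAL file name (timeline part ignored)."""
    return int(name[8:16], 16) * SEGMENTS_PER_XLOGID + int(name[16:24], 16)


def segments_pinned_by_slot(wal_files: list[str], restart_lsn: str) -> list[str]:
    """Hypothetical helper: segment files at or after the slot's restart_lsn segment,
    i.e. the ones a slot with this restart_lsn prevents from being removed."""
    slot_segno = parse_lsn(restart_lsn) // WAL_SEGMENT_SIZE
    pinned = [f for f in wal_files if WAL_NAME.match(f) and segno_of(f) >= slot_segno]
    return sorted(pinned, key=lambda f: (segno_of(f), f))


if __name__ == "__main__":
    # Example input; in practice wal_files would come from os.listdir() on pg_wal.
    files = ["0000000100000016000000B2", "0000000100000016000000B3", "archive_status"]
    print(wal_file_for_lsn("16/B374D848"))
    print(segments_pinned_by_slot(files, "16/B374D848"))
```

If the oldest segment in pg_wal on the broken standby matches the file computed from the inactive slot's restart_lsn, that slot is a likely suspect. Running the same check on the working cluster may explain the difference: a slot whose restart_lsn is NULL does not retain any WAL, even though it also shows up as inactive.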
Maybe someone has experienced a similar problem and can help.