I’m setting up a Patroni cluster and both nodes remain in the ‘stopped’ state; the logs show that each node is waiting for a leader to bootstrap. Here is the output when I run patronictl -c /etc/postgres0.yml list:
+ Cluster: postgres (7366xxxxxxxxxxxxxxx) --------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+---------+----+-----------+
| postgresql0 | xxx.xxx.xx.57 | Replica | stopped | | unknown |
| postgresql1 | xxx.xxx.xx.129 | Replica | stopped | | unknown |
+-------------+----------------+---------+---------+----+-----------+
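To cross-check what patronictl reports, I also queried each node's REST API directly (the /patroni endpoint, using the ports from the restapi section of the config below):

$ curl -s http://xxx.xxx.xx.57:8008/patroni
$ curl -s http://xxx.xxx.xx.129:8008/patroni

Both return JSON with "state": "stopped" and "role": "replica", consistent with the table above.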
Log Entries:
[postgres@patroni01 ~]$ patroni /etc/postgres0.yml
2024-05-07 13:24:24,345 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-05-07 13:24:24,401 INFO: Lock owner: None; I am postgresql0
2024-05-07 13:24:24,404 INFO: waiting for leader to bootstrap
(the same message repeats every 10 seconds, i.e. once per loop_wait)
2024-05-07 13:25:04,405 INFO: waiting for leader to bootstrap
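From reading the Patroni docs, my understanding is that ‘waiting for leader to bootstrap’ means the node sees the cluster as already initialized (or being initialized) in the DCS, so it will not run initdb itself and instead waits for a leader that never appears. To check what is actually stored in etcd, I dumped the keys under the configured namespace/scope (a sketch; since the etcd: section of my config speaks the v2 protocol, I tried both APIs):

# v2 keyspace (what Patroni's etcd: section writes to)
$ ETCDCTL_API=2 etcdctl --endpoints=http://xxx.xxx.xx.155:2379 ls --recursive /service/postgres

# v3 keyspace (in case the state ended up there instead)
$ ETCDCTL_API=3 etcdctl --endpoints=http://xxx.xxx.xx.155:2379 get --prefix /service/postgres

In particular I am looking for a /service/postgres/initialize key without a corresponding /service/postgres/leader key.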
Configuration Details:
- Patroni version: 3.3.0
- PostgreSQL version: 15
- Configuration file:
scope: postgres
namespace: /service/
name: postgresql0

restapi:
  listen: xxx.xxx.xx.57:8008
  connect_address: xxx.xxx.xx.57:8008

etcd:
  host: xxx.xxx.xx.155:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      pg_hba:
        - host replication replicator 127.0.0.1/32 trust
        - host replication replicator xxx.xxx.xx.57/0 trust
        - host replication replicator xxx.xxx.xx.129/0 trust
        - host all all 0.0.0.0/0 trust
      parameters:
  initdb:
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: xxx.xxx.xx.57:5432
  connect_address: xxx.xxx.xx.57:5432
  data_dir: /data/patroni
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres
  parameters:
    unix_socket_directories: '..'

tags:
  noloadbalance: false
  clonefrom: false
  nosync: false
  nostream: false
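One thing I verified because of data_dir: PostgreSQL refuses to start if the data directory is not owned by the postgres user with mode 0700 (or 0750), and a half-written directory left behind by a failed bootstrap can also block startup:

# expect drwx------ owned by postgres
$ ls -ld /data/patroni
# check whether a previous bootstrap attempt left files behind (PG_VERSION, etc.)
$ ls -la /data/patroni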
Attempts to resolve the issue:
- Checked network connectivity between the two nodes and to the etcd host (see the checks sketched after this list).
- Ensured everything runs as the postgres user with the correct permissions (including the data_dir check shown above).
- Reviewed the configuration files for obvious mistakes.
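The connectivity checks from the first item were roughly the following (a sketch; /health is etcd's standard health endpoint):

# etcd reachable from both Patroni nodes? Expect {"health":"true"}
$ curl -s http://xxx.xxx.xx.155:2379/health

# PostgreSQL and REST API ports reachable between the nodes?
$ nc -vz xxx.xxx.xx.129 5432
$ nc -vz xxx.xxx.xx.129 8008

All of these succeed from both machines.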
Additional Questions:
- What might prevent the cluster from electing a leader, leaving both nodes in the ‘stopped’ state?
- Are there specific settings in the configuration that I should check or change to resolve this?
- How can I troubleshoot the bootstrap process for errors that prevent leader election? Is wiping the cluster state and re-initializing (sketched below) the right approach?
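For that last question, the only recovery path I have found so far is to remove the cluster state and let Patroni initialize from scratch. A sketch of what I believe that would look like (destructive, so I have held off; patronictl remove asks for interactive confirmation):

# stop Patroni on both nodes first, then drop the cluster's state from the DCS
$ patronictl -c /etc/postgres0.yml remove postgres

# clear any leftover data directory on both nodes before restarting Patroni
$ rm -rf /data/patroni/*
$ patroni /etc/postgres0.yml

Is this reasonable for a cluster that has never successfully bootstrapped, or is there a less drastic way?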
Any insights or suggestions to troubleshoot or resolve this issue would be greatly appreciated.