I am dealing with a critical bug in an application currently running in production. The setup consists of three servers: one (the master node) is connected to the database, and the other two (slave nodes) join the cluster and replicate data from the master node.
Recently, our system experienced downtime, during which the master node was down for a few minutes. After the master node came back online, instead of loading the latest data from the database, it joined the cluster and inadvertently copied stale data from the other nodes.
Here are the approaches I’ve tried so far to resolve this issue:
1 – Set the cache rebalance mode to NONE on the master node and ASYNC on the slave nodes.
2 – Attempted dynamic master assignment using:
ignite.services().deployClusterSingleton("DatabaseSyncService", new DatabaseSyncService(vertx, routerService, emiService));
3 – Tried using backups, but this did not help since the service and database connectivity were lost when the master node went down.
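For reference, approach 1 looks roughly like the sketch below. It is only a minimal illustration of the per-role rebalance setting, not my exact production code: the cache name `exampleCache`, the key/value types, and the `isMaster` flag are placeholders made up for this example.

```java
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public final class RebalanceConfigSketch {
    // Hypothetical cache name; a placeholder for the real cache.
    private static final String CACHE_NAME = "exampleCache";

    private RebalanceConfigSketch() {
    }

    /**
     * Builds a node configuration whose cache rebalance mode depends on the node's role:
     * NONE on the master node, ASYNC on the slave nodes.
     */
    static IgniteConfiguration configFor(boolean isMaster) {
        CacheConfiguration<String, String> cacheCfg = new CacheConfiguration<>(CACHE_NAME);
        cacheCfg.setRebalanceMode(isMaster ? CacheRebalanceMode.NONE : CacheRebalanceMode.ASYNC);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);
        return cfg;
    }
}
```

Each node would then be started with something like `Ignition.start(RebalanceConfigSketch.configFor(true))` on the master and `configFor(false)` on the slaves. Even with this in place, the master still came back with stale data.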