I’m working with GridDB to manage a distributed cluster, and recently I’ve been encountering the following error during data synchronization across nodes in the cluster:
20021 SYNC_LOG_NOT_FOUND ERROR
Data synchronization of cluster failed. The log file or data file required for cluster synchronization may have been deleted by a checkpoint execution.
No countermeasure is required as the cluster will automatically detect this and continue retrying, but in this case, since the time to create the replica node will become longer, if this error event is output frequently, either set the checkpoint time longer or increase the count in /dataStore/retainedFileCount of the configuration.
This is a simplified version of the code I’m using to handle node synchronization:
import griddb_python as griddb


def configure_cluster():
    try:
        factory = griddb.StoreFactory.get_instance()
        # get_store takes keyword arguments rather than a dict
        gridstore = factory.get_store(
            host="239.0.0.1",
            port=41999,
            cluster_name="defaultCluster",
            username="admin",
            password="admin",
        )

        # Retrieve the container used for the test
        container = gridstore.get_container("containerName")
        if container is None:
            raise Exception("Container not found")

        # Read the row with key 1
        data = container.get(1)
        print(f"Data retrieved: {data}")

        # Perform an update to test sync; the row is a list in column
        # order, assuming the container's columns are (id, value)
        container.put([2, "newValue"])
        print("Data updated across cluster")
    except griddb.GSException as e:
        # GSException carries an error stack rather than a single message
        for i in range(e.get_error_stack_size()):
            print(f"Error during cluster sync operation: "
                  f"[{e.get_error_code(i)}] {e.get_message(i)}")
        raise


if __name__ == "__main__":
    configure_cluster()
In this code:
- I connect to the GridDB cluster and attempt to retrieve and update a container.
- The goal is to synchronize data across nodes in the cluster.
However, during synchronization, I often receive the SYNC_LOG_NOT_FOUND ERROR 20021. According to the error message, this could be caused by a checkpoint execution deleting log files necessary for cluster synchronization. Although the cluster retries automatically, the synchronization process takes significantly longer.
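To put a number on "frequently", one option is to count how often the event appears per hour in a node's event log. The following is only a rough sketch, not GridDB tooling: it assumes each log line starts with an ISO-8601 timestamp and contains the string SYNC_LOG_NOT_FOUND. The function name `count_sync_log_errors` and the sample lines are made up for illustration, and the real log layout may differ.

```python
from collections import Counter
from typing import Iterable

ERROR_NAME = "SYNC_LOG_NOT_FOUND"  # error name taken from the message above


def count_sync_log_errors(lines: Iterable[str]) -> Counter:
    """Count SYNC_LOG_NOT_FOUND events per hour.

    Assumption: each line begins with an ISO-8601 timestamp such as
    2024-01-01T10:15:00, so the first 13 characters (YYYY-MM-DDTHH)
    can serve as the hourly bucket.
    """
    per_hour = Counter()
    for line in lines:
        if ERROR_NAME in line:
            per_hour[line[:13]] += 1
    return per_hour


# Made-up sample lines, only to show the shape of the result
sample = [
    "2024-01-01T10:15:00 node1 ERROR 20021 SYNC_LOG_NOT_FOUND",
    "2024-01-01T10:45:00 node1 ERROR 20021 SYNC_LOG_NOT_FOUND",
    "2024-01-01T11:05:00 node1 INFO checkpoint completed",
    "2024-01-01T11:30:00 node1 ERROR 20021 SYNC_LOG_NOT_FOUND",
]
counts = count_sync_log_errors(sample)
for hour, n in sorted(counts.items()):
    print(hour, n)
```

Comparing these hourly counts before and after a configuration change would show whether the change actually reduced the event rate.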
Here are some additional details about my setup:
- Operating System: Ubuntu 20.04
- Python Version: 3.8
- GridDB Version: 4.5
- Cluster Size: 5 nodes
- Checkpoint Interval: Default setting
- Data Retention Settings: Not explicitly configured
I’m looking for advice on:
- How to tune the checkpoint configuration to prevent frequent SYNC_LOG_NOT_FOUND (20021) errors.
- Whether increasing /dataStore/retainedFileCount in the configuration would reduce how often this occurs.
- Best practices for managing synchronization and replication in a GridDB cluster to avoid such delays.
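For concreteness, the change I have in mind could be sketched as below. This is a hedged illustration rather than a confirmed procedure: it assumes the node configuration is nested JSON in which the path /dataStore/retainedFileCount (from the error message) maps to `config["dataStore"]["retainedFileCount"]`, and that the checkpoint interval lives under `config["checkpoint"]["checkpointInterval"]`. The helper name `with_longer_retention` and all values are hypothetical, not GridDB defaults or recommendations.

```python
import copy
import json


def with_longer_retention(config: dict, retained_file_count: int,
                          checkpoint_interval: str) -> dict:
    """Return a copy of a parsed node config with both settings changed.

    Assumptions: /dataStore/retainedFileCount corresponds to
    config["dataStore"]["retainedFileCount"], and the checkpoint
    interval is stored at config["checkpoint"]["checkpointInterval"].
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("dataStore", {})["retainedFileCount"] = retained_file_count
    updated.setdefault("checkpoint", {})["checkpointInterval"] = checkpoint_interval
    return updated


# Hypothetical starting values, not GridDB defaults
current = {
    "dataStore": {"retainedFileCount": 2},
    "checkpoint": {"checkpointInterval": "60s"},
}
proposed = with_longer_retention(current, retained_file_count=5,
                                 checkpoint_interval="120s")
print(json.dumps(proposed, indent=2))
```

Returning a modified copy keeps the original settings available for comparison or rollback.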
Has anyone faced a similar issue with GridDB, and how did you resolve it?