I have been trying to setup logging using the logging module in a Python script, and I have got it working properly. It can now log to both the console and a log file. But if fails when I setup a Dask cluster. The first log message is now replaced with invalid/null values in the log file, while the console output is fine.
A minimal working example is given below:
import logging
from dask.distributed import LocalCluster as Cluster
logfile_name='test.log'
logger=logging.getLogger(__name__)
logger.setLevel('DEBUG')
log_format=logging.Formatter(
'%(asctime)s %(levelname)s: %(message)s',
datefmt='%Y-%m-%d %H:%M:%S',
)
console_handler=logging.StreamHandler()
console_handler.setLevel('DEBUG')
console_handler.setFormatter(
log_format)
file_handler=logging.FileHandler(
logfile_name, mode='w')
file_handler.setLevel('DEBUG')
file_handler.setFormatter(log_format)
logger.addHandler(file_handler)
logger.addHandler(console_handler)
if __name__=='__main__':
logger.info('First log message')
cluster=Cluster(
name='test', n_workers=1,
threads_per_worker=1, memory_limit='1GiB',
dashboard_address=':8000',
)
logger.info('Second log message')
logger.info('Third log message')
logger.info('Fourth log message')
logger.info('Fifth log message')
cluster.close()
logging.shutdown()
The console log:
2024-09-22 15:28:30 INFO: First log message
2024-09-22 15:28:31 INFO: Second log message
2024-09-22 15:28:31 INFO: Third log message
2024-09-22 15:28:31 INFO: Fourth log message
2024-09-22 15:28:31 INFO: Fifth log message
And the log file contents (copy-pasted from Gedit):
000000000000000000000000000000000000000000002024-09-22 15:41:46 INFO: Second log message
2024-09-22 15:41:46 INFO: Third log message
2024-09-22 15:41:46 INFO: Fourth log message
2024-09-22 15:41:46 INFO: Fifth log message
Notice the invalid/null values in the first line, and the first log message is missing.
After some digging around, I found out that if I remove the Dask cluster code, everything is fine. What is strange is that if I change the mode to append in FileHandler, the issue goes away without removing the Dask cluster initialisation.
How do I fix this issue without removing the Dask code or changing the file open mode?