I want to use structlog
in a multiprocess application in python. The processes shall be able to log as well. Hence I need to ensure that writes to any I/O are process-safe.
Python standard logging should only be thread-safe and with this question I am exploring whether structlog
is as well?
So far I have decided to configure structlog
in combination with the standard Python logging as per the docs.
My configuration is performed by this function
import logging.config
import sys
from typing import IO
import structlog
from logging import DEBUG
def configure_structured_logging(
console_stream: IO[str] = sys.stdout
):
"""
Configures the standard logging to log the given IO stream and log file path.
If called without parameters this will only configure output to sys.stdout.
@param console_stream: The stream to output the live console logs to.
@param log_file_path: Path to the log file to write. Defaults to "log.jsonl".
"""
min_log_level = DEBUG
# TODO: Patch the timezone into the timestamper
timestamper_no_tz = structlog.processors.TimeStamper(
fmt="%Y-%m-%dT%H:%M:%S.%f", utc=False
)
structlog.configure(
processors=[
structlog.contextvars.merge_contextvars,
structlog.processors.add_log_level,
structlog.processors.StackInfoRenderer(),
structlog.dev.set_exc_info,
timestamper_no_tz,
structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
],
context_class=dict,
logger_factory=structlog.stdlib.LoggerFactory(),
wrapper_class=structlog.stdlib.BoundLogger,
cache_logger_on_first_use=False,
)
pre_formatter_chain = [
structlog.stdlib.add_log_level,
structlog.stdlib.PositionalArgumentsFormatter(),
]
# std logging configuration dict
formatters = {
"console": {
"()": structlog.stdlib.ProcessorFormatter,
"foreign_pre_chain": pre_formatter_chain,
"processors": [
timestamper_no_tz,
structlog.stdlib.ProcessorFormatter.remove_processors_meta,
structlog.dev.ConsoleRenderer(
colors=True,
force_colors=True,
exception_formatter=structlog.dev.plain_traceback
),
],
},
}
handlers = {
"console_handler": {
"()": logging.StreamHandler,
"level": min_log_level,
"formatter": "console",
"stream": console_stream,
},
}
logging_config = {
"version": 1,
"disable_existing_loggers": False,
"formatters": formatters,
"handlers": handlers,
"loggers": {
"root": {"level": min_log_level, "handlers": list(handlers.keys())}
},
}
logging.config.dictConfig(logging_config)
In a nutshell, this configures the Python standard logging to use structlog formatters and output to sys.stdout
.
Now to the juicy part: Is this setup process-safe?
As a note: I am on Python 3.9.18
. To my knowledge the Python standard logging is thread-safe but not process-safe (see logging module source. The only mention of multiprocessing
in the structlog
docs is here and is regarding the pickleability of a logger. As far as I have looked through the structlog
source code I could not find anything it would consider thread- or process-safety. In my setup this also would not be its job, but the job of the standard logging handlers.
To verify this, I created a minimal example.
It consists of a simple worker which is provided a structlog
logger.
Through this logger it logs 1000 info messages.
Trough a multiprocessing.Pool
I spawn 1000 of those workers.
Without process-safety this should IMO result in malformed console output.
It does however work without a problem. Every row of the log is perfectly fine.
I tried the same while logging to a file and that also worked flawlessly.
The minimal example using the above configuration function:
import sys
import os
from multiprocessing import Pool
from structlog import get_logger
from structlog_config import configure_structured_logging
configure_structured_logging(console_stream=sys.stdout)
logger = get_logger(module_level="playground")
# A worker provided with a logger
def worker(count, local_logger):
pid = os.getpid()
for i in range(count):
local_logger.info("This is logging %s for TEST", str(i), pid=pid)
if __name__ == '__main__':
log_count = 1000
proc_count = 1000
pool = Pool(proc_count)
inputs = [(log_count, logger)]*proc_count
pool.starmap(worker, inputs)
I feel like I am missing something here. I need to make sure that this does not break and while this test is nice and all, I would like to understand how there are no race conditions. As a little side note, I also experimented with cache_logger_on_first_use
being set to True
, which according to the (aforementioned) docs would make the loggers unpickleable. However my example still works fine with this option set. Is this information deprecated?
P.S.: I know I can configure this setup to use QueueHandler
and definitely be process-safe, however I have decided that if I have to pass around a Queue
anyway, I might as well leverage the benefits of structlog
and pass around the loggers themselves.