I am trying to modify the Ray logger so that it logs in the format Datadog expects, but I am running into problems. I have applied the configuration from the link above as well as Datadog's Python log collection configuration, and I am still not getting logs in the correct format. I am connecting to a KubeRay cluster running in the same Kubernetes cluster as the driver, and I want the logs to be in a format Datadog can easily parse. Right now the logs reach Datadog in Ray's default format, which Datadog parses as an error. Here is an example log line:
[36m(Writer pid=6655, ip=10.0.14.111)[0m Queue empty, sleeping 5
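What I would like instead (a rough sketch based on the fields in my formatter below; the values here are just placeholders) is one JSON object per line, something like:
{"asctime": "2024-05-01 12:00:00,000", "levelname": "INFO", "name": "ray", "filename": "writer.py", "lineno": 32, "dd.service": "service", "dd.env": "env", "dd.version": "1.0", "dd.trace_id": "...", "dd.span_id": "...", "message": "Queue empty, sleeping 5"}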
Is there any way I can modify the Python code, the deployment of the driver service, or the Ray cluster node configuration to make this work? From what I can tell, worker_process_setup_hook is not a viable option, since I am using Ray Client to connect to the cluster; what I mean is sketched below.
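For reference, this is roughly the setup-hook pattern I had to rule out (a sketch only; configure_datadog_logging is a placeholder name, and as far as I can tell this callable-based runtime_env hook does not work when connecting over Ray Client):

import logging

import ray


def configure_datadog_logging():
    # Would run once in each Ray worker process, so the "ray" logger there
    # could be reconfigured instead of carrying Ray's default prefix.
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s [%(name)s] - %(message)s"))
    worker_logger = logging.getLogger("ray")
    worker_logger.handlers.clear()
    worker_logger.addHandler(handler)
    worker_logger.setLevel(logging.INFO)


ray.init(runtime_env={"worker_process_setup_hook": configure_datadog_logging})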
Here is a redacted snippet of the code I am actually running. There are multiple actors, and they are all set up for logging more or less like the one below.
import ddtrace.auto  # enables Datadog auto-instrumentation; imported first
import logging
import time

import ray
from ddtrace import patch
from pythonjsonlogger import jsonlogger
from ray.util.queue import Empty, Queue  # assuming Ray's distributed Queue here

# Reconfigure Ray's logger to emit JSON carrying the Datadog correlation fields.
logger = logging.getLogger("ray")
logger.handlers.clear()

FORMAT = ('%(asctime)s %(levelname)s [%(name)s] [%(filename)s:%(lineno)d] '
          '[dd.service=%(dd.service)s dd.env=%(dd.env)s dd.version=%(dd.version)s '
          'dd.trace_id=%(dd.trace_id)s dd.span_id=%(dd.span_id)s] '
          '- %(message)s')
logging.basicConfig(format=FORMAT, force=True)

logHandler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter(FORMAT)
logHandler.setFormatter(formatter)
logger.addHandler(logHandler)

patch(logging=True, botocore=True)

ray.init(address="auto")


@ray.remote(num_cpus=6)
class Writer:
    def __init__(self, queue: Queue):
        self.queue = queue

    def write(self):
        while True:
            try:
                logger.info("Writing", extra={"role": self})
                # ... actual write logic redacted ...
                logger.info("Wrote", extra={"role": self})
            except Empty:
                logger.info("Queue empty, sleeping 5", extra={"role": self})
                time.sleep(5)
            except Exception as e:
                logger.error("Error writing data! {}".format(e), extra={"role": self})
I also have the following labels on the Deployment for the driver pod:
labels: {
    "tags.datadoghq.com/env": "env",
    "tags.datadoghq.com/service": "service",
    "tags.datadoghq.com/version": "1.0",
    "admission.datadoghq.com/enabled": "true"
},
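For completeness, this is roughly what I mean by the Python log collection configuration: an ad.datadoghq.com logs annotation on the same pod spec (sketch only; <container-name> is a placeholder for the actual container name):

annotations: {
    "ad.datadoghq.com/<container-name>.logs": '[{"source": "python", "service": "service"}]'
},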