I currently use OpenSearch to ingest container logs directly from Docker/Nomad/Kubernetes containers/jobs/pods via the Fluentd logging driver. Each log entry is automatically tagged with the name of the container that produced it, along with other container metadata.
In some cases, when a container hits an error/exception and starts retrying the failing action, it generates dramatically more log entries than in steady state. Is there an approach within OpenSearch, using anomaly detection for example, to alert when this situation occurs? Ideally it would alert on an increased rate of log entries per container, or better yet, an increase in log severity levels per container.
I could naively write a detector that relies on an enumerated list of jobs with hardcoded thresholds. However, I’m looking for an elegant end-to-end solution that works for any arbitrary container that sends logs to OpenSearch.
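To make the question concrete, here is the kind of per-container detector I was hoping is possible, sketched against the Anomaly Detection plugin's REST API using its high-cardinality mode (`category_field`), which fits a model per container instead of requiring an enumerated job list. The index pattern `fluentd-*` and the `container_name`/`@timestamp` field names are assumptions from my setup, not a verified working config:

```python
import json
import urllib.request

# Sketch of a per-container log-rate detector for the OpenSearch Anomaly
# Detection plugin. category_field enables high-cardinality detection: one
# model per distinct container_name value, so no hardcoded job list.
# Index pattern and field names below are assumptions from my Fluentd setup.
detector = {
    "name": "per-container-log-rate",
    "description": "Detect per-container spikes in log volume",
    "time_field": "@timestamp",
    "indices": ["fluentd-*"],
    # One anomaly model per distinct container_name value
    "category_field": ["container_name"],
    "feature_attributes": [
        {
            "feature_name": "log_count",
            "feature_enabled": True,
            # Number of log documents per interval, per container
            "aggregation_query": {
                "log_count": {"value_count": {"field": "container_name"}}
            },
        }
    ],
    "detection_interval": {"period": {"interval": 10, "unit": "Minutes"}},
    "window_delay": {"period": {"interval": 1, "unit": "Minutes"}},
}

def create_detector_request(base_url: str) -> urllib.request.Request:
    """Build the POST request that would register the detector."""
    return urllib.request.Request(
        url=f"{base_url}/_plugins/_anomaly_detection/detectors",
        data=json.dumps(detector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A second feature could aggregate on a severity field (e.g. a `max` over a numeric log level) to cover the "increase in log severity" case, if such a field is present in the ingested logs.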
Thanks in advance.