I have a couple of services deployed on Kubernetes. Some are NodeJS based, others are Java based. In the cluster there's an OTEL Collector deployed, which then provides data for Prometheus. Grafana is used for dashboarding. For Java I'm using -javaagent:/jars/opentelemetry-javaagent.jar
and for NodeJS a simple tracing file such as:
const { NodeSDK } = require('@opentelemetry/sdk-node');
// gRPC exporter packages, to match OTEL_EXPORTER_OTLP_PROTOCOL=grpc below
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  // Service name is configured by OTEL_SERVICE_NAME
  traceExporter: new OTLPTraceExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter(),
    exportIntervalMillis: 5000,
  }),
  instrumentations: [getNodeAutoInstrumentations()], // will contain https://www.npmjs.com/package/@opentelemetry/instrumentation-http
});

sdk.start();
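(This tracing file is loaded before the app starts, e.g. node --require ./tracing.js app.js; the filename here is just illustrative.)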
The rest of the OTEL config is defined in ENVs (traces configuration is omitted for readability):
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_METRICS_EXPORTER=otlp
OTEL_SERVICE_NAME=[service name]
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector-listens-here:4317
The apps are deployed on Kubernetes with 2 or more pods each, and I think this is why I'm getting strange results for the http_server_duration_milliseconds_count metric. See examples:
- Service A with 5 pods running:
- Service B with 2 pods running:
- Service C with 3 pods running:
Available labels for those metrics are:
http_flavor
http_method
http_route
http_scheme
http_status_code
job
net_host_name
net_host_port
net_protocol_name
net_protocol_version
Is my assumption correct that there's no way to differentiate the pods, and that those metrics are treated as coming from one source? I'm picturing something like: ServiceA#pod1 exports value 1, then ServiceA#pod2 (which got more requests) exports 12, and after that ServiceA#pod1 exports 3 (as it got 2 new requests), and so on.
If so, what's the best way to solve this?
- I could probably use net_host_ip (https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/), which I would expect to be set to the pod IP, but this attribute isn't set automatically by the Java and NodeJS instrumentation.
- Or maybe I should add a label like k8s_pod_name or something to differentiate the pods?
- Also, service.instance.id seems like the "native" solution to my problem, but it's in an experimental state (a sketch of what I have in mind follows this list).
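For reference, here's a minimal sketch of what I have in mind on the NodeJS side. It assumes I inject the pod name into a POD_NAME env var via the Kubernetes downward API (fieldRef: metadata.name) and that my @opentelemetry/resources version still exports the Resource class; the env var name and the attribute choice are my assumptions, not something I have running yet:

// Sketch only: assumes POD_NAME is set from the downward API (metadata.name)
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { Resource } = require('@opentelemetry/resources');

const sdk = new NodeSDK({
  resource: new Resource({
    // expose the pod name both as k8s.pod.name and as service.instance.id
    'k8s.pod.name': process.env.POD_NAME,
    'service.instance.id': process.env.POD_NAME,
  }),
  // ...same exporters and instrumentations as in the tracing file above
});

sdk.start();

Alternatively, I guess I could skip the code change and set something like OTEL_RESOURCE_ATTRIBUTES=k8s.pod.name=$(POD_NAME),service.instance.id=$(POD_NAME) in the pod spec, but I'm not sure which of these resource attributes the collector's Prometheus exporter would actually turn into labels.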
Any suggestions or clarifications will be much appreciated 🙂