From the OpenTelemetry Docs / Specs / OTel 1.37.0 / Metrics / Supplementary Guidelines:
Choosing the correct instrument is important, because: … It generates clarity to the semantic of the metrics stream, so the consumers have better understanding of the results. For example, if we want to report the process heap size, by using an Asynchronous UpDownCounter rather than an Asynchronous Gauge, we’ve made it explicit that the consumer can add up the numbers across all processes to get the “total heap size”.
I understand the semantic difference: an Asynchronous Gauge instrument reports non-additive values, whereas an Asynchronous UpDownCounter instrument reports additive values.
Correspondingly, by default, Gauge and UpDownCounter instrument metrics are exported to OTLP differently, “preserving” some semantic difference: Gauge as a gauge, UpDownCounter as a sum (with is_monotonic: false).
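To make that difference concrete, here is a minimal sketch of the two shapes. The dataclasses and the two export_* helper functions are simplified stand-ins I made up for illustration, not the real OTLP protobuf classes or any SDK API; only the field names (data_points, aggregation_temporality, is_monotonic) mirror the OTLP metrics proto. I am also assuming cumulative temporality for the asynchronous UpDownCounter.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AggregationTemporality(Enum):
    # Values follow the numbering in the OTLP metrics proto.
    DELTA = 1
    CUMULATIVE = 2


@dataclass
class NumberDataPoint:
    value: float
    attributes: dict = field(default_factory=dict)


@dataclass
class Gauge:
    # A gauge carries only data points: no temporality, no monotonicity flag.
    data_points: List[NumberDataPoint]


@dataclass
class Sum:
    # A sum additionally states how its points relate over time and
    # whether the value can only go up.
    data_points: List[NumberDataPoint]
    aggregation_temporality: AggregationTemporality
    is_monotonic: bool


def export_async_gauge(observations):
    """Hypothetical helper: wrap (value, attributes) observations as a Gauge."""
    return Gauge([NumberDataPoint(v, a) for v, a in observations])


def export_async_up_down_counter(observations):
    """Hypothetical helper: wrap the same observations as a non-monotonic Sum."""
    return Sum(
        [NumberDataPoint(v, a) for v, a in observations],
        AggregationTemporality.CUMULATIVE,  # assumed here for illustration
        is_monotonic=False,
    )
```

The observed numbers are identical in both cases; what differs is the extra metadata that the sum carries alongside them.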
From the OpenTelemetry Docs / Specs / OTel 1.37.0 / Metrics / Data Model / Metrics Data Model:
OpenTelemetry has identified three kinds of semantics-preserving Metric data transformation … The OpenTelemetry Metrics data model is designed to support these transformations … as a reprocessing stage inside the OpenTelemetry collector … [reaggregation details]
…
in OpenTelemetry Sums always have an aggregate function where you can combine via addition. So, for non-monotonic sums in OpenTelemetry we can aggregate (naturally) via addition. In the timeseries model, you cannot assume that any particular Gauge is a sum, so the default aggregation would not be addition.
…
Gauges do not provide an aggregation semantic, instead “last sample value” is used when performing operations like temporal alignment or adjusting resolution.
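As I read those two passages, the default treatment differs roughly as in the sketch below. This is my own illustration in plain Python, not Collector code: the function names and the sample data are hypothetical. The first function removes an attribute from sum points and adds the values that collide; the second downsamples a gauge by keeping the last sample in each time window.

```python
from collections import defaultdict


def spatial_sum(points, drop_key):
    """Drop one attribute from sum points and add the values that now collide.

    points: list of (attributes_dict, value) taken at the same timestamp.
    """
    totals = defaultdict(float)
    for attrs, value in points:
        remaining = tuple(sorted((k, v) for k, v in attrs.items() if k != drop_key))
        totals[remaining] += value
    return dict(totals)


def last_value_per_window(samples, window_seconds):
    """Reduce gauge resolution by keeping the last sample in each window.

    samples: list of (timestamp_seconds, value).
    """
    latest = {}
    for ts, value in sorted(samples):
        # Later samples in the same window overwrite earlier ones.
        latest[ts // window_seconds] = value
    return latest


# Heap size per process, reported as a non-monotonic sum (made-up numbers).
heap = [({"pid": 1, "host": "a"}, 100.0), ({"pid": 2, "host": "a"}, 250.0)]
total_heap = spatial_sum(heap, "pid")

# A temperature-like gauge sampled every 30 seconds (made-up numbers).
temps = [(0, 20.0), (30, 21.0), (60, 22.5), (90, 22.0)]
per_minute = last_value_per_window(temps, 60)
```

Here total_heap holds a single point of 350.0 for host "a", while per_minute keeps 21.0 and 22.0 as the last sample of each minute. Nothing in the gauge path ever adds two values together.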
Based on all of that, I was expecting to find something in the OpenTelemetry Collector docs that would explicitly prevent reaggregation of OTLP gauge metrics by a sum function.
In fact, I see the opposite: statements in the OpenTelemetry Collector transform processor docs that appear to explicitly support reaggregation of gauge metrics by a sum function:
The following metric types can be aggregated:
…
- gauge
Supported aggregation functions are:
…
- sum
What answer can I offer to someone who asks me, “Why bother exporting values from Asynchronous UpDownCounter and Asynchronous Gauge instruments differently in OTLP? Why not just export them all in OTLP as gauges?”
I’m stumped for an answer. “Because it generates clarity to the semantic of the metrics stream” sounds like weak sauce to me, since, for example, the Collector transform processor seems not to care whether it’s aggregating gauges or sums.
I’m wondering: do some telemetry backends that ingest OTLP treat gauges and sums differently?
I see that New Relic maps both OTLP gauges and OTLP non-monotonic cumulative sums to the New Relic gauge. So, no difference in treatment there. That matches the collapsing/mapping to the timeseries model described in the OpenTelemetry docs. Do any backends not do the same? (Collapse/map both OTLP gauges and OTLP non-monotonic cumulative sums to the same type in the backend’s data model, whether or not that type is actually called “gauge”.)