I have a data stream with time series data in elasticsearch, now I want to aggregate my data for a specific time range, get the metrics that summed up is between x and y and order it to get the top z values.
I ingest documents as:
{
"@timestamp": "2024-01-01T00:00:00.000Z",
"unit_id": "xyz",
"metric": 10
}
However, when I combine my aggregation with a bucket selector and a bucket sort, the query only works of my bucket sort is added after the selector, but my Java client don’t let me dictate this. I am not really sure if this is the correct approach to query this data. Below is my current ES query that only works if sort is after selector, and there is a corresponding SQL statement that does the trick in Postgres already to day.
My query looks like this:
{
"aggs": {
"group_by_id": {
"terms": {
"field": "unit_id",
"size": 100000000
},
"aggs": {
"agg_metric": {
"sum": {
"field": "metric"
}
},
"agg_selector": {
"bucket_selector": {
"bucket_path": { "tmppath": "agg_metric" },
"script": {
"params": {
"myMin": 2,
"myMax": 4
}
},
"lang": "painless",
"source": "params.tmppath => params.myMin && params.tmppath <= params.myMax"
}
},
"agg_sort":
"bucket_sort": {
"size": 2,
"sort": [ { "agg_metric": { "order": "asc" } } ]
}
}
}
}
}
}
The corresponding SQL (that works) would look like:
SELECT id, sum(metric)
FROM public.test
GROUP BY id
HAVING sum(metric) BETWEEN 2 AND 4
ORDER BY 2 ASC
LIMIT 2