In Kusto, arg_max takes an expression and returns the selected columns from the row which maximizes it. What’s the best way to get it (or an equivalent expression) to maximize the value of one column then another, then return the matching row?
For instance, one way which ends up not using arg_max at all is:
datatable (SN: int, id: string, ts: datetime, IT: datetime, V: double, S: string) [
int(null), "t/1", datetime(2024-01-01), datetime(2024-01-01), 1.0, "asdf",
int(null), "t/1", datetime(2024-01-01), datetime(2024-01-02), 2.0, "fdas",
int(null), "t/1", datetime(2024-01-01), datetime(2024-01-03), 3.0, "qwer",
1, "t/1", datetime(2024-01-01), datetime(2024-01-02), 3.0, "zcvbx",
1, "t/1", datetime(2024-01-01), datetime(2024-01-02), 4.0, "qwer",
]
| partition hint.strategy=native by id (
partition hint.strategy=native by ts (
order by SN desc nulls last, IT desc
| take 1
)
)
which maximizes the value of SN then IT over the grouping by id and ts. Is there a way to do something similar with an expression in arg_max that would be faster? Or is the example above the fastest way to do things?
e.g. datatable […] | summarize arg_max( , *) by id, ts
If it helps, the actual data table has billions of rows, is narrowed down to 10M rows or so by several where clauses, and in the resulting data set id has thousands to hundreds of thousands of values, and ts has tens to thousands of values per id.
hash and hash_combine would be one way to merge multiple columns into one, but you can’t maximize over that. Concatenation is another, but that doesn’t work with integers and datetimes either….
Is there a way to arg_max over multiple non-string columns in Kusto?
Yes, it’s possible to apply arg_max
over multiple non-string columns. Below example shows how to finds the row with the latest timestamp ts
for each id
and keeps the points
and cgpa
columns in the output. Both points
and cgpa
are non-string columns.
datatable(id: string, ts: datetime, points: long, cgpa: double)
[
"a", datetime(2024-09-09 10:00:00), 10, 0.5,
"a", datetime(2024-09-09 11:00:00), 15, 1.5,
"b", datetime(2024-09-09 09:00:00), 5, 2.5,
"b", datetime(2024-09-09 08:00:00), 3, 0.3,
]
| summarize arg_max(ts, points, cgpa) by id
Output:
1