I have log lines like the following:
<timestamp> category='a' name='1'
<timestamp> category='a' name='2'
<timestamp> category='a' name='1'
<timestamp> category='a' name='1'
<timestamp> category='b' name='1'
<timestamp> category='b' name='1'
I am trying to get a distinct count by category, that is, the number of unique names seen in each category. The lines above should yield a result like:
category count
a 2
b 1
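
To pin down the semantics I am after, here is a minimal Python sketch over the sample lines above. It is only an illustration of the expected result, not something that runs against Loki; the function name distinct_names_by_category and the regex are my own placeholders, and <timestamp> is kept as a literal placeholder.

```python
import re
from collections import defaultdict

# Sample lines from the question; "<timestamp>" is a literal placeholder.
LOG_LINES = [
    "<timestamp> category='a' name='1'",
    "<timestamp> category='a' name='2'",
    "<timestamp> category='a' name='1'",
    "<timestamp> category='a' name='1'",
    "<timestamp> category='b' name='1'",
    "<timestamp> category='b' name='1'",
]

# Matches key='value' pairs as they appear in the sample lines.
PAIR_RE = re.compile(r"(\w+)='([^']*)'")


def distinct_names_by_category(lines):
    """Return {category: number of unique names seen in that category}."""
    names = defaultdict(set)
    for line in lines:
        fields = dict(PAIR_RE.findall(line))
        # Skip lines that lack either field.
        if "category" in fields and "name" in fields:
            names[fields["category"]].add(fields["name"])
    return {category: len(seen) for category, seen in names.items()}


if __name__ == "__main__":
    for category, count in sorted(distinct_names_by_category(LOG_LINES).items()):
        print(category, count)
```

For the six sample lines this prints "a 2" and "b 1", which is the table above.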
I have a query that does this at a smaller scale:
count by (category) (
  count by (category, name) (
    count_over_time({<something>} | logfmt category, name [$__auto])
  )
)
The query above does return a result, but on very large data sets it fails with the error ‘the maximum of series was reached for a single query’. There are fewer than 100 categories, but the number of names can run to a few million. I suspect the inner count by (category, name) is the problem, since it produces one series per distinct category and name pair.
Is there a way to optimize this or achieve the same in a more efficient manner?