We store device logs in Elasticsearch, which amounts to about 150 GB of data per month.
The report we need summarizes various data over a user-supplied time period. Most of the query time is spent on aggregating, grouping, and sorting to get the latest data.
We tried summarizing the data by day and writing the daily summaries to a new index, but found that this does not solve the problem of duplicate data.
For example:
On 09-01 there are 3 Type A devices (a, b, c); on 09-02 there are 3 Type A devices (b, c, d).
The data is stored in reportsA as: {"date": "2024-09-01", "type": "A", "count": 3}
If I query the number of Type A devices for 09-01 to 09-02, the result should be 4. But since the summary index does not store the individual devices, we cannot tell which devices appear on more than one day. And if we do store the details, doesn't the summary index lose its purpose?
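To make the problem concrete, here is a minimal Python sketch of the example above. It does not use Elasticsearch at all; the dictionary and variable names are only for illustration. It shows that summing the stored daily counts gives a different number than counting distinct devices across the period.

```python
# Illustrative device IDs per day for Type A, taken from the example above.
devices_by_day = {
    "2024-09-01": {"a", "b", "c"},
    "2024-09-02": {"b", "c", "d"},
}

# What the daily summary index keeps: one count per day, no device IDs.
daily_counts = {day: len(ids) for day, ids in devices_by_day.items()}

# Summing the stored counts counts devices b and c twice.
summed = sum(daily_counts.values())

# The expected answer needs the union of device IDs over the whole period.
distinct = len(set().union(*devices_by_day.values()))

print(summed, distinct)
```

The summed value is 6 while the distinct value is 4, and the second number cannot be derived from the daily counts alone.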
Any suggestions would be appreciated.