I am writing a query that computes statistics values.
The database has a `vehicle_trip_statistics_history` table that stores each device's daily driving distance in the `trip_distance` column, along with `device_id` and `base_date` columns. The table has two indexes: `idx_vehicle_trip_statistics_history_01` on (`device_id`, `base_date`) and `idx_vehicle_trip_statistics_history_02` on (`base_date`, `device_id`).
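To make the index definitions concrete, here is an approximate SQLite stand-in for the table, driven from Python. The column types and `NOT NULL` constraints are assumptions (the real MySQL DDL is not shown), and `create_schema` and `index_columns` are hypothetical helpers. Listing each index's columns shows the key order: `_01` leads with `device_id`, which matches `GROUP BY device_id`, while `_02` leads with `base_date`, which matches the date-range filter. Neither index includes `trip_distance`.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create an approximate stand-in for the table and its two indexes.

    Column types are assumptions; SQLite is used only so the sketch runs.
    """
    conn.executescript("""
        CREATE TABLE vehicle_trip_statistics_history (
            device_id     TEXT NOT NULL,
            base_date     TEXT NOT NULL,  -- assumed 'YYYYMMDD' string
            trip_distance REAL NOT NULL   -- daily driving distance
        );
        CREATE INDEX idx_vehicle_trip_statistics_history_01
            ON vehicle_trip_statistics_history (device_id, base_date);
        CREATE INDEX idx_vehicle_trip_statistics_history_02
            ON vehicle_trip_statistics_history (base_date, device_id);
    """)


def index_columns(conn: sqlite3.Connection, index_name: str) -> list:
    """Return the index's key columns in order (PRAGMA index_info rows are seqno, cid, name)."""
    # PRAGMA does not accept bound parameters; index_name is a fixed, trusted string here.
    return [row[2] for row in conn.execute(f"PRAGMA index_info({index_name})")]


conn = sqlite3.connect(":memory:")
create_schema(conn)
for name in ("idx_vehicle_trip_statistics_history_01",
             "idx_vehicle_trip_statistics_history_02"):
    print(name, index_columns(conn, name))
```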
There are currently about 2,500 distinct `device_id` values, and `base_date` data (format `YYYYMMDD`) has been accumulating since `20231101`. I want to compute the total driving distance for each `device_id` over a given date range:
```sql
select device_id, sum(trip_distance) as tp
from vehicle_trip_statistics_history
where base_date >= '20240101' and base_date <= '20240613'
group by device_id
limit 10;
```
The query above takes about 130 ms, and its `EXPLAIN` output is as follows:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | vehicle_trip_statistics_history | NULL | index | idx_vehicle_trip_statistics_history_01,idx_vehicle_trip_statistics_history_02 | idx_vehicle_trip_statistics_history_01 | 36 | NULL | 2826 | 50.0 | Using where |
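One possible explanation for why the first query reads so few rows (the `rows` value of 2826 is only the optimizer's estimate): `idx_vehicle_trip_statistics_history_01` is ordered by `device_id`, so each device's rows arrive contiguously and a group's sum is final as soon as the next `device_id` appears. With `LIMIT 10` and no `ORDER BY`, the scan can stop after the tenth group. The sketch below mimics that streaming `GROUP BY` over pre-sorted input in plain Python; it illustrates the idea under that assumption and is not MySQL's executor. `first_n_group_sums` is a hypothetical name.

```python
from itertools import groupby, islice


def first_n_group_sums(sorted_rows, lo, hi, n):
    """Sum trip_distance per device for the first n devices, stopping early.

    sorted_rows: (device_id, base_date, trip_distance) tuples already sorted by
    (device_id, base_date), the order an index on those columns would yield.
    Returns (results, rows_read).
    """
    rows_read = 0

    def scan():
        nonlocal rows_read
        for device_id, base_date, dist in sorted_rows:
            rows_read += 1                 # every index entry visited counts
            if lo <= base_date <= hi:      # WHERE filter applied during the scan
                yield device_id, dist

    results = []
    # groupby closes a group when the key changes; islice stops after n groups.
    for device_id, group in islice(groupby(scan(), key=lambda r: r[0]), n):
        results.append((device_id, sum(d for _, d in group)))
    return results, rows_read


rows = [
    ("d1", "20231231", 9.0),  # outside the date range, read but filtered out
    ("d1", "20240101", 1.0), ("d1", "20240102", 2.0),
    ("d2", "20240101", 3.0), ("d2", "20240102", 4.0),
    ("d3", "20240101", 5.0), ("d3", "20240102", 6.0),
    ("d4", "20240101", 7.0),
]
print(first_n_group_sums(rows, "20240101", "20240613", 2))
# ([('d1', 3.0), ('d2', 7.0)], 6)
```

Only 6 of the 8 rows are visited: the scan reads one row past the second group to see that the group has ended, then stops.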
However, when I sort by the total (`ORDER BY tp DESC`) before applying `LIMIT`, the query becomes much slower:
```sql
select device_id, sum(trip_distance) as tp
from vehicle_trip_statistics_history
where base_date >= '20240101' and base_date <= '20240613'
group by device_id
order by tp desc
limit 10;
```
This query takes about 2073 ms (roughly 16 times slower than the query without `ORDER BY`), and its `EXPLAIN` output is as follows:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | vehicle_trip_statistics_history | NULL | index | idx_vehicle_trip_statistics_history_01,idx_vehicle_trip_statistics_history_02 | idx_vehicle_trip_statistics_history_01 | 36 | NULL | 340,778 | 50.0 | Using where; Using temporary; Using filesort |
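With `ORDER BY tp DESC`, no device can be ranked until every in-range row has been added to its total, which is consistent with the much larger `rows` estimate (340,778) and the `Using temporary; Using filesort` note. A minimal sketch of that logic in plain Python (in-memory rows rather than MySQL; `top_n_by_total` is a hypothetical name):

```python
import heapq
from collections import defaultdict


def top_n_by_total(rows, lo, hi, n):
    """Return the n devices with the largest summed distance in [lo, hi].

    No group can be ranked until every row has been added to its total,
    so the whole input is read. Returns (top_n, rows_read).
    """
    totals = defaultdict(float)
    rows_read = 0
    for device_id, base_date, dist in rows:
        rows_read += 1
        if lo <= base_date <= hi:
            totals[device_id] += dist
    # Only the small per-device map is ranked (about 2,500 entries in this case).
    top = heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])
    return top, rows_read


rows = [
    ("d1", "20231231", 9.0),  # outside the date range
    ("d1", "20240101", 1.0), ("d1", "20240102", 2.0),
    ("d2", "20240101", 3.0), ("d2", "20240102", 4.0),
    ("d3", "20240101", 5.0), ("d3", "20240102", 6.0),
    ("d4", "20240101", 9.0),
]
print(top_n_by_total(rows, "20240101", "20240613", 2))
# ([('d3', 11.0), ('d4', 9.0)], 8)
```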
What can I do to fix this slowdown? And in general, in cases like this, are the `ORDER BY` and `LIMIT` steps usually done at the application level rather than in the query?