I am running a simple join query, where the table incident_tracker_details is covered by a suitable index. The query is slow because it doesn`t use all the columns in Index Condition, but does use the column as a Filter leading to a large heap searches
Query:
SELECT
relevant_incidents_per_inventory_item.inventory_id,
tracker_id
FROM
relevant_incidents_per_inventory_item
JOIN
incident_tracker_details
ON
incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id
WHERE
incident_tracker_details.company_id = :companyId
and incident_tracker_details.division_id = :divisionId
AND incident_tracker_details.value > 0
the index in use:
create index companyid_divisionId_incidentId_idx
on incident_tracker_details (company_id, division_id, incident_id)
where (value > 0) include (tracker_id);
the relevant_incidents_per_inventory_item
does not have any relevant indexes and the table is very small (few hundreds of rows).
the actual query plan:
QUERY PLAN
Hash Join (cost=6901.58..6913.72 rows=1 width=8) (actual time=90.663..315.278 rows=522 loops=1)
Hash Cond: (incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id)
-> Index Only Scan using companyid_divisionId_incidentId_idx on incident_tracker_details (cost=0.57..12.68 rows=6 width=16) (actual time=0.039..202.787 rows=246774 loops=1)
Index Cond: ((company_id = '777704491747470679'::bigint) AND (division_id = '777795770460846005'::bigint))
Heap Fetches: 124039
-> Hash (cost=6900.88..6900.88 rows=10 width=8) (actual time=86.704..86.709 rows=259 loops=1)
...not relevant part
Planning Time: 0.518 ms
Execution Time: 315.402 ms
Why is the “incident_id” not part of the Index Cond? this leads to retrieving all the rows under companyId, divisionId , and then the engine has to filter all the rows for those having incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id
I expected to see all the columns used in the Index Condition, leading to fast retrieval of only the relevant data
11
The reason incident_id
isn’t part of Index Cond:
is because it isn’t part of the selection criteria associated with incident_tracker_details
. The only indexed columns mentioned in the WHERE
clause are company_id
and division_id
. Although incident_id
is used in the join criteria, it is not part of the selection criteria and thus has no part in the conditions used for the index only scan. The query engine has to get all incident_id
values associated with the selection criteria to identify the rows in relevant_incidents_per_inventory_item
that satisfy the join condition.
If relevant_incidents_per_inventory_item
has an index with incident_id
as its first column, then the join criteria can be checked with an index scan. If there is no such index, then the join condition will be evaluated by using a full table scan and filtering for rows that have incident_id
values matching those found by the index only scan of companyid_divisionId_incidentId_idx
. Since you elided the portion of the explain plan associated with relevant_incidents_per_inventory_item
and didn’t include all relevant DDL in the post, there isn’t any way for us to determine which of these was the case.
2
I tried to find the answer to the incomplete index usage behaviour, and in order to simplify the case I removed the join condition and built a simple query targeting the companyId, divisionId, incidentId
index. What I discovered is that when the query has a single incident_id
value, the index is used fully and all columns are in Index Cond
.
On the other hand, when I have multiple (more than 1) incident_id
s either in OR or in IN form, the index is used partially what leads to the slow Heap Fetches as seen before.
Simplified query:
SELECT
tracker_id
FROM
bi.incident_tracker_details
WHERE
company_id = :companyId
AND division_id = :divisionId
(incident_id = 234234234234234234 OR incident_id = 333333333333333333)
AND value > 0;
Query plan for 2 incident_id
values using OR. Notice the Heap Fetches, and index on Index Cond
only company_id
and division_id
columns, the incident_id
is in Filter
.
QUERY PLAN
Index Only Scan using companyid_divisionId_incident_id_idx on incident_tracker_details (cost=0.57..8.72 rows=1 width=8) (actual time=103.485..103.485 rows=0 loops=1)
Index Cond: ((company_id = '4478704491747470679'::bigint) AND (division_id = '4514095770460846005'::bigint))
Filter: ((incident_id = '234234234234234234'::bigint) OR (incident_id = '333333333333333333'::bigint))
Rows Removed by Filter: 249705
Heap Fetches: 20397
Planning Time: 0.155 ms
Execution Time: 103.512 ms
On the other hand, same query with a single incident_id
value. The Heap Fetches is 0 and the execution time is almost 0:
QUERY PLAN
Index Only Scan using companyid_divisionId_incident_id_idx on incident_tracker_details (cost=0.57..8.59 rows=1 width=8) (actual time=0.019..0.019 rows=0 loops=1)
Index Cond: ((company_id = '4478704491747470679'::bigint)
AND (division_id = '4514095770460846005'::bigint)
AND (incident_id = '234234234234234234'::bigint))
Heap Fetches: 0
Planning Time: 0.125 ms
Execution Time: 0.042 ms
this leads me to believe that the behaviour we seen in the original question and in this simplified case is a result of planner wrongfully deciding to scan the incident_id
level instead of traversing the tree for each incident_id value.
I will glad to understand why it does so though…
5