Why Postgres query is not using column in Index Condition but uses it in Filter condition

I am running a simple join query, where the table incident_tracker_details is covered by a suitable index. The query is slow because it doesn`t use all the columns in Index Condition, but does use the column as a Filter leading to a large heap searches

Query:

SELECT
    relevant_incidents_per_inventory_item.inventory_id,
    tracker_id
FROM
    relevant_incidents_per_inventory_item
JOIN
    incident_tracker_details
ON
    incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id
WHERE
    incident_tracker_details.company_id = :companyId
    and incident_tracker_details.division_id = :divisionId
    AND incident_tracker_details.value > 0

the index in use:

create index companyid_divisionId_incidentId_idx
on incident_tracker_details (company_id, division_id, incident_id)
where (value > 0) include (tracker_id);

the relevant_incidents_per_inventory_item does not have any relevant indexes and the table is very small (few hundreds of rows).

the actual query plan:

QUERY PLAN
    Hash Join  (cost=6901.58..6913.72 rows=1 width=8) (actual time=90.663..315.278 rows=522 loops=1)
      Hash Cond: (incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id)
      ->  Index Only Scan using companyid_divisionId_incidentId_idx on incident_tracker_details  (cost=0.57..12.68 rows=6 width=16) (actual time=0.039..202.787 rows=246774 loops=1)
            Index Cond: ((company_id = '777704491747470679'::bigint) AND (division_id = '777795770460846005'::bigint))
            Heap Fetches: 124039
      ->  Hash  (cost=6900.88..6900.88 rows=10 width=8) (actual time=86.704..86.709 rows=259 loops=1)
            ...not relevant part
    Planning Time: 0.518 ms
    Execution Time: 315.402 ms

Why is the “incident_id” not part of the Index Cond? this leads to retrieving all the rows under companyId, divisionId , and then the engine has to filter all the rows for those having incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id

I expected to see all the columns used in the Index Condition, leading to fast retrieval of only the relevant data

The reason incident_id isn’t part of Index Cond: is because it isn’t part of the selection criteria associated with incident_tracker_details. The only indexed columns mentioned in the WHERE clause are company_id and division_id. Although incident_id is used in the join criteria, it is not part of the selection criteria and thus has no part in the conditions used for the index only scan. The query engine has to get all incident_id values associated with the selection criteria to identify the rows in relevant_incidents_per_inventory_item that satisfy the join condition.

If relevant_incidents_per_inventory_item has an index with incident_id as its first column, then the join criteria can be checked with an index scan. If there is no such index, then the join condition will be evaluated by using a full table scan and filtering for rows that have incident_id values matching those found by the index only scan of companyid_divisionId_incidentId_idx. Since you elided the portion of the explain plan associated with relevant_incidents_per_inventory_item and didn’t include all relevant DDL in the post, there isn’t any way for us to determine which of these was the case.

I tried to find the answer to the incomplete index usage behaviour, and in order to simplify the case I removed the join condition and built a simple query targeting the companyId, divisionId, incidentId index. What I discovered is that when the query has a single incident_id value, the index is used fully and all columns are in Index Cond.
On the other hand, when I have multiple (more than 1) incident_ids either in OR or in IN form, the index is used partially what leads to the slow Heap Fetches as seen before.

Simplified query:

SELECT
    tracker_id
FROM
    bi.incident_tracker_details
WHERE
    company_id = :companyId
    AND division_id = :divisionId
        (incident_id = 234234234234234234 OR incident_id = 333333333333333333)   
    AND value > 0;

Query plan for 2 incident_id values using OR. Notice the Heap Fetches, and index on Index Cond only company_id and division_id columns, the incident_id is in Filter.

    QUERY PLAN
    Index Only Scan using companyid_divisionId_incident_id_idx on incident_tracker_details  (cost=0.57..8.72 rows=1 width=8) (actual time=103.485..103.485 rows=0 loops=1)
      Index Cond: ((company_id = '4478704491747470679'::bigint) AND (division_id = '4514095770460846005'::bigint))
      Filter: ((incident_id = '234234234234234234'::bigint) OR (incident_id = '333333333333333333'::bigint))
      Rows Removed by Filter: 249705
      Heap Fetches: 20397
    Planning Time: 0.155 ms
    Execution Time: 103.512 ms

On the other hand, same query with a single incident_id value. The Heap Fetches is 0 and the execution time is almost 0:

    QUERY PLAN
    Index Only Scan using companyid_divisionId_incident_id_idx on incident_tracker_details  (cost=0.57..8.59 rows=1 width=8) (actual time=0.019..0.019 rows=0 loops=1)
      Index Cond: ((company_id = '4478704491747470679'::bigint) 
                    AND (division_id = '4514095770460846005'::bigint) 
                    AND (incident_id = '234234234234234234'::bigint))
      Heap Fetches: 0
    Planning Time: 0.125 ms
    Execution Time: 0.042 ms

this leads me to believe that the behaviour we seen in the original question and in this simplified case is a result of planner wrongfully deciding to scan the incident_id level instead of traversing the tree for each incident_id value.
I will glad to understand why it does so though…

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị

Filed under: Kiến thức lập trình - @ 12:04

Thẻ: postgresql

Thiết kế website giá rẻ

Danh mục

Why Postgres query is not using column in Index Condition but uses it in Filter condition