I’ve got the following table:
| name       | type                      |
|------------|---------------------------|
| id         | PrimaryKey, UUID, Unique  |
| test_id    | ForeignKey, UUID, Indexed |
| created_at | DateTime+TZ, Indexed      |
- 2 418 139 distinct `test_id` values
- 283 036 392 distinct `created_at` values
- 283 085 093 rows in total
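For reference, the schema corresponds roughly to the following DDL sketch; the column types are inferred from the table above and the index names are taken from the query plans below, while the foreign-key target table and any other columns are omitted:

```
-- Approximate schema: types inferred from the description above,
-- index names taken from the query plans below. The foreign-key
-- constraint and other columns are omitted for brevity.
CREATE TABLE my_table (
    id         uuid PRIMARY KEY,
    test_id    uuid NOT NULL,          -- ForeignKey in the Django model
    created_at timestamptz NOT NULL
);

CREATE INDEX my_table_test_id_d24b61ed    ON my_table (test_id);
CREATE INDEX my_table_created_at_231d382e ON my_table (created_at);
```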
Now I want to fetch some rows for a specific `test_id`, ordered by `created_at`.
As both columns are indexed, I assumed the query would be quite fast. However, I see a dramatic drop in performance when running with `LIMIT 1` compared to any other limit:
With `LIMIT 2`:
EXPLAIN ANALYZE VERBOSE SELECT "my_table"."id" FROM "my_table" WHERE ("my_table"."test_id" = '00018843-d632-42d4-b832-cf9d6df8e454'::uuid) ORDER BY "my_table"."created_at" ASC LIMIT 2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=13137.72..13137.72 rows=2 width=56) (actual time=0.033..0.034 rows=1 loops=1)
Output: id, identifier, type, created_at
-> Sort (cost=13137.72..13145.82 rows=3242 width=56) (actual time=0.033..0.033 rows=1 loops=1)
Output: id, created_at
Sort Key: my_table.created_at
Sort Method: quicksort Memory: 25kB
-> Index Scan using my_table_test_id_d24b61ed on public.my_table (cost=0.57..13105.30 rows=3242 width=56) (actual time=0.014..0.015 rows=1 loops=1)
Output: id, created_at
Index Cond: (my_table.test_id = '00018843-d632-42d4-b832-cf9d6df8e454'::uuid)
Query Identifier: 5848686225449084285
Planning Time: 0.093 ms
Execution Time: 0.065 ms
(12 rows)
With `LIMIT 1`:
EXPLAIN ANALYZE VERBOSE SELECT "my_table"."id" FROM "my_table" WHERE ("my_table"."test_id" = '00018843-d632-42d4-b832-cf9d6df8e454'::uuid) ORDER BY "my_table"."created_at" ASC LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..10735.90 rows=1 width=56) (actual time=73459.135..73459.135 rows=1 loops=1)
Output: id, created_at
-> Index Scan using my_table_created_at_231d382e on public.my_table (cost=0.57..34803947.27 rows=3242 width=56) (actual time=73459.134..73459.134 rows=1 loops=1)
Output: id, created_at
Filter: (my_table.test_id = '00018843-d632-42d4-b832-cf9d6df8e454'::uuid)
Rows Removed by Filter: 110868840
Query Identifier: 5848686225449084285
Planning Time: 0.085 ms
Execution Time: 73459.155 ms
(9 rows)
As you can see, running with `LIMIT 1` takes significantly more time than running with `LIMIT 2`. Running with any other limit, e.g. `LIMIT 10` or `LIMIT 100`, has the same performance as `LIMIT 2`.
I can see why the optimizer chose a different path, as the estimated cost of the `LIMIT 1` plan (`0.57..10735.90`) seems to be lower than that of the `LIMIT 2` plan (`13137.72..13137.72`). However, I do not understand why the actual time differs so much from the estimate. Additionally, the total cost of the underlying index scan is dramatically higher (`34803947.27` vs. `13105.30`), so the optimizer should (at least IMO) somehow see that this won't scale in the long run.
I've tried the following:

- Running `VACUUM ANALYZE` on the table manually; no change (the query plans above are the ones taken after running it).
- Playing around with `work_mem` (defaults to `4MB`), with no improvement.
- Playing around with `effective_cache_size` (defaults to `266313056kB`), with no improvement; see the sketch after this list for roughly how I set both parameters.
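For the last two points, the experiments looked roughly like this (the concrete values are illustrative; I re-ran the `EXPLAIN ANALYZE` statements above after each change and the chosen plan stayed the same):

```
-- Illustrative session-level overrides; the exact values I tried varied.
SET work_mem = '64MB';               -- default: 4MB
SET effective_cache_size = '128GB';  -- default: 266313056kB

EXPLAIN ANALYZE VERBOSE
SELECT "my_table"."id"
FROM "my_table"
WHERE "my_table"."test_id" = '00018843-d632-42d4-b832-cf9d6df8e454'::uuid
ORDER BY "my_table"."created_at" ASC
LIMIT 1;
```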
Are there other ways to fix this, or to help the optimizer choose the better plan?
I’ve got the following environment:
- PostgreSQL: 14.10
- Running in AWS as an RDS Aurora PostgreSQL database, deployed with the AWS CDK
As I'm using Django 4.2 as the application framework, I would prefer solutions that can be used with its standard ORM.