I have simple models with a foreign key and a many-to-many relationship:

class Car(models.Model):
    uuid = models.UUIDField(unique=True)  # to_field targets must be unique
    is_red = models.BooleanField()

class Owner(models.Model):
    car = models.ForeignKey(Car, to_field="uuid", on_delete=models.CASCADE)

class Driver(models.Model):
    cars = models.ManyToManyField(Car, related_name="cars")
Now a lot of my application logic relies on cars for which at least one of three conditions is true: the car is red, it has at least one owner, or it has at least one driver. Possibly relevant for performance: in reality the Car model also contains some rather large polygonal data.
I have a custom manager for this, but no matter how I build the query it seems extremely slow. Times are taken from my local dev machine with ~50k Cars, 20k Owners and 1.2k Drivers. The view is a default FilterView from django-filter without any filters actually being active.
My manager currently looks like this:
from django.db.models import Q

class ActiveCarManager(models.Manager):
    def get_queryset(self):
        cars_with_owners = Owner.objects.values("car__uuid").distinct()
        cars_with_drivers = Driver.objects.values("cars__uuid").distinct()
        return (
            super()
            .get_queryset()
            .filter(
                Q(uuid__in=cars_with_owners)
                | Q(uuid__in=cars_with_drivers)
                | Q(is_red=True)
            )
        )
The view generates two queries from this: one count query and one query to fetch the actual items. The slow one is the count query; on our staging instance it regularly takes several seconds to complete. Here are the generated SQL and the query plan:
SELECT COUNT(*) AS "__count"
FROM "app_car"
WHERE ("app_car"."uuid" IN (SELECT DISTINCT U0."car" FROM "app_owner" U0)
       OR "app_car"."uuid" IN (SELECT DISTINCT U2."uuid"
                               FROM "app_driver" U0
                               LEFT OUTER JOIN "app_driver_cars" U1 ON (U0."id" = U1."driver_id")
                               LEFT OUTER JOIN "app_car" U2 ON (U1."car_id" = U2."id"))
       OR "app_car"."is_red" = TRUE)
Aggregate  (cost=32426.52..32426.53 rows=1 width=8) (actual time=951.281..951.289 rows=1 loops=1)
  ->  Seq Scan on app_car  (cost=13300.79..32335.30 rows=36485 width=0) (actual time=20.850..950.695 rows=3687 loops=1)
        Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2) OR (is_red IS TRUE))
        Rows Removed by Filter: 44147
        SubPlan 1
          ->  HashAggregate  (cost=556.68..580.34 rows=2366 width=16) (actual time=6.713..6.940 rows=2366 loops=1)
                Group Key: u0.car
                Batches: 1  Memory Usage: 241kB
                ->  Seq Scan on app_owner u0  (cost=0.00..508.57 rows=19245 width=16) (actual time=0.007..3.970 rows=19245 loops=1)
                      Filter: approved
                      Rows Removed by Filter: 12
        SubPlan 2
          ->  HashAggregate  (cost=12669.76..12705.58 rows=3582 width=16) (actual time=12.404..12.718 rows=3009 loops=1)
                Group Key: u2.uuid
                Batches: 1  Memory Usage: 369kB
                ->  Nested Loop Left Join  (cost=0.86..12660.81 rows=3582 width=16) (actual time=0.090..10.961 rows=3585 loops=1)
                      ->  Merge Left Join  (cost=0.56..238.42 rows=3582 width=8) (actual time=0.065..1.403 rows=3585 loops=1)
                            Merge Cond: (u0_1.id = u1.driver_id)
                            ->  Index Only Scan using app_driver_pkey on app_driver u0_1  (cost=0.28..70.02 rows=2249 width=8) (actual time=0.036..0.388 rows=2249 loops=1)
                                  Heap Fetches: 0
                            ->  Index Only Scan using app_driver_cars_driver_id_car_cb09e8dc_uniq on app_driver_cars u1  (cost=0.28..118.01 rows=3582 width=16) (actual time=0.026..0.435 rows=3582 loops=1)
                                  Heap Fetches: 0
                      ->  Memoize  (cost=0.30..4.11 rows=1 width=24) (actual time=0.002..0.002 rows=1 loops=3585)
                            Cache Key: u1.car_id
                            Cache Mode: logical
                            Hits: 576  Misses: 3009  Evictions: 0  Overflows: 0  Memory Usage: 377kB
                            ->  Index Scan using app_car_pkey on app_car u2  (cost=0.29..4.10 rows=1 width=24) (actual time=0.002..0.002 rows=1 loops=3009)
                                  Index Cond: (id = u1.car_id)
Planning Time: 5.902 ms
Execution Time: 952.436 ms
Interestingly, the query plan states that execution took 952 ms, yet Django Debug Toolbar reports 2162 ms …
Do you have any idea how I can speed this up? Or should I use a dedicated “is_active” column and update it using signals?