I'm new to Postgres, SQL, and RDBMSs in general, so I'm working on a project to further my knowledge. I'm building a room-finding website. Each room is part of a property, and each property can have multiple rooms. I want to do a proximity search, and I'm using PostGIS.
The `property` table (other irrelevant columns removed):

| id | title | rent | available | geoLocation |
|---|---|---|---|---|
| SERIAL PK | TEXT | INTEGER | BOOLEAN | GEOGRAPHY(Point) |
The `room` table (other irrelevant columns removed):

| id | propertyId | rent | available | geoLocation |
|---|---|---|---|---|
| SERIAL PK | INTEGER REFERENCES property(id) | INTEGER | BOOLEAN | GEOGRAPHY(Point) |
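For reference, a minimal sketch of the DDL this corresponds to (my real tables have more columns; the names and types here match the tables above):

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE property (
    id          SERIAL PRIMARY KEY,
    title       TEXT,
    rent        INTEGER,
    available   BOOLEAN,
    geoLocation GEOGRAPHY(Point)  -- stored as geography (WGS 84)
);

CREATE TABLE room (
    id          SERIAL PRIMARY KEY,
    propertyId  INTEGER REFERENCES property(id),
    rent        INTEGER,
    available   BOOLEAN,
    geoLocation GEOGRAPHY(Point)
);
```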
I have a query that tries to find the closest properties within a given radius `$3` (metres) of a given point (lng `$1`, lat `$2`), where at least one room is available and has a rent between 50 and 10,000 dollars. Each matching property is returned once, with all of its matching rooms aggregated into a `rooms` array. The full query is at the bottom of this post.
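To illustrate the shape I'm after, a result row looks roughly like this (values are made up, and some `property.*` columns are omitted):

```
 id | title        | rent | rooms                                              | longitude | latitude
----+--------------+------+----------------------------------------------------+-----------+----------
  7 | Example flat | 1200 | [{"id": 41, "rent": 600}, {"id": 42, "rent": 650}] |   -0.0871 |  51.5412
```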
I created an index on `geolocation` on the `property` table:

```
"property_geolocation_idx_geography" gist (geolocation)
```

I also have an index on the `room` table:

```
"room_geolocation_idx" gist (geolocation)
```
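Those correspond to DDL along these lines (reconstructed from the `\d` output above, assuming plain GiST indexes directly on the columns; not the exact original statements):

```sql
CREATE INDEX property_geolocation_idx_geography ON property USING GIST (geoLocation);
CREATE INDEX room_geolocation_idx ON room USING GIST (geoLocation);
```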
I have a dataset of ~70,000 properties and ~150,000 rooms. When I run the query below, not only is it incredibly slow (~13 seconds), it also doesn't use the index I've created. Running `EXPLAIN` on it shows:
```
Limit  (cost=429967.49..429969.26 rows=1 width=1192)
  ->  Result  (cost=429953.31..429967.49 rows=8 width=1192)
        ->  Sort  (cost=429953.31..429953.33 rows=8 width=1176)
              Sort Key: ((property.geolocation <-> '0101000020E61000000432E0D16B58ACBF84ABF2F395C54940'::geography))
              ->  GroupAggregate  (cost=429945.11..429953.19 rows=8 width=1176)
                    Group Key: property.id
                    ->  Gather Merge  (cost=429945.11..429946.03 rows=8 width=1198)
                          Workers Planned: 1
                          ->  Sort  (cost=428945.10..428945.12 rows=5 width=1198)
                                Sort Key: property.id
                                ->  Parallel Hash Join  (cost=425488.95..428945.04 rows=5 width=1198)
                                      Hash Cond: (room.propertyid = property.id)
                                      ->  Parallel Seq Scan on room  (cost=0.00..3335.06 rows=46104 width=66)
                                            Filter: (available AND (rent > 50) AND (rent < 10000))
                                      ->  Parallel Hash  (cost=425488.91..425488.91 rows=3 width=1136)
                                            ->  Parallel Index Scan using property_pkey on property  (cost=0.29..425488.91 rows=3 width=1136)
                                                  Filter: st_dwithin((geolocation)::geography, '0101000020E61000000432E0D16B58ACBF84ABF2F395C54940'::geography, '13118244
```
So, my questions:

1. Is this query garbage?
2. If so, why?
3. How could I improve it?
4. Why is my index not being picked up?
The query itself:

```sql
SELECT
    property.*,
    json_agg(
        json_build_object(
            'id', room.id,
            'rent', room.rent
        )
    ) AS rooms,
    ST_X(property.geoLocation::geometry) AS longitude,
    ST_Y(property.geoLocation::geometry) AS latitude
FROM
    property
LEFT JOIN
    room ON property.id = room.propertyId
WHERE
    room.available = TRUE
    AND room.rent > 50
    AND room.rent < 10000
    AND ST_DWithin(property.geoLocation::geography, ST_MakePoint($1, $2)::geography, $3)
GROUP BY
    property.id
ORDER BY
    property.geoLocation <-> ST_MakePoint($1, $2)::geography
LIMIT
    ${PROPERTIES_PER_PAGE}
OFFSET
    ${offset};
```
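For reproduction, here is a concrete invocation with the placeholders filled in (the coordinates, radius, and paging values are made-up examples); wrapping it in `EXPLAIN (ANALYZE, BUFFERS)` shows actual row counts and timings rather than just cost estimates:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    property.*,
    json_agg(json_build_object('id', room.id, 'rent', room.rent)) AS rooms,
    ST_X(property.geoLocation::geometry) AS longitude,
    ST_Y(property.geoLocation::geometry) AS latitude
FROM property
LEFT JOIN room ON property.id = room.propertyId
WHERE room.available = TRUE
  AND room.rent > 50
  AND room.rent < 10000
  -- example point and a 5 km radius; real values come in as $1, $2, $3
  AND ST_DWithin(property.geoLocation::geography, ST_MakePoint(-0.0887, 51.5412)::geography, 5000)
GROUP BY property.id
ORDER BY property.geoLocation <-> ST_MakePoint(-0.0887, 51.5412)::geography
LIMIT 20 OFFSET 0;  -- example page size and offset
```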