Try flattening the query into one query, without unnecessary sub-queries:
SELECT houses.*
FROM houses, suburbs
WHERE suburbs.suburb_name = 'FOO' AND ST_Intersects(houses.geom, suburbs.geom);
Question
I have a table of suburbs and each suburb has a geom value, representing its multipolygon on the map. There is another houses table where each house has a geom value of its point on the map.
Both the geom columns are indexed using gist, and suburbs table has the name column indexed as well. Suburbs table has 8k+ records while houses table has 300k+ records.
Now my task is to find all houses within a suburb named 'FOO'.
QUERY #1:
SELECT * FROM houses WHERE ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);
Query Plan Result:
Seq Scan on houses (cost=8.29..86327.26 rows=102365 width=136)
Filter: st_intersects($0, geom)
InitPlan 1 (returns $0)
-> Index Scan using suburbs_suburb_name on suburbs (cost=0.28..8.29 rows=1 width=32)
Index Cond: ((suburb_name)::text = 'FOO'::text)
running the query took ~3.5s, returning 486 records.
QUERY #2: (prefix ST_INTERSECTS function with _ to explicitly ask it not to use index)
SELECT * FROM houses WHERE _ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);
Query Plan Result: (exactly the same as Query #1)
Seq Scan on houses (cost=8.29..86327.26 rows=102365 width=136)
Filter: st_intersects($0, geom)
InitPlan 1 (returns $0)
-> Index Scan using suburbs_suburb_name on suburbs (cost=0.28..8.29 rows=1 width=32)
Index Cond: ((suburb_name)::text = 'FOO'::text)
running the query took ~1.7s, returning 486 records.
QUERY #3: (Using && operator to add a boundary box overlap check before the ST_Intersects function)
SELECT * FROM houses WHERE (geom && (SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO')) AND ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);
Query Plan Result:
Bitmap Heap Scan on houses (cost=21.11..146.81 rows=10 width=136)
Recheck Cond: (geom && $0)
Filter: st_intersects($1, geom)
InitPlan 1 (returns $0)
-> Index Scan using suburbs_suburb_name on suburbs (cost=0.28..8.29 rows=1 width=32)
Index Cond: ((suburb_name)::text = 'FOO'::text)
InitPlan 2 (returns $1)
-> Index Scan using suburbs_suburb_name on suburbs suburbs_1 (cost=0.28..8.29 rows=1 width=32)
Index Cond: ((suburb_name)::text = 'FOO'::text)
-> Bitmap Index Scan on houses_geom_gist (cost=0.00..4.51 rows=31 width=0)
Index Cond: (geom && $0)
running the query took 0.15s, returning 486 records.
Apparently only query #3 is gaining benefit from the spatial index which improves the performance significantly. However, the syntax is ugly and repeating itself to some extend. My question is:
Solution
Try flattening the query into one query, without unnecessary sub-queries:
SELECT houses.*
FROM houses, suburbs
WHERE suburbs.suburb_name = 'FOO' AND ST_Intersects(houses.geom, suburbs.geom);