Question

I have a table of suburbs and each suburb has a geom value, representing its multipolygon on the map. There is another houses table where each house has a geom value of its point on the map.

Both the geom columns are indexed using gist, and suburbs table has the name column indexed as well. Suburbs table has 8k+ records while houses table has 300k+ records.

Now my task is to find all houses within a suburb named 'FOO'.

QUERY #1:

SELECT * FROM houses WHERE ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);

Query Plan Result:

Seq Scan on houses  (cost=8.29..86327.26 rows=102365 width=136)
  Filter: st_intersects($0, geom)
  InitPlan 1 (returns $0)
    ->  Index Scan using suburbs_suburb_name on suburbs  (cost=0.28..8.29 rows=1 width=32)
          Index Cond: ((suburb_name)::text = 'FOO'::text)

running the query took ~3.5s, returning 486 records.

QUERY #2: (prefix ST_INTERSECTS function with _ to explicitly ask it not to use index)

SELECT * FROM houses WHERE _ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);

Query Plan Result: (exactly the same as Query #1)

Seq Scan on houses  (cost=8.29..86327.26 rows=102365 width=136)
  Filter: st_intersects($0, geom)
  InitPlan 1 (returns $0)
    ->  Index Scan using suburbs_suburb_name on suburbs  (cost=0.28..8.29 rows=1 width=32)
          Index Cond: ((suburb_name)::text = 'FOO'::text)

running the query took ~1.7s, returning 486 records.

QUERY #3: (Using && operator to add a boundary box overlap check before the ST_Intersects function)

SELECT * FROM houses WHERE (geom && (SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO')) AND ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);

Query Plan Result:

Bitmap Heap Scan on houses  (cost=21.11..146.81 rows=10 width=136)
  Recheck Cond: (geom && $0)
  Filter: st_intersects($1, geom)
  InitPlan 1 (returns $0)
    ->  Index Scan using suburbs_suburb_name on suburbs  (cost=0.28..8.29 rows=1 width=32)
          Index Cond: ((suburb_name)::text = 'FOO'::text)
  InitPlan 2 (returns $1)
    ->  Index Scan using suburbs_suburb_name on suburbs suburbs_1  (cost=0.28..8.29 rows=1 width=32)
          Index Cond: ((suburb_name)::text = 'FOO'::text)
  ->  Bitmap Index Scan on houses_geom_gist  (cost=0.00..4.51 rows=31 width=0)
        Index Cond: (geom && $0)

running the query took 0.15s, returning 486 records.


Apparently only query #3 is gaining benefit from the spatial index which improves the performance significantly. However, the syntax is ugly and repeating itself to some extend. My question is:

  1. Why postgis is not smart enough to use spatial index in query #1?
  2. Why query #2 has (much) better performance compare to query #1, considering they are both not using index?
  3. Any suggestions to make query #3 prettier? Or is there a better way to construct a query to do the same thing?
Was it helpful?

Solution

Try flattening the query into one query, without unnecessary sub-queries:

SELECT houses.*
FROM houses, suburbs
WHERE suburbs.suburb_name = 'FOO' AND ST_Intersects(houses.geom, suburbs.geom);
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top