I have a large query that attempts to match up centroids with the polygons that they fit inside. While I do constrain by Z values of the blocks and the polygons, it still does a lot of point-in-poly calculations and takes a long time to run.

For some background:

  • The table that contains the centroids has 2.5M rows
  • All of the spatial data in the table is in quite a small area of the world; the bounding box of the entire thing is only 7643 x 2351 metres
  • Of those rows, 660K match the Z criteria
  • The table that contains the polygons has 10K rows
  • All of the spatial data in the table is in an even smaller area of the world
  • Of those rows, 2366 match the name criteria
  • Running the query without any indexes takes 11 hours and returns 91K matches

The query is something like this:

select blocks.Id, blocks.WGS84Centroid, polygons.Shape
from blocks
inner join polygons
    on blocks.ZCentre >= (polygons.ZCentre - (polygons.ZLength / 2))
    and blocks.ZCentre <= (polygons.ZCentre + (polygons.ZLength / 2))
    and polygons.Shape.STIntersects(blocks.WGS84Centroid) = 1
inner join name
    on polygons.nameId = name.ID
where name.Name = 'blah'

So, in an effort to speed up this query, I added a spatial index on blocks.WGS84Centroid, and one on polygons.Shape.
The query analyser also suggested a non-clustered index on blocks.ZCentre, including blocks.Id and blocks.WGS84Centroid.
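
For reference, the three indexes would look roughly like the statements below. This is only a sketch: the index names are made up for illustration, both spatial columns are assumed to be of type geography, and the spatial indexes are shown with default grid settings, since the settings actually used are not stated here.

```sql
-- Hypothetical index names; assumes WGS84Centroid and Shape are geography columns
-- and that both tables have a clustered primary key (required for spatial indexes).
create spatial index six_blocks_centroid
    on blocks (WGS84Centroid);

create spatial index six_polygons_shape
    on polygons (Shape);

-- The non-clustered index suggested by the query analyser.
create nonclustered index ix_blocks_zcentre
    on blocks (ZCentre)
    include (Id, WGS84Centroid);
```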

After all that, here's the query plan:
[Image: SSMS query plan]

And the filter cost:
[Image: SSMS filter cost]

However, after adding those 3 indexes the query still takes just as long to run.
What can I do now?

Solution

I think the reason the spatial indexes didn’t help much is probably the density of the data in such a small area of the earth.
I’ve experimented a bit with this, and the best option appears to be as high a grid density on the index as possible.

In SQL Server 2008 this means setting HIGH on each of the 4 levels of the spatial index grid. By hinting the optimiser to use this index, I knocked the join down to ~1 hour rather than 10!
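
As a rough sketch of what that looks like (the index name is hypothetical, the column is assumed to be geography, and hinting the index on blocks rather than the one on polygons is an assumption for illustration):

```sql
-- Hypothetical index name; HIGH density on all 4 grid levels.
create spatial index six_blocks_centroid_high
    on blocks (WGS84Centroid)
    using GEOGRAPHY_GRID
    with (grids = (level_1 = HIGH, level_2 = HIGH, level_3 = HIGH, level_4 = HIGH));

-- Same query as in the question, with a table hint forcing the new index.
select blocks.Id, blocks.WGS84Centroid, polygons.Shape
from blocks with (index(six_blocks_centroid_high))
inner join polygons
    on blocks.ZCentre >= (polygons.ZCentre - (polygons.ZLength / 2))
    and blocks.ZCentre <= (polygons.ZCentre + (polygons.ZLength / 2))
    and polygons.Shape.STIntersects(blocks.WGS84Centroid) = 1
inner join name
    on polygons.nameId = name.ID
where name.Name = 'blah';
```

If the old default-density index is no longer needed, dropping it avoids paying its maintenance cost on every insert or update.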

In SQL Server 2012 I found a few more interesting aspects.
The first is that STIntersects() is better optimised if one of the geography objects is a point, as in my case. On my machine, the same query ran twice as fast in 2012 as it did in 2008.

The second is much more impressive! A new type of spatial index in 2012 (the auto grid tessellation schemes, GEOMETRY_AUTO_GRID and GEOGRAPHY_AUTO_GRID) uses 8 levels of tessellation instead of 4. I’m guessing that the dense data is particularly suited to this much finer tessellation, because the same query ran 45x as fast when hinted to use the new index rather than the old 4-level one.
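
A minimal sketch of creating such an index in SQL Server 2012 or later, assuming the column is geography (the index name is again hypothetical):

```sql
-- Hypothetical index name; the auto grid scheme needs no GRIDS clause.
create spatial index six_blocks_centroid_auto
    on blocks (WGS84Centroid)
    using GEOGRAPHY_AUTO_GRID;
```

The query is then hinted in the usual way, for example `from blocks with (index(six_blocks_centroid_auto))`.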

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow