Query on spatial index not responding

https://stackoverflow.com/questions/12772012

05-07-2021

Question

I have a database on SQL Server 2008 with ~12 billion rows, each containing lat, lon and a corresponding geography field. I recently needed to add the ability to query on the geography field, so I added a spatial index, which took 6 days to process over 4TB of data.

CREATE SPATIAL INDEX IX_Location_Geo ON Location
(
    Geo
) USING  GEOGRAPHY_GRID 
WITH (
    GRIDS =(LEVEL_1 = MEDIUM,LEVEL_2 = MEDIUM,LEVEL_3 = MEDIUM,LEVEL_4 = MEDIUM), 
    CELLS_PER_OBJECT = 16, PAD_INDEX  = OFF, SORT_IN_TEMPDB = OFF, 
    DROP_EXISTING = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

I intended to use a query such as this...

SELECT TOP 100 
    ci.LocationID, ci.Geo.STDistance(@g)  
FROM Location ci WITH(INDEX(IX_Location_Geo))
WHERE ci.Geo.Filter(@region) = 1 
ORDER BY ci.Geo.STDistance(@g)

Here's the estimated execution plan...

[Image: estimated execution plan]

I tested this query on a sample set of 100 million rows, and it worked splendidly. But on 12 billion rows the query does not respond for ~4 hours and finally fails with a disk write error, which is strange because the disk has 5TB unused.

Msg 1101, Level 17, State 10, Line 4 Could not allocate a new page 
for database 'TEMPDB' because of insufficient disk space in filegroup 
'DEFAULT'. Create the necessary space by dropping objects in the filegroup, 
adding additional files to the filegroup, or setting autogrowth on for 
existing files in the filegroup.

I'm hoping that someone might see an obvious oversight on my part. Big thanks!

Solution

Instead of scaling vertically (adding more memory, CPU and disk space to a single beefy machine), consider scaling horizontally (splitting the load between many commodity servers). Every operation costs time and space. In Big-O terms, any computation that grows faster than O(N) (a sort, for example, is O(N log N)) becomes impractical at this volume. At a high level, this is why the query takes hours and then fails with an error.

Possible solution

Change the data access pattern. Use sharding: split the data into smaller chunks. Restrict every query with a selective WHERE clause, and page through the results with a Skip/Take pattern (I am not sure about the proper syntax in T-SQL). The Map-Reduce pattern is also getting a lot of attention. In short, stop scaling vertically at that volume.
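As a rough illustration of the WHERE plus Skip/Take idea in T-SQL, here is a minimal sketch that works on SQL Server 2008 (which has no OFFSET/FETCH, so ROW_NUMBER() does the paging). It assumes the lat and lon columns are named Lat and Lon and are covered by an ordinary, non-spatial index; those names, the bounding-box limits and the paging variables are all hypothetical and are not taken from the question.

```sql
-- Hypothetical names: Lat/Lon columns on Location with a regular index on them.
-- The point and bounding-box values below are example values only.
DECLARE @g geography = geography::Point(47.60, -122.33, 4326);
DECLARE @minLat float = 47.50, @maxLat float = 47.70;
DECLARE @minLon float = -122.50, @maxLon float = -122.20;
DECLARE @page int = 1, @pageSize int = 100;

WITH Candidates AS (
    -- Cut the candidate set down with plain range predicates first,
    -- so the distance is computed only for rows inside the box.
    SELECT ci.LocationID,
           ci.Geo.STDistance(@g) AS Distance
    FROM Location ci
    WHERE ci.Lat BETWEEN @minLat AND @maxLat
      AND ci.Lon BETWEEN @minLon AND @maxLon
),
Numbered AS (
    SELECT LocationID,
           Distance,
           ROW_NUMBER() OVER (ORDER BY Distance) AS RowNum
    FROM Candidates
)
-- Skip (@page - 1) * @pageSize rows, then take @pageSize rows.
SELECT LocationID, Distance
FROM Numbered
WHERE RowNum BETWEEN (@page - 1) * @pageSize + 1 AND @page * @pageSize
ORDER BY RowNum;
```

The sort still has to process every row inside the box, so the box has to be small enough for that to be cheap. This only sketches the access pattern and is not a tuned query.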

OTHER TIPS

The error message you posted says that disk space ran out for tempdb, not for your main database. You could make more space available to tempdb, but SQL Server should not consume that much in the first place, so that is not a real solution.
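To see how tempdb is configured and what is filling it, something along these lines may help. It only reads standard catalog and dynamic management views. Whether the tempdb files have a capped size or sit on a different volume from the one with 5TB free is an assumption to check here, not something the question states.

```sql
-- Size, growth settings and location of each tempdb file.
-- size and max_size are counted in 8 KB pages;
-- max_size = -1 means the file may grow until the disk is full,
-- growth = 0 means the file has a fixed size.
SELECT name,
       physical_name,
       size * 8 / 1024 AS size_mb,
       max_size,
       growth,
       is_percent_growth
FROM tempdb.sys.database_files;

-- Current tempdb space usage; run this from a second session
-- while the slow query is executing.
SELECT SUM(user_object_reserved_page_count) * 8 / 1024     AS user_objects_mb,
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
       SUM(unallocated_extent_page_count) * 8 / 1024       AS free_mb
FROM tempdb.sys.dm_db_file_space_usage;
```

Sorts and other spills are counted as internal objects, so a rapidly growing internal_objects_mb while the query runs would be consistent with the ORDER BY sorting far more rows than expected.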

Please post the estimated execution plan (since you cannot obtain the actual one). I will update this answer with my thoughts on the plan.

As a general comment: Debugging a query perf problem usually starts with the plan because it tells you what SQL Server does at execution time.
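For reference, one way to capture the estimated plan as XML without running the query is SET SHOWPLAN_XML. The sketch below wraps the query from the question; the values assigned to @g and @region are made-up examples, since the question does not show how they are declared.

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO

-- Example values only; the question does not show these declarations.
DECLARE @g geography = geography::Point(47.60, -122.33, 4326);
DECLARE @region geography = geography::STGeomFromText(
    'POLYGON((-122.5 47.5, -122.2 47.5, -122.2 47.7, -122.5 47.7, -122.5 47.5))',
    4326);

-- The statement is compiled but not executed; the plan comes back as XML.
SELECT TOP 100
    ci.LocationID, ci.Geo.STDistance(@g)
FROM Location ci WITH(INDEX(IX_Location_Geo))
WHERE ci.Geo.Filter(@region) = 1
ORDER BY ci.Geo.STDistance(@g);
GO

SET SHOWPLAN_XML OFF;
GO
```

The XML result can be saved with a .sqlplan extension and opened in Management Studio, or pasted into the post as text.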

Licensed under: CC-BY-SA with attribution