Are SPATIAL Geometry indices performance dependant on the size and density of geometry shapes?

https://stackoverflow.com/questions/5486510

14-11-2019
|

Question

Spatial Indexes

Given a spatial index, is the index utility, that is to say the overall performance of the index, only as good as the overall geometrys.

For example, if I were to take a million geometry data types and insert them into a table so that their relative points are densely located to one another, does this make this index perform better to identical geometry shapes whose relative location might be significantly more sparse.

Question 1

For example, take these two geometry shapes.

Situation 1

LINESTRING(0 0,1 1,2 2)
LINESTRING(1 1,2 2,3 3)

Geometrically they are identical, but their coordinates are off by a single point. Imagine this was repeated one million times.

Now take this situation,

Situation 2

LINESTRING(0 0,1 1,2 2)
LINESTRING(1000000 1000000,1000001 10000001,1000002 1000002)
LINESTRING(2000000 2000000,2000001 20000001,2000002 2000002)
LINESTRING(3000000 3000000,3000001 30000001,3000002 3000002)

In the above example:

the lines dimensions are identical to the situation 1,
the lines are of the same number of points
the lines have identical sizes.

However,

the difference is that the lines are massively futher apart.

Why is this important to me?

The reason I ask this question is because I want to know if I should remove as much precision from my input geometries as I possibly can and reduce their density and closeness to each other as much as my application can provide without losing accuracy.

Question 2

This question is similar to the first question, but instead of being spatially close to another geometry shape, should the shapes themselves be reduced to the smalest possible shape to describe what it is that the application requires.

For example, if I were to use a SPATIAL index on a geometry datatype to provide data on dates. If I wanted to store a date range of two dates, I could use a datetime data type in mysql. However, what if I wanted to use a geometry type, so that I convery the date range by taking each individual date and converting it into a unix_timestamp().

For example:

 Date("1st January 2011") to Timestamp =  1293861600
 Date("31st January 2011") to Timestamp =  1296453600

Now, I could create a LINESTRING based on these two integers.

 LINESTRING(1293861600 0,1296453600 1)

If my application is actually only concerned about days, and the number of seconds isn't important for date ranges at all, should I refactor my geometries so that they are reduced to their smallest possible size in order to fulfil what they need.

So that instead of "1293861600", I would use "1293861600" / (3600 * 24), which happens to be "14975.25".

Can someone help fill in these gaps?

Solution

When inserting a new entry, the engine chooses the MBR which would be minimally extended.

By "minimally extended", the engine can mean either "area extension" or "perimeter extension", the former being default in MySQL.

This means that as long as your nodes have non-zero area, their absolute sizes do not matter: the larger MBR's remain larger and the smaller ones remain smaller, and ultimately all nodes will end up in the same MBRs

These articles may be of interest to you:

As for the density, the MBR are recalculated on page splits, and there is a high chance that all points too far away from the main cluster will be moved away on the first split to their own MBR. It would be large but be a parent to all outstanding points in few iterations.

This will decrease the search time for the outstanding points and will increase the search time for the cluster points by one page seek.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow