What spatial indexing algorithm should I use?

https://stackoverflow.com//questions/12679094

12-12-2019
|

Question

I want to implement some king of spatial indexing data structure for my MKAnnotations. Currently it's horribly slow when I try to filter them based on distance criteria ( 3-4k of locations, currently extremely slow with a simple double for ... ).

I'd like to create clusters of MKAnnotations, to decide if it is close to another. Also, these locations are in a somewhat (creation) order and a "previous"/"next" functionality would be needed to "jump" between (this is not a must). I've read about kd-tree and r-tree structures and they both seem to meet the fast distance/neighbor obtaining option for filtering/clustering, but I'm not sure which is the best for me or if there are other options too. What algorithm/data structure should I use ?

Update: I store these locations in a Core Data database, they represent a path. When the map is opened they are fetched into an array and then I just use that array for distance calculations and annotation creation. When the user moves/zooms the map, I loop through them and decide what needs to be changed on map, so it is kinda static the whole stuff. As I understood, if I'd be using a tree, I could store the locations there and when a zoom/move happens I just search through it and obtain the ones in the new region. Is this true ?

Even in the dynamic case, when I can add new locations to this array, it would be a single insertion and it's happening rarely.

Solution

It depends a lot on what your usage patterns are (how my writes, for example, in-memory or on-disk) and how your data looks like (that is how it is distributed).

R-trees are good because they are balanced, and allow updating. The R*-tree in my experience is clearly better than the other variants because of the split strategy it has. The benefit is that it produces more square pages than the other strategies, so that for many queries you will need to scan fewer pages.

kd-trees are good if you are in-memory and static. Updating them is very bad, you will need to rebuild the index quite often.

And if your data does not change very often, bulk-loading for the R-tree works very well. You can do Sort-Tile-Recursive bulk loading, which essentially requires (partially) sorting your data on X and Y alternatingly, so it takes a low O(n log n) to build the tree; very similar to bulk-loading an kd-tree, except that you multi-split instead of binary splitting. This is very popular.

Furthermore, you can keep track of the number of objects in each page. When displaying things on a map, you may want to stop early when a page would display too small on the screen (i.e. smaller than a marker). At this point, you would not scan that page, but only take the number of objects and display that as a clustered marker until the user zooms in.

For 2D data, with a limited value domain, do not overlook the simple things. Quadtrees can work really well, too! Simplicity can make it a lot easier to optimize things. Or a classic grid approach. If your users tend to spread their annotations in an area (and not put them all into one place), you can just compute integer x,y grid coordinates, and then hash them and make a list for each grid cell.

OTHER TIPS

I am no iOS developer, but I looked over the docs and found this:

MKMapView.annotationsInMapRect:

Returns the annotation objects located in the specified map rectangle.

(NSSet *)annotationsInMapRect:(MKMapRect)mapRect

Parameters

mapRect: The portion of the map that you want to search for annotations.

Return Value The set of annotation objects located in mapRect.

Discussion This method offers a fast way to retrieve the annotation objects in a particular portion of the map. This method is much faster than doing a linear search of the objects in the annotations property yourself.

This suggests that the NKMapView already organizes annotations in a spatial index structure. Would this method meet your needs?

If not, I would look for existing open source implementations of any 2D spatial indexing structure and pick the one with best documentation, cleanest interfaces, etc. rather than worrying about efficiency. If you need to write the code form scratch, I think a quadtree would be the easiest to implement. On the other hand, the Wikipedia article on R-tree seems more specifically targeted towards mapping than the K-D Tree or Quadtree.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow