Question

I have a dataset with about 100000 points and another dataset with roughly 3000 polygons. For each of the points I need to find the nearest polygon (spatial match). Points inside a polygon should match to that polygon.

Computing all-pairs distances is feasible, but takes a bit longer than necessary. Is there an R package that will make use of a spatial index for this kind of matching problem?

I am aware of the sp package and the over function, but the documentation doesn't tell anything about indexes.

Was it helpful?

Solution

You could try and use the gDistance function in the rgeos package for this. As an example look at the below example, which I reworked from this old thread. Hope it helps.

require( rgeos )
require( sp )

# Make some polygons
grd <- GridTopology(c(1,1), c(1,1), c(10,10))
polys <- as.SpatialPolygons.GridTopology(grd)

# Make some points and label with letter ID
set.seed( 1091 )
pts = matrix( runif( 20 , 1 , 10 ) , ncol = 2 )
sp_pts <- SpatialPoints( pts )
row.names(pts) <- letters[1:10]

# Plot
plot( polys )
text( pts , labels = row.names( pts ) , col = 2 , cex = 2 )
text( coordinates(polys) , labels = row.names( polys ) , col = "#313131" , cex = 0.75 )

enter image description here

# Find which polygon each point is nearest
cbind( row.names( pts ) , apply( gDistance( sp_pts , polys , byid = TRUE ) , 2 , which.min ) )
#   [,1] [,2]
#1  "a"  "86"
#2  "b"  "54"
#3  "c"  "12"
#4  "d"  "13"
#5  "e"  "78"
#6  "f"  "25"
#7  "g"  "36"
#8  "h"  "62"
#9  "i"  "40"
#10 "j"  "55"

OTHER TIPS

I don't know anything about R but I will offer one possible solution using PostGIS. You may be able to load the data in PostGIS and process it faster than you can using R alone.

Given two tables planet_osm_point (80k rows) and planet_osm_polygon (30k rows), the following query executes in around 30s

create table knn as 
select 
    pt.osm_id point_osm_id, 
    poly.osm_id poly_osm_id
from planet_osm_point pt, planet_osm_polygon poly
where poly.osm_id = (
    select p2.osm_id 
    from planet_osm_polygon p2 
    order by pt.way <-> p2.way limit 1
);

The result is an approximation based on the distance between the point and the centre-point of the polygon's bounding box (not the centre-point of the polygon itself). With a bit more work this query can be adapted to get the nearest polygon based on the centre-point of the polygon itself although it won't execute as quickly.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top