Question

I am working on a project at the moment where I have a point feature -- the point feature includes 142 points -- and multiple polygons (around 10). I want to calculate the distance between every single point and the nearest polygon feature in R.

My current approach is tedious and a bit long-winded. I am currently planning to calculate the distance between every single point and every single polygon. For example, I would calculate the distance between the 142 points and Polygon A, the distance between the 142 points and Polygon B, the distance between the 142 points and Polygon C, etc. Here is sample code for one of these distance calculations:

# dist2Line() is presumably geosphere::dist2Line()
dist_cen_polya <- dist2Line(centroids_coor, polygonA_shp)

After doing these calculations, I would write code to choose the minimum/nearest distance between every single point and the closest polygon. The issue is that this procedure is tedious.

Does anyone know a package/code which would minimize the effort/computational time of the calculation? I would really like to use a package that compares a single point to the nearest polygon feature, or calculates the distance between a point and all polygons of interest.

Thank you.


Solution

Here I am using the gDistance function in the rgeos topology library. I am using a brute-force double loop, but it is surprisingly fast: it takes less than 2 seconds for 142 points and 10 polygons. I am sure that there is a more elegant way to perform the looping.

    require(rgeos)

    # CREATE SOME DATA USING meuse DATASET
    data(meuse)
    coordinates(meuse) <- ~x+y
    pts <- meuse[sample(1:dim(meuse)[1], 142), ]

    data(meuse.grid)
    coordinates(meuse.grid) <- c("x", "y")
    gridded(meuse.grid) <- TRUE
    meuse.grid[["idist"]] <- 1 - meuse.grid[["dist"]]
    polys <- as(meuse.grid, "SpatialPolygonsDataFrame")
    polys <- polys[sample(1:dim(polys)[1], 10), ]

    plot(polys)
    plot(pts, pch=19, cex=1.25, add=TRUE)

    # LOOP USING gDistance, DISTANCES STORED IN LIST OBJECT
    Fdist <- list()
    for(i in 1:dim(pts)[1]) {
      pDist <- vector()
      for(j in 1:dim(polys)[1]) {
        pDist <- append(pDist, gDistance(pts[i,], polys[j,]))
      }
      Fdist[[i]] <- pDist
    }

    # RETURN POLYGON (NUMBER) WITH THE SMALLEST DISTANCE FOR EACH POINT
    ( min.dist <- unlist(lapply(Fdist, FUN=function(x) which(x == min(x))[1])) )

    # RETURN DISTANCE TO NEAREST POLYGON
    ( PolyDist <- unlist(lapply(Fdist, FUN=function(x) min(x)[1])) )

    # CREATE POLYGON-ID AND MINIMUM DISTANCE COLUMNS IN POINT FEATURE CLASS
    pts@data <- data.frame(pts@data, PolyID=min.dist, PDist=PolyDist)

    # PLOT RESULTS
    require(classInt)
    ( cuts <- classIntervals(pts@data$PDist, 10, style="quantile") )
    plotclr <- colorRampPalette(c("cyan", "yellow", "red"))(20)
    colcode <- findColours(cuts, plotclr)
    plot(polys, col="black")
    plot(pts, col=colcode, pch=19, add=TRUE)

The min.dist vector represents the row number of the nearest polygon. For instance, you could subset the nearest polygons by using this vector as follows:

near.polys <- polys[unique(min.dist),]

The PolyDist vector contains the actual Cartesian minimum distances in the projection units of the features.
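
If you want to avoid the double loop entirely, gDistance can also be called once with byid = TRUE to get the full pairwise distance matrix. The following is only a sketch of that shortcut (the transpose guard is there because the orientation of the returned matrix depends on the argument order):

    # Full point-by-polygon distance matrix in a single call
    dmat <- gDistance(pts, polys, byid=TRUE)

    # Make sure points run along the rows before taking row-wise minima
    if(nrow(dmat) != dim(pts)[1]) dmat <- t(dmat)

    min.dist <- apply(dmat, 1, which.min)   # row number of the nearest polygon
    PolyDist <- apply(dmat, 1, min)         # distance to the nearest polygon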

OTHER TIPS

A polygon consists of quite a lot of line segments (edges). The distance between a point and a polygon is zero if the point lies within the polygon or on one of its edges; otherwise it is the distance to the nearest edge.

So you are actually looking at two cases:

  1. Check whether the point lies inside any polygon (remember, it might be inside more than a single polygon).
  2. Collect all edges and calculate the distance of the point from each edge. The smallest of these gives you the distance to the polygon the nearest edge belongs to.

This simple algorithm, if we assume 10 edges per polygon, takes on the order of 10 * 10 = 100 edge checks per point, and 100 * 142 = 14,200 operations for all of your points. In general that is O(m * deltaE * n), where m is the number of polygons, deltaE is the average number of edges per polygon, and n is the number of points.
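
As a minimal sketch of case 2 (all function names and the vertex-matrix layout are illustrative, not from the original post), the point-to-edge distance and the brute-force minimum over all polygons could look like this in R; the point-in-polygon test of case 1 is left out here (sp::point.in.polygon could cover it):

    # Distance from point p to the segment a-b (all arguments are length-2 vectors)
    pointSegDist <- function(p, a, b) {
      ab <- b - a
      if(all(ab == 0)) return(sqrt(sum((p - a)^2)))  # degenerate segment
      t <- sum((p - a) * ab) / sum(ab * ab)          # projection parameter
      t <- max(0, min(1, t))                         # clamp onto the segment
      sqrt(sum((p - a - t * ab)^2))
    }

    # Distance from point p to a polygon given as an n x 2 vertex matrix (ring)
    pointPolyDist <- function(p, verts) {
      n <- nrow(verts)
      min(sapply(1:n, function(i) {
        j <- if(i == n) 1 else i + 1                 # wrap around to close the ring
        pointSegDist(p, verts[i, ], verts[j, ])
      }))
    }

    # Nearest polygon for one point: O(m * deltaE) work, repeated for all n points
    nearestPoly <- function(p, polyList) {
      which.min(sapply(polyList, function(v) pointPolyDist(p, v)))
    }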

Now we can check whether this can be sped up. The first thing that comes to mind is that we can use bounding-box or bounding-circle checks for each polygon.
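
A rough sketch of that pruning (again with illustrative names, building on pointPolyDist above): the distance from a point to a polygon's bounding box is a cheap lower bound on the exact distance, so polygons whose boxes are already farther away than the best exact distance found so far can be skipped:

    # Distance from point p to an axis-aligned box c(xmin, xmax, ymin, ymax);
    # zero if the point lies inside the box
    bboxDist <- function(p, b) {
      dx <- max(b[1] - p[1], 0, p[1] - b[2])
      dy <- max(b[3] - p[2], 0, p[2] - b[4])
      sqrt(dx * dx + dy * dy)
    }

    # Visit polygons in order of increasing box distance and stop as soon as the
    # lower bound can no longer beat the best exact distance seen so far
    nearestPolyPruned <- function(p, polyList, boxes) {
      lb   <- sapply(boxes, function(b) bboxDist(p, b))
      ord  <- order(lb)
      best <- Inf; bestId <- NA
      for(i in ord) {
        if(lb[i] >= best) break
        d <- pointPolyDist(p, polyList[[i]])
        if(d < best) { best <- d; bestId <- i }
      }
      c(id = bestId, dist = best)
    }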

Another idea is to precompute the nearest edges of each polygon for a set of angles. For example, if you use 8 angles (every 45°), you can remove all edges from the list that are superseded by another edge of the same polygon (i.e. any point on the removed edge will always yield a greater distance than some point on the other edge).

This way you can typically reduce the complexity by a large margin. If you think of a rectangle, you only have to keep one or two edges instead of 4. For a regular 8-edge polygon you typically end up with one or two, and at most 3, edges per polygon.

If you add the normal vector to each edge, you can calculate whether the point might be inside; then you have to perform a scan-line or similar check to be sure (unless you know the polygon is convex).
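
For the convex case hinted at above, a minimal sketch (names illustrative): with the vertices in counter-clockwise order, a point is inside a convex polygon exactly when it lies on the non-negative side of every edge, which is a simple cross-product sign test:

    # Point-in-convex-polygon test; verts is an n x 2 matrix in counter-clockwise order
    insideConvex <- function(p, verts) {
      n <- nrow(verts)
      all(sapply(1:n, function(i) {
        j <- if(i == n) 1 else i + 1
        e <- verts[j, ] - verts[i, ]            # edge vector
        w <- p - verts[i, ]                     # point relative to edge start
        e[1] * w[2] - e[2] * w[1] >= 0          # z-component of the cross product
      }))
    }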

Spatial indexes are also possible, for example partitioning the two-dimensional space into equally sized cells along x and y. This way you only have to test the polygons lying in the 9 cells surrounding the query point.
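
A rough sketch of such a grid index (purely illustrative names; the cell size and origin are assumptions): each polygon is registered in every cell its bounding box touches, and a query then only collects the polygons registered in the point's own cell and its 8 neighbours (widening the search if that candidate set turns out to be empty):

    # Integer cell coordinates of a point for a given grid origin and cell size
    cellOf <- function(xy, origin, size) floor((xy - origin) / size)

    # Register each polygon (via its bounding box c(xmin, xmax, ymin, ymax))
    # in every cell the box overlaps
    buildGridIndex <- function(boxes, origin, size) {
      idx <- list()
      for(i in seq_along(boxes)) {
        b  <- boxes[[i]]
        lo <- cellOf(c(b[1], b[3]), origin, size)
        hi <- cellOf(c(b[2], b[4]), origin, size)
        for(cx in lo[1]:hi[1]) for(cy in lo[2]:hi[2]) {
          key <- paste(cx, cy)
          idx[[key]] <- c(idx[[key]], i)
        }
      }
      idx
    }

    # Candidate polygons for point p: everything registered in the 3 x 3 block
    # of cells around the point's cell
    candidatesFor <- function(p, idx, origin, size) {
      ctr  <- cellOf(p, origin, size)
      keys <- as.vector(outer(ctr[1] + (-1:1), ctr[2] + (-1:1), paste))
      unique(unlist(idx[keys]))
    }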

A further refinement might be an R-tree, where for each node's bounding box (or circle) you compute the minimum and maximum possible distance to the point. You then do not need to descend into nodes whose minimum distance is already greater than the maximum distance of another node.

Another option is to exploit a given tree-like order, as you have with map data. In a street map you always have world -> region -> country -> county -> city -> city sector -> ...

This way you can search for the nearest location in a map of the whole world, containing millions of polygons, in a reasonable time, usually < 10 ms.

So, all in all, you have a lot of options here: preprocess your polygon list and extract the relevant edges, use binary space partitioning trees of polygons, use the angular approach, or go with something even fancier. It is up to you. I would expect you to end up somewhere in the logarithmic range, something like O(log(n) * log(deltaE)), i.e. roughly O(log(n)) average complexity.

Licensed under: CC-BY-SA with attribution