Question

I have 2D spatial data in the form (xBin, yBin, value), e.g.:

library(data.table)
DT = data.table(x=c(rep(1,3),rep(2,3),rep(3,3)),y=rep(c(1,2,3),3),value=100*c(1:9))

For each bin I want to compute the sum of the variable "value" over all neighboring bins. A bin is considered a neighbor if both of its indices, x and y, are within one unit of the current bin's.

e.g. for x=2, y=2, I want to compute

valueNeighbors(x=2,y=2) = value(x=1,y=1)+value(1,2)+value(1,3)
+value(2,1)+value(2,3)
+value(3,1)+value(3,2)+value(3,3)

My real data has ~1,000^2 bins. How can I do this efficiently?


Solution

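One option is the raster package, whose focal() function computes moving-window sums:

library(raster)
X <- matrix(1:20, 4)
r <- raster(X)
r
# 3x3 window of ones; pad the edges with 0 so border cells get a sum too.
# Note that this sum includes the centre cell itself.
agg <- as.matrix(focal(r, matrix(1, 3, 3), sum, pad = TRUE, padValue = 0))
agg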

     [,1] [,2] [,3] [,4] [,5]
[1,]   14   33   57   81   62
[2,]   24   54   90  126   96
[3,]   30   63   99  135  102
[4,]   22   45   69   93   70

Which method is faster for large datasets?

X <- matrix(1:1000000, 1000)
S <- matrix(NA, nrow(X), ncol(X))
r <- raster(X)

system.time(
  as.matrix(focal(r, matrix(1, 3, 3), sum, pad = TRUE, padValue = 0))
)
user  system elapsed 
0.39    0.08    0.47 

With a 1000x1000 matrix, I was unable to get a result within a reasonable time using the solution proposed by Winsemius (Windows 7 x64, 8 GB RAM).
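The example above works on a plain matrix and counts the centre cell, whereas the question starts from (x, y, value) rows and excludes the bin itself. A minimal base-R sketch of the round trip follows. It assumes that x and y are positive integer bin indices and that bins missing from the data contribute 0; the column name valueNeighbors is made up for illustration.

```r
# Example data from the question (a plain data.frame, to stay in base R)
DT <- data.frame(x = rep(1:3, each = 3), y = rep(1:3, times = 3),
                 value = 100 * (1:9))

# Long format -> matrix: row index = x bin, column index = y bin.
# Assumption: bins that do not appear in DT stay at 0.
M <- matrix(0, nrow = max(DT$x), ncol = max(DT$y))
M[cbind(DT$x, DT$y)] <- DT$value

# Sum each 3x3 block, clipped at the grid edges, then drop the centre bin
S <- matrix(0, nrow(M), ncol(M))
for (i in seq_len(nrow(M))) {
  for (j in seq_len(ncol(M))) {
    rows <- max(1, i - 1):min(nrow(M), i + 1)
    cols <- max(1, j - 1):min(ncol(M), j + 1)
    S[i, j] <- sum(M[rows, cols]) - M[i, j]
  }
}

# Matrix -> long format: look up each (x, y) pair in the result
DT$valueNeighbors <- S[cbind(DT$x, DT$y)]
DT[DT$x == 2 & DT$y == 2, ]
```

The same two indexing lines (M[cbind(x, y)] <- value and S[cbind(x, y)]) should work with any of the matrix-based methods on this page; only the middle step changes.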

OTHER TIPS

So this is a possible solution using some of the spatial packages in R. Note that it is not very refined but it does the job. I haven't checked the results manually. I also don't know how quick this method is compared to some of the offered matrix solutions.

DT<-data.frame(x=c(rep(1,3),rep(2,3),rep(3,3)),y=rep(c(1,2,3),3),value=100*c(1:9))
library(sp)
library(raster) # needed for raster(), extent(), rasterize() and rasterToPolygons()
coordinates(DT)<-~x+y # Create spatial object (points)
rast<-raster(extent(DT),ncol=3,nrow=3)
grid<-rasterize(DT,rast)
grid<-rasterToPolygons(grid) # Create polygons

require(spdep)
neigh<-poly2nb(grid) # Create neighbour list
weights<-nb2listw(neigh,style="B",zero.policy=TRUE) # Create weights (binary)
grid$spatial.lag<-lag.listw(weights,grid$value,zero.policy=TRUE) # Add the neighbour sums to the polygons

You can change the spatial object back into a data frame simply by using

DT2<-data.frame(grid)

Note that the ID variable corresponds to the row number in the initial data.

I don't think a data.table is the right vehicle. Its concept of row indexing is not well suited to this operation (although I may be spouting old information):

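X <- matrix(1:20, 4)
S <- matrix(NA, nrow(X), ncol(X))
# Visit each cell once; looping over row(X)/col(X) would repeat every index many times
for (x in seq_len(nrow(X))) {
  for (y in seq_len(ncol(X))) {
    # Sum every cell whose row and column are within one step of (x, y)
    S[x, y] <- sum(X[abs(row(X) - x) < 2 & abs(col(X) - y) < 2])
  }
}
S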
#---------
     [,1] [,2] [,3] [,4] [,5]
[1,]   14   33   57   81   62
[2,]   24   54   90  126   96
[3,]   30   63   99  135  102
[4,]   22   45   69   93   70

With greater consideration of efficiency this algorithm would be much quicker, but still much slower than raster::focal:

rows <- nrow(X); cols <- ncol(X)
for (x in seq_len(rows)) {
  for (y in seq_len(cols)) {
    # Index only the 3x3 block around (x, y), clipped at the matrix edges
    S[x, y] <- sum(X[max(1, x - 1):min(rows, x + 1), max(1, y - 1):min(cols, y + 1)])
  }
}

A vectorized version that adds shifted, zero-padded copies of the matrix may be faster (the 1000s are the dimensions of the 1000x1000 X used in the timing above):

system.time(S2 <- X +
  rbind(cbind(X[-1, -1], 0), 0) +          # diagonal shifts of the matrix
  rbind(cbind(0, X[-1, -1000]), 0) +
  rbind(0, cbind(X[-1000, -1], 0)) +
  rbind(0, cbind(0, X[-1000, -1000])) +
  rbind(X[-1, ], 0) +                      # these create the sums on the same rows or columns
  rbind(0, X[-1000, ]) +
  cbind(X[, -1], 0) +
  cbind(0, X[, -1000]))
   user  system elapsed 
  0.563   0.065   0.630 
> identical(S,S2) # compare to the focal-method above
[1] TRUE
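The shifted-matrix sum above hard-codes the dimension 1000. As a rough sketch, the same idea can be written for any matrix size by embedding X in a zero border and adding up the nine offset views. The helper name window_sum is hypothetical and not part of the original answer; like the code above, it counts the centre cell.

```r
# Hypothetical helper: 3x3 window sum with the centre included,
# treating cells outside the matrix as 0.
window_sum <- function(X) {
  nr <- nrow(X)
  nc <- ncol(X)
  P <- matrix(0, nr + 2, nc + 2)            # zero-padded copy of X
  P[1 + seq_len(nr), 1 + seq_len(nc)] <- X
  S <- matrix(0, nr, nc)
  for (dr in 0:2) {                         # row offsets -1, 0, +1
    for (dc in 0:2) {                       # column offsets -1, 0, +1
      S <- S + P[dr + seq_len(nr), dc + seq_len(nc), drop = FALSE]
    }
  }
  S
}

X <- matrix(1:20, 4)
window_sum(X)
```

On the 4x5 example this should reproduce the matrix printed for the loop version. Subtracting X from the result gives the sum over neighbours only, which is what the question asks for.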
Licensed under: CC-BY-SA with attribution