Question

I'm looking for some help understanding how to implement a 2-dimensional kernel density method, with a isotropic variance, and a bivariate normal kernel, kind of, but instead of using the typical distance, because the data is on the surface of the earth, I need to use a great-circle distance.

I'd like to replicate this in R, but I can't figure out how to use a distance metric other than the simple euclidean distance for any of the built in estimators, and since it uses a complex method with convolutions to add the kernels. Does anyone have a way to program an arbitrary kernel?

Was it helpful?

Solution

I ended up modifying the kde2d function from the MASS library. Some significant revision was needed, as is shown below. That said, the code is very flexible, allowing an arbitrary 2-d kernel to be used. (rdist.earth() was used for the great circle distance, h is the chosen bandwidth, in this case, in km, and n is the number of grid points in each direction to be used. rdist.earth requires the "fields" library)

The function could be modified to perform calculations in more than 2d, but the grid gets large very fast in higher dimensions. (Not that it's small now.)

Comments and suggestions on elegance or performance are welcome!

kde2d_mod <- function (data, h, n = 200, lims = c(range(data$lat), range(data$lon))) {
#Data is a matrix: lon,lat for each source. (lon,lat to match rdist.earth format.)
print(Sys.time()) #for timing

nx <- dim(data)[1]
if (dim(data)[2] != 2) 
stop("data vectors have only lat-long data")
if (any(!is.finite(data))) 
stop("missing or infinite values in the data are not allowed")
if (any(!is.finite(lims))) 
stop("only finite values are allowed in 'lims'")
#Grid:
g<-grid(n,lims) #Function to create grid.

#The distance matrix gets large... Can we work around it? YES WE CAN!
sets<-ceiling(dim(g)[1]/10000)
#Allocate our output:
z<-rep(as.double(0),dim(g)[1])

for (i in (1:sets)-1) {
   g_subset=g[(i*10000+1):(min((i+1)*10000,dim(g)[1])),]
   a_matrix<-rdist.earth(g_subset,data,miles=FALSE)

   z[(i*10000+1):(min((i+1)*10000,dim(g)[1]))]<- apply( #Here is my kernel...
    a_matrix,1,FUN=function(X)
    {sum(exp(-X^2/(2*(h^2))))/(2*pi*nx)}
   )
rm(a_matrix)
}

print(Sys.time())
#Un-transpose the final data.
z<-t(matrix(z,n,n))
dim(z)<-c(n^2,1)
z<-as.vector(z)
return(z)
}

The key point here is that any kernel can be used in that inner loop; the downside is that this is evaluated at grid points, so a high-res grid is needed to run this; FFT would be great, but I didn't attempt it.

Grid Function:

grid<- function(n,lims) {
num <- rep(n, length.out = 2L)
gx <- seq.int(lims[1L], lims[2L], length.out = num[1L])
gy <- seq.int(lims[3L], lims[4L], length.out = num[2L])

v1=rep(gy,length(gx))
v2=rep(gx,length(gy))
v1<-matrix(v1, nrow=length(gy), ncol=length(gx))
v2<-t(matrix(v2, nrow=length(gx), ncol=length(gy)))
grid_out<-c(unlist(v1),unlist(v2))

grid_out<-aperm(array(grid_out,dim=c(n,n,2)),c(3,2,1) ) #reshape
grid_out<-unlist(as.list(grid_out))
dim(grid_out)<-c(2,n^2)
grid_out<-t(grid_out)
return(grid_out)
}

You can plot the values using image.plot, with the v1 and v2 matrices for your x,y points:

kde2d_mod_plot<-function(kde2d_mod_output,n,lims) ){
 num <- rep(n, length.out = 2L)
 gx <- seq.int(lims[1L], lims[2L], length.out = num[1L])
 gy <- seq.int(lims[3L], lims[4L], length.out = num[2L])

 v1=rep(gy,length(gx))
 v2=rep(gx,length(gy))
 v1<-matrix(v1, nrow=length(gy), ncol=length(gx))
 v2<-t(matrix(v2, nrow=length(gx), ncol=length(gy)))

 image.plot(v1,v2,matrix(kde2d_mod_output,n,n))
 map('world', fill = FALSE,add=TRUE)
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top