Question

The user defined function (dist.func) runs and provides the correct output when I use it on a single line of data but does not provide the correct output (still executes) when I embed it into the apply() command. In this case, I want to calculate by rows.

Apologies for the complex sample data included, but values have to fall within a threshold to return meaningful outputs, and this is the easiest way to ensure that happens.

library(fields)

The function is essentially measuring between XY coordinates (Euclidian distances with rdist() command), but it first takes a subset of the data keeping only those rows of the 'TO' data that fall within a given similarity (Euclidian distance between the first and second principal components, PC1 and PC2).

This makes the sample data:

# This data is the reference points to measure FROM
FROM <- data.frame(X=c(-4187500,-4183500,-4155500,-4179500,-2883500),
               Y=c(10092500,10084500,10020500,10012500,9232500),
               PC1=c(-0.525,-0.506,-1.146,-0.733,-1.160),
               PC2=c(3.606,3.609,4.114,3.681,0.882))

# This data is the destination points to measure TO
TO <- data.frame(X=c(-4207500,-4183500,-4203500,-4187500,-2827500,-4203500,-4199500,-4183500,-4195500,-4191500),
             Y=c(10100500,10100500,10096500,10092500,10092500,10088500,10084500,10084500,10072500,10064500),
             PC1=c(-0.371,0.447,-0.344,-0.026,-0.652,-0.460,-0.313,0.010,-0.293,-0.319 ),
             PC2=c(3.149,4.619,3.318,3.885,0.407,3.164,3.300,3.892,3.226,3.337))

# This is the threshold of the data similarity match (distance between PC1 and PC2 in both data sets)
threshold <- 0.5

Here is my user-defined function (with each line explained):

dist.func <- function(REF){
  # Calculate the similarity (PC1 and PC2 distance) to all points in the destination
  # Select only those under the threshold
  bt <- as.matrix(TO[(rdist(REF[3:4],TO[3:4])[1,]<threshold)==T,c("X","Y")])
  # Calculate the number of points under the threshold (the "sample size")
  # If there are no points uder the threshold, the SS is set to zero (otherwise 'NA' kills the loop)
  ss <- ifelse(nrow(bt)>=50, 50 ,nrow(bt))
  # If/else to deal with SS=0
  if (nrow(bt)>0) {
    # Calculate the euclidian distance between the reference point and all points under the threshold
    # This calculates the distances, sorts them in ascending order, and trims to the sample size
    dst <- rdist(REF[1:2],bt)[1,][order(rdist(REF[1:2],bt)[1,])][1:ss]
  } else {
  dst <- c(NA)
  }
# Report (in a list or table or whatever) the summary stats for the distances 
list(
  p05=ifelse(nrow(bt)==0, NA, quantile(dst,0.05)),
  MIN=ifelse(nrow(bt)==0, NA, min(dst)),
  AVG=ifelse(nrow(bt)==0, NA, mean(dst)),
  N=ifelse(nrow(bt)==0, 0, nrow(bt)))
}

And here's the test with a single line of the FROM data (working) and embedded into the apply() command (not returning correct values):

# Using the function on a single line of data returns correct values for the given line
dist.func(FROM[1,])

# Embedding the function into apply() returns incorrect outputs
# I'm committed to using apply() here (or some variant) to avoid a for() loop by rows
apply(FROM, 1, dist.func)

I'm fairly new with user-defined functions, so any suggestions there would be appreciated, if that's where the problem lies. Also, if there's a way to make the function or the code in general more efficient (a package I'm not familiar with), that would be most welcome also.

Was it helpful?

Solution 2

lapply gives the correct output

  my.list<-as.list(1:nrow(FROM))

k- lapply(my.list,function(i)dist.func(FROM[i,])
kk<-do.call(rbind,k) # convert to data.frame

sapply(my.list,function(i)dist.func(FROM[i,]))
    [,1]     [,2]     [,3] [,4] [,5]
p05 14939.76 16242.64 NA   NA   NA  
MIN 14422.21 16000    NA   NA   NA  
AVG 19795.44 21179.25 NA   NA   NA  
N   6        6        0    0    0  

OTHER TIPS

The problem is that apply converts FROM to a matrix. Compare:

> dist.func(FROM[1,])
$p05
[1] 14939.76
$MIN
[1] 14422.21
$AVG
[1] 19795.44
$N
[1] 6

> dist.func(as.matrix(FROM)[1,])
$p05
[1] 1400
$MIN
[1] 1e-10
$AVG
[1] 179500
$N
[1] 8

> apply(FROM, 1, dist.func)[[1]]
$p05
[1] 1400
$MIN
[1] 1e-10
$AVG
[1] 179500
$N
[1] 8
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top