Euclidean distance in R using two variables in a matrix

https://stackoverflow.com/questions/16147692

11-04-2022

Question

I am quite new to R and I am trying to compute two quantities from two variables in my matrix: the gross distance (the sum of the Euclidean distances over all data points) and the net distance (the Euclidean distance between the first and last points of my data). Some background on my data: it is normally a CSV file comprising 5 variables: the track of each cell (called A), the time interval, the X and Y position of each cell, and V, the velocity. There are around 90 tracks per data set, and each track should be treated independently of the others.

dput(head(t1))
structure(list(A = c(0L, 0L, 0L, 0L, 0L, 0L), T = 0:5, X = c(668L, 
668L, 668L, 668L, 668L, 668L), Y = c(259L, 259L, 259L, 259L, 
259L, 259L), V = c(NA, 0, 0, 0, 0, 0)), .Names = c("A", "T", 
"X", "Y", "V"), row.names = c(NA, 6L), class = "data.frame")

I was not aware of the dist() function before, so I made my own function:

GD.data <- function (trackdata)
{A= trackdata(, 1); V=trackdata(, 5);
 for (i in min(A):max(A))
   while (A<=i) {GD(i) = (sum (V)*(1/25))
                 return (GD(i))}

This did not work. I used A as the identifier of the track, and since gross distance can also be computed as distance = velocity * (t1 - t0), I just summed all the velocities and multiplied by my time interval (which is constant at 1/25 s).

How do I use the dist() function with my A as identifier? I need this since the computation of each track should be separate. Thanks!

Solution

Since you have velocity measured at constant time intervals, the sum of V for each track, multiplied by that time interval, gives the total Euclidean distance moved. The base R function aggregate can compute the per-track sum of V for each track identifier A, which is what the command below does:

aggregate( V ~ A , data = t1 , sum , na.rm = TRUE )

Basically this says: aggregate V for each value of A. The aggregation function is sum (you could just as easily get the mean velocity for each track by using mean instead of sum). We pass an additional argument, na.rm, to sum, telling it to ignore NAs in the data (which I assume occur at t = 0 for each track).
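As a minimal sketch of how the summed velocities turn into a distance, the snippet below uses a small made-up data frame (`toy`, `dt` and the `GD` column are hypothetical names, not part of the asker's data). It assumes V is expressed in distance units per second and that the sampling interval is the 1/25 s mentioned in the question.

```r
# Hypothetical toy data: two tracks, three time points each.
# V is NA at the first time point of each track, as in the question's sample.
toy <- data.frame(
  A = c(0L, 0L, 0L, 1L, 1L, 1L),
  T = c(0:2, 0:2),
  V = c(NA, 25, 50, NA, 100, 0)
)

dt <- 1 / 25  # assumed sampling interval in seconds (taken from the question)

# Sum V within each track, ignoring NAs
gross <- aggregate(V ~ A, data = toy, FUN = sum, na.rm = TRUE)

# Scale the summed velocity by the time step to get a distance per track
gross$GD <- gross$V * dt
gross
```

If the interval were not constant, a single multiplication by `dt` would no longer be valid and each velocity would have to be weighted by its own time step.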

Calculating 'as the crow flies' distance between first and last position by track:

For this we can split the dataframe into sub-dataframes by the track identifier A and then operate on each subset of the data, using lapply to apply a simple hypotenuse calculation to the first and last row of each sub-dataframe.

## Split the data
dfs <- split(t1,t1$A)

## Find hypotenuse between first and last rows for each A
lapply( dfs , function(x){
  j <- nrow(x)
  first <- x[1,c("X","Y")]   # named 'first' to avoid masking base::str
  last <- x[j,c("X","Y")]
  d <- sqrt( sum( (last - first)^2 ) )   # named 'd' to avoid masking stats::dist
  return( d )
} )
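The question also asks about dist() and about computing the gross distance from the positions themselves. The following is only a sketch under stated assumptions: the `toy` data frame and the helper names `path_length` and `net_distance` are made up for illustration, and the rows of each track are assumed to be ordered by time already. It sums the step lengths between consecutive points for the gross distance and applies dist() to the first and last rows for the net distance.

```r
# Hypothetical toy data: two tracks with X/Y positions, ordered by time
toy <- data.frame(
  A = c(0L, 0L, 0L, 1L, 1L, 1L),
  T = c(0:2, 0:2),
  X = c(0, 3, 3, 10, 10, 13),
  Y = c(0, 4, 8, 10, 10, 14)
)

# Gross distance: sum of the step lengths between consecutive points
path_length <- function(x) {
  sum(sqrt(diff(x$X)^2 + diff(x$Y)^2))
}

# Net distance: dist() on the first and last rows returns a single
# pairwise Euclidean distance
net_distance <- function(x) {
  ends <- x[c(1, nrow(x)), c("X", "Y")]
  as.numeric(dist(ends))
}

# Split by track identifier so each track is handled independently
tracks <- split(toy, toy$A)
gross  <- sapply(tracks, path_length)
net    <- sapply(tracks, net_distance)

data.frame(A = names(tracks), gross = gross, net = net)
```

Using sapply rather than lapply returns a named numeric vector, one value per track, which is easier to bind into a summary table.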

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow