Vectorization of differentiation in R
04-07-2021
Question
I have a table stored in a dataframe in R.
I want to calculate the first derivative along each column. Columns are measured variables, rows are time.
Can I vectorize this function?
df$C <- df$A + df$B
In principle I'd like something like:
df$DiffA <- diff(df$A)
The problem is that I don't know how to vectorize functions that need A(n) and A(n+1), where n is the row within the data frame (pseudocode).
Solution
Based on the comments:
df <- data.frame(n=1:100)
df$sqrt <- sqrt(df$n)
df$diff <- c(NA,diff(df$sqrt,lag=1))
diff returns one value fewer than there are values in the input vector, because each result is the difference between two consecutive elements. You can fix that by prepending or appending an NA value so the result has the same length as the column.
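Since the question asks for the derivative along each column, the same padding trick can be applied to several columns at once with lapply. The following is only a minimal sketch: the data frame `measurements`, its time column `t`, and the measured columns `A` and `B` are hypothetical names chosen for illustration, not taken from the question. Dividing by diff(t) turns the plain differences into a finite-difference estimate of the derivative, which matters when the time steps are not all equal.

```r
# Hypothetical example data: t is time, A and B are measured variables
measurements <- data.frame(t = c(0, 1, 2, 4),
                           A = c(1, 3, 6, 10),
                           B = c(2, 2, 5, 1))

value_cols <- setdiff(names(measurements), "t")
dt_step <- diff(measurements$t)  # time step between consecutive rows

# Difference quotient per column, padded with a leading NA so lengths match
derivs <- lapply(measurements[value_cols],
                 function(x) c(NA, diff(x) / dt_step))

# Store the results as new columns dA and dB
measurements[paste0("d", value_cols)] <- derivs

print(measurements)
```

Whether the NA goes first or last is a convention: a leading NA assigns each slope to the later of the two rows, a trailing NA to the earlier one.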
Some timings:
#create a big data.frame
vec <- 1:1e6
df <- data.frame(a=vec,b=vec,c=vec,d=vec,e=vec,sqroot=sqrt(vec))
#for big datasets data.table is usually more efficient:
library(data.table)
dt <- data.table(df)
#benchmarks
library(microbenchmark)
microbenchmark(df$diff <- c(NA,diff(df$sqroot,lag=1)),
dt[,diff:=c(NA,diff(sqroot,lag=1))])
Unit: milliseconds
                                            expr      min        lq    median        uq      max
1     df$diff <- c(NA, diff(df$sqroot, lag = 1)) 75.42700 116.62366 140.98300 151.11432 174.5697
2 dt[, `:=`(diff, c(NA, diff(sqroot, lag = 1)))] 37.39592  45.91857  52.21005  62.89996 119.7345
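diff is fast, but for big datasets updating a data.frame is not efficient. Use data.table instead: in the benchmark above its median time is about 52 ms versus about 141 ms for the data.frame version. The speed gain gets more pronounced the bigger the dataset is.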
Other tips
You might try the lag() or diff() functions. They would seem to do what you want.
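For calculations that need A(n) and A(n+1) together but are not a plain difference, one option is to build the two shifted vectors by indexing and combine them elementwise. This is a minimal sketch in base R; the vector `a` and the midpoint calculation are illustrative assumptions. Plain indexing is used rather than lag() because stats::lag() is intended for time series objects and shifts the time base, not the values.

```r
# Illustrative input vector (not from the question)
a <- c(1, 4, 9, 16, 25)
n <- length(a)

current <- a[-n]  # A(n)   for n = 1 .. N-1
nxt     <- a[-1]  # A(n+1) for n = 1 .. N-1

# Any elementwise expression of the two works, e.g. the midpoint of each pair
midpoint <- (current + nxt) / 2

# The same construction reproduces diff()
stopifnot(isTRUE(all.equal(nxt - current, diff(a))))

print(midpoint)
```

As with diff(), the result has one element fewer than the input, so it needs an NA prepended or appended before it can be stored as a column.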