Question

I have a table stored in a dataframe in R.

I want to calculate the first derivative along each column. Columns are measured variables, rows are time.

Can I vectorize this function?

df$C <- df$A + df$B

In principle I'd like something like:

df$DiffA <- diff(df$A)

The problem is that I don't know how to vectorize functions that need both A(n) and A(n+1), where n is the row index within the dataframe (pseudocode).


Solution

Based on the comments:

df <- data.frame(n=1:100) 
df$sqrt <- sqrt(df$n)
df$diff <- c(NA,diff(df$sqrt,lag=1))

diff returns one value fewer than the input vector contains, because each result is the difference between two neighbouring elements. You can restore the original length by prepending or appending an NA value.
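
The question asks for the derivative of every column, not just one. Here is a minimal sketch of how the same NA-padding idea might extend to all columns, assuming every column is numeric and the rows are equally spaced in time. The names `df_example`, `pad_diff` and the `Diff` prefix are hypothetical and chosen only for illustration:

```r
# Hypothetical example data: rows are time steps, columns are measured variables.
df_example <- data.frame(A = c(1, 4, 9, 16), B = c(2, 4, 8, 16))

# Illustrative helper: first difference padded with a leading NA,
# so the result has the same length as the input vector.
pad_diff <- function(x) c(NA, diff(x))

# Apply the helper to every column; lapply returns one list element per column.
diffs <- lapply(df_example, pad_diff)
names(diffs) <- paste0("Diff", names(df_example))

# Attach the difference columns to the original data frame.
df_example <- cbind(df_example, as.data.frame(diffs))
```

If the time step is not 1, the differences would still need to be divided by the corresponding time differences to give an actual derivative estimate.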

Some timings:

#create a big data.frame
vec <- 1:1e6
df <- data.frame(a=vec,b=vec,c=vec,d=vec,e=vec,sqroot=sqrt(vec))

#for big datasets data.table is usually more efficient:
library(data.table)
dt <- data.table(df)

#benchmarks
library(microbenchmark)

microbenchmark(df$diff <- c(NA,diff(df$sqroot,lag=1)),
               dt[,diff:=c(NA,diff(sqroot,lag=1))])
Unit: milliseconds
                                            expr      min        lq    median        uq      max
1     df$diff <- c(NA, diff(df$sqroot, lag = 1)) 75.42700 116.62366 140.98300 151.11432 174.5697
2 dt[, `:=`(diff, c(NA, diff(sqroot, lag = 1)))] 37.39592  45.91857  52.21005  62.89996 119.7345

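diff is fast, but for big datasets a data.frame is not the most efficient container. In the benchmark above, the data.table version had a median of about 52 ms versus about 141 ms for the data.frame version. Use data.table instead; the speed gain becomes more pronounced as the dataset grows.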

Other tips

You might try the lag() or diff() functions. They would seem to do what you want.
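
To make the A(n+1) - A(n) idea from the question concrete, here is a small sketch comparing a manual shifted subtraction with `diff()`. The vector `x` is made-up example data. Note that base R's `lag()` shifts the time base of a time series rather than the values of a plain vector, so `diff()` is likely the more direct fit here:

```r
# Made-up example vector standing in for one column of the data frame.
x <- c(1, 4, 9, 16, 25)

# Manual vectorized difference: drop the first element, drop the last element,
# then subtract element-wise, giving A(n+1) - A(n) for every n.
manual <- tail(x, -1) - head(x, -1)

# Built-in equivalent.
builtin <- diff(x)

# Both approaches should agree.
same <- isTRUE(all.equal(manual, builtin))
```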

License: CC-BY-SA with attribution
Not affiliated with StackOverflow