Question

I have a table stored in a dataframe in R.

I want to calculate the first derivative along each column. Columns are measured variables, rows are time.

Can I vectorize this operation? Element-wise operations such as the following are straightforward:

df$C <- df$A + df$B

In principle, I'd like something like:

df$DiffA <- diff(df$A)

The problem is that I don't know how to vectorize functions that need both A(n) and A(n+1), where n is the row index within the data frame (pseudocode).


Solution

Based on the comments:

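# example data: n and its square root
df <- data.frame(n = 1:100)
df$sqrt <- sqrt(df$n)
# pad with a leading NA so the result has as many rows as df
df$diff <- c(NA, diff(df$sqrt, lag = 1))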

diff returns one value fewer than the input vector contains, because each result is the difference between two consecutive elements. You can fix that by prepending or appending an NA value.
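Since the question asks for the derivative of every column, the same idea can be applied to several columns at once. The following is only a minimal sketch: the data, the time column t, and the column names are made up for illustration, and dividing by diff(t) is an assumption about what "first derivative" should mean when the time steps are not uniform.

```r
# Hypothetical data: t is time, A and B are measured variables
df <- data.frame(t = c(0, 1, 2, 4),
                 A = c(1, 4, 9, 25),
                 B = c(2, 4, 8, 16))

vars <- c("A", "B")          # columns to differentiate (assumed names)
new_names <- paste0("Diff", vars)

# For each column: difference of consecutive rows divided by the time step,
# padded with a leading NA so the length matches the number of rows
df[new_names] <- lapply(df[vars], function(x) c(NA, diff(x) / diff(df$t)))

print(df)
```

If the rows are equally spaced in time, the division by diff(df$t) can be dropped or replaced by a constant step.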

Some timings:

#create a big data.frame
vec <- 1:1e6
df <- data.frame(a=vec,b=vec,c=vec,d=vec,e=vec,sqroot=sqrt(vec))

#for big datasets data.table is usually more efficient:
library(data.table)
dt <- data.table(df)

#benchmarks
library(microbenchmark)

microbenchmark(df$diff <- c(NA,diff(df$sqroot,lag=1)),
               dt[,diff:=c(NA,diff(sqroot,lag=1))])
Unit: milliseconds
                                            expr      min        lq    median        uq      max
1     df$diff <- c(NA, diff(df$sqroot, lag = 1)) 75.42700 116.62366 140.98300 151.11432 174.5697
2 dt[, `:=`(diff, c(NA, diff(sqroot, lag = 1)))] 37.39592  45.91857  52.21005  62.89996 119.7345

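diff itself is fast, but adding the column to a large data.frame is comparatively slow: in the benchmark above the median is about 141 ms for the data.frame versus about 52 ms for the data.table, whose := operator adds the column by reference. Use data.table for big datasets. The speed gain becomes more pronounced as the dataset grows.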
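The data.table approach can also cover several columns in one call. This is a sketch under assumptions: the table and the column names A and B are invented for illustration, and it relies on .SD, .SDcols, and the (names) := form of assignment from the data.table package.

```r
library(data.table)

# Hypothetical data: two measured variables, rows ordered by time
dt <- data.table(A = c(1, 4, 9, 16),
                 B = c(2, 4, 8, 16))

cols <- c("A", "B")                  # columns to difference (assumed names)
new_cols <- paste0("Diff", cols)

# Add all difference columns by reference; each is padded with a leading NA
dt[, (new_cols) := lapply(.SD, function(x) c(NA, diff(x))), .SDcols = cols]

print(dt)
```

Wrapping new_cols in parentheses tells data.table to use the values of the variable as column names rather than creating a column literally called new_cols.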

Other tips

You might try the lag() or diff() functions. They would seem to do what you want.
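One caveat, as far as I know: base R's lag() is intended for time series objects and shifts the time base rather than the values of a plain vector, so it may not behave like a row shift on a data frame column. A manual shift is easy to write instead. The sketch below uses a made-up vector and shows that subtracting the shifted values gives the same result as the padded diff().

```r
# Hypothetical measurements ordered by time
x <- c(1, 4, 9, 16)

# Manual lag: drop the last value and pad the front with NA
prev <- c(NA, head(x, -1))

# A(n) - A(n-1), computed in one vectorized step
manual_diff <- x - prev

# Same values as the padded diff() approach
padded_diff <- c(NA, diff(x))
print(manual_diff)
print(padded_diff)
```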

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow