Question

Hi, I am working on the following data frame (nrow = 62208):

> head(workfile)
                      V1       V5           V7  V8        V9
4309 2014-03-01 13:30:00 1582.899 D.1Elec-0001 D.1 Elec-0001
6801 2014-03-01 13:45:00 1582.900 D.1Elec-0001 D.1 Elec-0001
6805 2014-03-01 14:00:00 1582.919 D.1Elec-0001 D.1 Elec-0001
5710 2014-03-01 14:15:00 1582.939 D.1Elec-0001 D.1 Elec-0001
5714 2014-03-01 14:30:00 1582.944 D.1Elec-0001 D.1 Elec-0001
6814 2014-03-01 14:45:00 1582.945 D.1Elec-0001 D.1 Elec-0001

I would like to compute the difference between each element of column V5 and the previous element of the same column (the value in the preceding row). Column V7 has 72 different levels (in my case, 72 different rooms).

If I use this code:
pippo<-ddply(workfile, .(V7), transform, diff = c(tail(V5,-1)-head(V5,-1)), NA)
I get the following error message:
Error in data.frame(list(V1 = c(1393680600, 1393681500, 1393682400, 1393683300,: arguments imply differing number of rows: 864, 863, 1

If I use this code:
pippo<-ddply(workfile, .(V7), transform, diff = c(tail(workfile$V5,-1)-head(workfile$V5,-1)), NA)
I get this other error message:
Error in data.frame(list(V1 = c(1393680600, 1393681500, 1393682400, 1393683300,: arguments imply differing number of rows: 864, 62207, 1

I cannot dput my dataframe because it is very big.

Any suggestion, please?

Solution

If all you want is the simple differences, this should work fine (you can use NA instead of 0 for the first value if you prefer):

library(plyr)
pippo <- ddply(workfile, .(V7), transform, diff = c(0, diff(V5)))
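
As for why the call in the question fails: the closing parenthesis of c(...) comes before ", NA", so NA is passed on to transform as a separate, unnamed argument of length 1, while the vector of differences has one element fewer than the group has rows. That matches the "864, 863, 1" in the error message. The sketch below reproduces this with base R only; the toy data frame and its values are made up for illustration and merely stand in for workfile.

```r
# Toy data standing in for `workfile` (values are invented for illustration).
toy <- data.frame(
  V5 = c(10, 12, 15, 100, 101, 105),
  V7 = c("A", "A", "A", "B", "B", "B")
)

# ddply hands transform one group at a time; take a single room to mimic that.
one_room <- toy[toy$V7 == "A", ]

# The question's form: NA sits outside c(), so it becomes a separate argument.
# The error is caught and its message stored instead of stopping the script.
err <- tryCatch(
  transform(one_room, diff = c(tail(V5, -1) - head(V5, -1)), NA),
  error = function(e) conditionMessage(e)
)
print(err)

# With NA inside c(), the new column has exactly one value per row.
fixed <- transform(one_room, diff = c(NA, tail(V5, -1) - head(V5, -1)))
print(fixed)
```

The second error in the question ("864, 62207, 1") has the same cause, except that workfile$V5 refers to the whole column rather than the current group, so the difference vector has 62207 elements.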

You should also check out dplyr; it should be faster with big data frames:

library(dplyr)
# %>% replaces the older %.% operator, which has been removed from dplyr
pippo <- workfile %>% group_by(V7) %>% mutate(diff = c(NA, diff(V5)))

Other tips

This might be an easy solution:

workfile$diff <- c(NA,diff(workfile$V5))
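
Note that the one-liner above takes differences over the whole column, so the first row of each room is compared with the last row of the previous room. If per-room differences are needed without extra packages, base R's ave() is one option. A minimal sketch, again on made-up toy data standing in for workfile:

```r
# Toy data standing in for `workfile` (values are invented for illustration).
toy <- data.frame(
  V5 = c(10, 12, 15, 100, 101, 105),
  V7 = c("A", "A", "A", "B", "B", "B")
)

# Whole-column differences: row 4 compares room B's first reading
# with room A's last reading.
toy$diff_all <- c(NA, diff(toy$V5))

# Per-room differences: ave() applies FUN within each level of V7
# and returns the results in the original row order.
toy$diff_room <- ave(toy$V5, toy$V7, FUN = function(x) c(NA, diff(x)))

print(toy)
```

Both versions assume the rows are already sorted by time within each room.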
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow