Error with ddply: arguments imply different number of rows

https://stackoverflow.com/questions/22864500

r
plyr

27-06-2023

Question

Hi, I am working on the following data frame (nrow = 62208):

> head(workfile)
                      V1       V5           V7  V8        V9
4309 2014-03-01 13:30:00 1582.899 D.1Elec-0001 D.1 Elec-0001
6801 2014-03-01 13:45:00 1582.900 D.1Elec-0001 D.1 Elec-0001
6805 2014-03-01 14:00:00 1582.919 D.1Elec-0001 D.1 Elec-0001
5710 2014-03-01 14:15:00 1582.939 D.1Elec-0001 D.1 Elec-0001
5714 2014-03-01 14:30:00 1582.944 D.1Elec-0001 D.1 Elec-0001
6814 2014-03-01 14:45:00 1582.945 D.1Elec-0001 D.1 Elec-0001

I would like to compute the difference between each element of column V5 and the previous element of the same column (the value in the previous row). Column V7 has 72 different levels (in my case, 72 different rooms).

If I use this code:
pippo<-ddply(workfile, .(V7), transform, diff = c(tail(V5,-1)-head(V5,-1)), NA)
I get the following error message:
Error in data.frame(list(V1 = c(1393680600, 1393681500, 1393682400, 1393683300,: arguments imply differing number of rows: 864, 863, 1

If I use this code:
pippo<-ddply(workfile, .(V7), transform, diff = c(tail(workfile$V5,-1)-head(workfile$V5,-1)), NA)
I get this other error message:
Error in data.frame(list(V1 = c(1393680600, 1393681500, 1393682400, 1393683300,: arguments imply differing number of rows: 864, 62207, 1

I cannot dput my data frame because it is very large.

Any suggestions, please?

Solution

If all you want is the simple differences within each group, this should work fine (df stands for your data frame; you can substitute NA for 0 if you prefer):

pippo <- ddply(df, .(V7), transform, diff = c(0,diff(V5)))
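As for why the calls in the question fail: the closing parenthesis of c() comes before the NA, so NA reaches transform as a separate, unnamed argument of length 1, while the difference vector is one element shorter than the group (863 instead of 864). That matches the "864, 863, 1" in the error message. A minimal base R sketch of the length mismatch, using a made-up vector v in place of one room's V5 values:

```r
# Made-up readings standing in for one room's V5 column (hypothetical data).
v <- c(1582.899, 1582.900, 1582.919, 1582.939)

# What the question computes: one element shorter than the group.
short <- tail(v, -1) - head(v, -1)
length(short)   # 3, while the group has 4 rows

# Padding inside c() restores the full length; diff() is the idiomatic form.
padded <- c(NA, diff(v))
length(padded)  # 4
```

The second attempt in the question fails for a related reason: workfile$V5 refers to the whole column rather than the current group, hence the 62207.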

You should also check out dplyr, which should be faster with big data frames:

library(dplyr)
pippo <- df %>% group_by(V7) %>% mutate(diff = c(NA, diff(V5)))
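The same grouped difference can probably also be written with dplyr's lag() helper, which shifts a column down by one row within each group. The sketch below assumes a reasonably recent dplyr and uses a toy data frame (toy, with invented readings and room names) rather than the real workfile:

```r
library(dplyr)

# Toy data: two hypothetical rooms with made-up meter readings.
toy <- data.frame(
  V5 = c(10, 12, 15, 100, 101, 105),
  V7 = c("roomA", "roomA", "roomA", "roomB", "roomB", "roomB")
)

result <- toy %>%
  group_by(V7) %>%
  # Subtract the previous row's value; the first row of each room gets NA.
  mutate(diff = V5 - dplyr::lag(V5)) %>%
  ungroup()

result$diff
```

Both forms rely on the rows already being in time order within each room; if they are not, sort first (for example with arrange(V7, V1)).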

Other answers

This might be an easy solution:

workfile$diff <- c(NA,diff(workfile$V5))
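Note that this one-liner differences the whole column at once, so the first row of each room is compared with the last row of the previous room. If per-room differences are wanted without extra packages, base R's ave() is one option. A minimal sketch on a toy data frame (toy and its values are invented for illustration):

```r
# Toy data: two hypothetical rooms with made-up meter readings.
toy <- data.frame(
  V5 = c(10, 12, 15, 100, 101, 105),
  V7 = c("roomA", "roomA", "roomA", "roomB", "roomB", "roomB")
)

# Ungrouped: row 4 is differenced against the last row of roomA.
toy$diff_all <- c(NA, diff(toy$V5))

# Grouped: ave() applies the function within each level of V7
# and returns the results in the original row order.
toy$diff_room <- ave(toy$V5, toy$V7, FUN = function(x) c(NA, diff(x)))

toy
```

Here diff_all carries a spurious jump at the room boundary, while diff_room restarts with NA for each room.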

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow