Question

I want to compute the autocorrelation of my data separately for each level of ID, producing a plot for each level. However, I can't seem to find a way to split the data frame by ID so that the autocorrelation is computed per level:

##My data is:
*data.1*
 ID   y.var.1
1   1  2.284620
2   1  2.820829
3   1  3.889701
4   1  5.180010
5   1  6.080572
6   2  6.972568
7   2  8.082126
8   2  9.075686
9   2  9.864694
10  2 10.942456
11  3 11.853353
12  3 13.112986
13  3 13.893405
14  3 15.037400
15  3 16.015836

## I use dlply (from the plyr package) to split the dataframe by the level ID
data_ID<-dlply(data.1, .(ID), function(X) acf(y.var.1, na.action = na.pass))
head(data_ID)

## Although this produces three groups, they all contain the same values, which are identical to what I get when I run the autocorrelation on the entire data frame.
> head(data_ID)
$`1`
Autocorrelations of series ‘y.var.1’, by lag
 0      1      2      3      4      5      6      7      8      9     10     11 
 1.000  0.804  0.600  0.409  0.230  0.071 -0.075 -0.194 -0.293 -0.370 -0.409 -0.418 
$`2`
Autocorrelations of series ‘y.var.1’, by lag
 0      1      2      3      4      5      6      7      8      9     10     11 
 1.000  0.804  0.600  0.409  0.230  0.071 -0.075 -0.194 -0.293 -0.370 -0.409 -0.418 
$`3`
Autocorrelations of series ‘y.var.1’, by lag
 0      1      2      3      4      5      6      7      8      9     10     11 
 1.000  0.804  0.600  0.409  0.230  0.071 -0.075 -0.194 -0.293 -0.370 -0.409 -0.418 


> dput(data.1)
structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 
3, 3), y.var.1 = c(2.28462022795481, 2.82082936729163, 3.88970139114628, 
5.18001014821836, 6.08057215599522, 6.97256785426474, 8.08212595903149, 
9.07568620628701, 9.8646935842879, 10.9424555128125, 11.8533529745958, 
13.1129856348251, 13.8934049954063, 15.0374003752388, 16.0158355330431
)), .Names = c("ID", "y.var.1"), row.names = c(NA, -15L), class = "data.frame")

If anybody has any ideas on how to tackle this problem, that would be great!


Solution

You get this strange behavior because a variable named y.var.1 is defined in your session (maybe you used attach() on the data frame, or defined it as a separate vector). Inside the anonymous function, the bare name y.var.1 does not refer to the column of the subset X, so acf() picks up the session variable, which holds the full series, for every group. Add X$ inside acf() so that it uses the y.var.1 column of the piece of data.1 that dlply passes in:

 dlply(data.1, .(ID), function(X) acf(X$y.var.1, na.action = na.pass))
$`1`
Autocorrelations of series ‘X$y.var.1’, by lag   
     0      1      2      3      4 
 1.000  0.446 -0.142 -0.447 -0.357     
$`2`    
Autocorrelations of series ‘X$y.var.1’, by lag    
     0      1      2      3      4 
 1.000  0.373 -0.084 -0.373 -0.416     
$`3`    
Autocorrelations of series ‘X$y.var.1’, by lag   
     0      1      2      3      4 
 1.000  0.377 -0.086 -0.381 -0.411 
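
Since the question also asks for one plot per level, here is a minimal base R sketch of the same idea without plyr. It is not part of the original answer: it assumes the data frame from the question (values rounded from the dput output), and the names groups and acf_by_id are made up for illustration. Passing plot = FALSE keeps the computation separate from the plotting, so each plot can be given its own title.

```r
# Rebuild the question's data frame (values rounded from the dput output).
data.1 <- data.frame(
  ID = rep(1:3, each = 5),
  y.var.1 = c(2.284620, 2.820829, 3.889701, 5.180010, 6.080572,
              6.972568, 8.082126, 9.075686, 9.864694, 10.942456,
              11.853353, 13.112986, 13.893405, 15.037400, 16.015836)
)

# Split the column by ID so each list element holds one level's series.
groups <- split(data.1$y.var.1, data.1$ID)

# Compute one ACF per level; each function call only ever sees its own subset.
acf_by_id <- lapply(groups, acf, na.action = na.pass, plot = FALSE)

# Draw one correlogram per level, labelled with the ID.
for (id in names(acf_by_id)) {
  plot(acf_by_id[[id]], main = paste("ID", id))
}
```

Because the series is handed to acf() as an argument, there is no bare variable name for R to resolve in the session, which avoids the lookup problem described above.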

OTHER TIPS

You can also use the by or tapply function (here dat is the data frame, called data.1 in the question):

R > a <- by(dat$y.var.1, dat$ID, function(x) acf(x)$acf)
R > a
dat$ID: 1
, , 1

        [,1]
[1,]  1.0000
[2,]  0.4457
[3,] -0.1424
[4,] -0.4467
[5,] -0.3566

------------------------------------------------------------ 
dat$ID: 2
, , 1

         [,1]
[1,]  1.00000
[2,]  0.37311
[3,] -0.08434
[4,] -0.37320
[5,] -0.41557

------------------------------------------------------------ 
dat$ID: 3
, , 1

         [,1]
[1,]  1.00000
[2,]  0.37742
[3,] -0.08618
[4,] -0.38068
[5,] -0.41057
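
The tip mentions tapply but only shows by, so here is a rough sketch of what the tapply variant might look like. It assumes dat holds the same data as data.1 in the question (values rounded from the dput output); acf_values is a name made up for this example. plot = FALSE suppresses the plots, and drop() turns the 5 x 1 x 1 array that acf() returns into a plain vector.

```r
# Assumed: dat is the same data frame as data.1 in the question.
dat <- data.frame(
  ID = rep(1:3, each = 5),
  y.var.1 = c(2.284620, 2.820829, 3.889701, 5.180010, 6.080572,
              6.972568, 8.082126, 9.075686, 9.864694, 10.942456,
              11.853353, 13.112986, 13.893405, 15.037400, 16.015836)
)

# Apply acf() to each ID's values and keep only the autocorrelations.
acf_values <- tapply(dat$y.var.1, dat$ID, function(x) {
  drop(acf(x, plot = FALSE)$acf)  # plain numeric vector, lags 0 to 4
})

# Autocorrelations for the first level of ID, rounded for display.
round(acf_values[[1]], 3)
```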
Licensed under: CC-BY-SA with attribution