Autocorrelation on each level of a factor

https://stackoverflow.com/questions/15141845

16-03-2022

Question

I want to compute the autocorrelation of my data separately for each level of a factor and produce a plot for each level. However, I can't find a way to split the data frame by the levels of ID:

##My data is:
*data.1*
 ID   y.var.1
1   1  2.284620
2   1  2.820829
3   1  3.889701
4   1  5.180010
5   1  6.080572
6   2  6.972568
7   2  8.082126
8   2  9.075686
9   2  9.864694
10  2 10.942456
11  3 11.853353
12  3 13.112986
13  3 13.893405
14  3 15.037400
15  3 16.015836

## I use dlply (from the plyr package) to split the dataframe by the level ID
data_ID<-dlply(data.1, .(ID), function(X) acf(y.var.1, na.action = na.pass))
head(data_ID)

## Although this produces three groups, they all contain the same values, which are identical to what I get when I run the autocorrelation on the entire data frame.
> head(data_ID)
$`1`
Autocorrelations of series ‘y.var.1’, by lag
 0      1      2      3      4      5      6      7      8      9     10     11 
 1.000  0.804  0.600  0.409  0.230  0.071 -0.075 -0.194 -0.293 -0.370 -0.409 -0.418 
$`2`
Autocorrelations of series ‘y.var.1’, by lag
 0      1      2      3      4      5      6      7      8      9     10     11 
 1.000  0.804  0.600  0.409  0.230  0.071 -0.075 -0.194 -0.293 -0.370 -0.409 -0.418 
$`3`
Autocorrelations of series ‘y.var.1’, by lag
 0      1      2      3      4      5      6      7      8      9     10     11 
 1.000  0.804  0.600  0.409  0.230  0.071 -0.075 -0.194 -0.293 -0.370 -0.409 -0.418 


> dput(data.1)
structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 
3, 3), y.var.1 = c(2.28462022795481, 2.82082936729163, 3.88970139114628, 
5.18001014821836, 6.08057215599522, 6.97256785426474, 8.08212595903149, 
9.07568620628701, 9.8646935842879, 10.9424555128125, 11.8533529745958, 
13.1129856348251, 13.8934049954063, 15.0374003752388, 16.0158355330431
)), .Names = c("ID", "y.var.1"), row.names = c(NA, -15L), class = "data.frame")

If anybody has any ideas on how to tackle this problem, that would be great!

Solution

You get this behavior because a variable named y.var.1 also exists in your session (perhaps you called attach(), or defined it as a separate vector). Inside the anonymous function, the bare name y.var.1 is not looked up in X, so acf() uses the session-level variable, which holds the full series, for every group. Prefix the name with X$ inside acf() so that it uses the y.var.1 column of the subset of data.1 that dlply passes to the function.

 dlply(data.1, .(ID), function(X) acf(X$y.var.1, na.action = na.pass))
$`1`
Autocorrelations of series ‘X$y.var.1’, by lag   
     0      1      2      3      4 
 1.000  0.446 -0.142 -0.447 -0.357     
$`2`    
Autocorrelations of series ‘X$y.var.1’, by lag    
     0      1      2      3      4 
 1.000  0.373 -0.084 -0.373 -0.416     
$`3`    
Autocorrelations of series ‘X$y.var.1’, by lag   
     0      1      2      3      4 
 1.000  0.377 -0.086 -0.381 -0.411
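
The same pitfall and fix can be reproduced in base R without plyr. The following is only a sketch: it rebuilds data.1 from the rounded values printed in the question, and it assumes (as the answer suspects) that a stray copy of y.var.1 exists in the global environment. The names groups, wrong and right are made up for this illustration.

```r
# Rebuild the question's data (values rounded as printed there).
data.1 <- data.frame(
  ID = rep(1:3, each = 5),
  y.var.1 = c(2.284620, 2.820829, 3.889701, 5.180010, 6.080572,
              6.972568, 8.082126, 9.075686, 9.864694, 10.942456,
              11.853353, 13.112986, 13.893405, 15.037400, 16.015836)
)

# Assumption: a stray global copy of the column exists in the session.
y.var.1 <- data.1$y.var.1

# One data frame per level of ID.
groups <- split(data.1, data.1$ID)

# Bare name: resolved in the global environment, so every group
# gets the ACF of the full 15-value series.
wrong <- lapply(groups, function(X) acf(y.var.1, na.action = na.pass, plot = FALSE))

# X$ prefix: each group uses only its own rows.
right <- lapply(groups, function(X) acf(X$y.var.1, na.action = na.pass, plot = FALSE))

# Number of lags returned per group: identical and too long for `wrong`,
# short (lags 0 to 4) for `right`.
sapply(wrong, function(a) length(a$acf))
sapply(right, function(a) length(a$acf))

# One ACF plot per level of ID, side by side.
op <- par(mfrow = c(1, length(right)))
for (id in names(right)) {
  plot(right[[id]], main = paste("ID", id))
}
par(op)
```

Passing plot = FALSE keeps acf() from drawing while the values are computed, so the plots can be laid out afterwards in one device.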

OTHER TIPS

You can also use the by() or tapply() function (here the data frame is named dat):

R > a <- by(dat$y.var.1, dat$ID, function(x) acf(x)$acf)
R > a
dat$ID: 1
, , 1

        [,1]
[1,]  1.0000
[2,]  0.4457
[3,] -0.1424
[4,] -0.4467
[5,] -0.3566

------------------------------------------------------------ 
dat$ID: 2
, , 1

         [,1]
[1,]  1.00000
[2,]  0.37311
[3,] -0.08434
[4,] -0.37320
[5,] -0.41557

------------------------------------------------------------ 
dat$ID: 3
, , 1

         [,1]
[1,]  1.00000
[2,]  0.37742
[3,] -0.08618
[4,] -0.38068
[5,] -0.41057
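
The output above comes from by(); a tapply() version might look like the sketch below. It assumes dat holds the same data as data.1 in the question (rebuilt here from the rounded printed values), and acf_list and acf_mat are names chosen only for this example.

```r
# Assumption: dat is the question's data frame under another name.
dat <- data.frame(
  ID = rep(1:3, each = 5),
  y.var.1 = c(2.284620, 2.820829, 3.889701, 5.180010, 6.080572,
              6.972568, 8.082126, 9.075686, 9.864694, 10.942456,
              11.853353, 13.112986, 13.893405, 15.037400, 16.015836)
)

# tapply() applies the function to y.var.1 within each level of ID.
# acf()$acf is a 3-d array, so drop() reduces it to a plain vector;
# plot = FALSE suppresses the automatic plot.
acf_list <- tapply(dat$y.var.1, dat$ID,
                   function(x) drop(acf(x, plot = FALSE)$acf))

# Autocorrelations for one level (lags 0 to 4).
acf_list[["2"]]

# Optional: bind the per-level vectors into a lags-by-level matrix.
acf_mat <- do.call(cbind, acf_list)
acf_mat
```

Because the function returns a vector rather than a single number, tapply() gives back a list with one element per level instead of simplifying to a vector.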

Licensed under: CC-BY-SA with attribution
