Question

I have a question about hierarchical grouping of time-series in R. I currently have this matrix:

           A      B     C      F     G      H      I
[1,] -33.697  8.610 42.31 17.465 24.84 14.210 10.632
[2,]  -4.698 15.993 20.69  6.222 14.47  3.423 11.047
[3,] -37.458  9.687 47.14 14.659 32.49 12.759 19.726
[4,] -23.851 16.517 40.37 14.392 25.98  9.438 16.538
[5,]   3.329 15.629 12.30  3.449  8.85  2.635  6.215
[6,] -38.071  5.746 43.82 15.932 27.89 14.113 13.772

Just by inspection, I can figure out that:

  • G = H + I
  • C = F + G
  • A = B - C

Is there a way to find these sum relationships (positive and negative) automatically for large time series in R? I have tried using lm() to figure out the relationships, but that is too time-consuming to do for every series, and there are often collinearity problems as well.

Many Thanks!

structure(list(A = c(-33.6970557915047, -4.69841752527282, -37.457728596637, 
-23.8508993089199, 3.32904924079776, -38.0712462896481), B = c(8.60984595282935, 
15.9929901333526, 9.68719404516742, 16.5167794595473, 15.6285679822322, 
5.74573907931335), C = c(42.306901744334, 20.6914076586254, 47.1449226418044, 
40.3676787684672, 12.2995187414344, 43.8169853689615), F = c(17.4649945173878, 
6.22195235290565, 14.6593122615013, 14.3921482057776, 3.44929573708214, 
15.9315551938489), G = c(24.8419072269462, 14.4694553057197, 
32.4856103803031, 25.9755305626895, 8.8502230043523, 27.8854301751126
), H = c(14.2098777298816, 3.42268325854093, 12.7592747195158, 
9.43778987810947, 2.63517117220908, 14.1129822209477), I = c(10.6320294970647, 
11.0467720471788, 19.7263356607873, 16.5377406845801, 6.21505183214322, 
13.7724479541648)), .Names = c("A", "B", "C", "F", "G", "H", 
"I"), row.names = c(NA, -6L), class = "data.frame")

Solution

This also uses regression but it

  • uses lm.fit, which is faster than lm. (There is also fastLm in the RcppArmadillo and RcppEigen packages, which you could try as well.)

  • avoids duplicating regressions by using only unique combinations.

  • assumes that only triples need to be investigated, which cuts down the amount of computation (that seems to be the case in the post)

  • assumes all coefficients are integers, to clean up the output

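The code, with DF being the data frame from the question, is:

eps <- .1                            # tolerance on the sum of squared residuals
combos <- combn(ncol(DF), 3)         # every unique triple of columns
for(j in seq_len(ncol(combos))) {
    ix <- combos[, j]
    # regress the first column of the triple on the other two (no intercept)
    fit <- lm.fit(as.matrix(DF[ix[-1]]), DF[[ix[1]]])
    SSE <- sum(resid(fit)^2)
    if (SSE < eps) {
        # -1 for the response, rounded coefficients for the regressors
        ecoef <- round(c(-1, coef(fit)))
        names(ecoef)[1] <- names(DF)[ix[1]]
        print(ecoef)
    }
}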

which gives this with the data in the post:

 A  B  C 
-1  1 -1 
 C  F  G 
-1  1  1 
 G  H  I 
-1  1  1 

OTHER TIPS

You can try a hierarchical clustering method. This will not give you the exact relationships and the coefficients but can give you an idea of the relationships you should test for. First we prepare your data.

a<-rbind(c(-33.697,8.610,42.31, 17.465, 24.84, 14.210, 10.632), 
  c(-4.698,15.993,20.69,6.222, 14.47,3.423, 11.047),
  c(-37.458,9.687, 47.14, 14.659, 32.49, 12.759, 19.726),
  c(-23.851,16.517,40.37,14.392,25.98,9.438,16.538),
  c(3.329,15.629,12.30,3.449,8.85,2.635,6.215),
  c(-38.071,5.746,43.82,15.932,27.89,14.113,13.772))
colnames(a)<-c("A", "B", "C", "F", "G", "H", "I")

Then we calculate the correlation between your variables and create distances which we then cluster.

dd <- as.dist((1 - cor(a))/2)
plot(hclust(dd))

That should give you an idea of the relationships between the different time series.

[Figure: the cluster dendrogram]
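
One caveat: with the distance (1 - cor)/2, strongly negatively correlated series end up far apart, even though they may be tied together by a relation such as A = B - C. Since the question asks about negative relationships too, a possible variant is to use the absolute correlation instead. This is a suggestion of mine, not part of the clustering approach as described above; the matrix a is the same as defined earlier.

```r
a <- rbind(c(-33.697, 8.610, 42.31, 17.465, 24.84, 14.210, 10.632),
           c(-4.698, 15.993, 20.69, 6.222, 14.47, 3.423, 11.047),
           c(-37.458, 9.687, 47.14, 14.659, 32.49, 12.759, 19.726),
           c(-23.851, 16.517, 40.37, 14.392, 25.98, 9.438, 16.538),
           c(3.329, 15.629, 12.30, 3.449, 8.85, 2.635, 6.215),
           c(-38.071, 5.746, 43.82, 15.932, 27.89, 14.113, 13.772))
colnames(a) <- c("A", "B", "C", "F", "G", "H", "I")

dd     <- as.dist((1 - cor(a)) / 2)   # distance used above: sign matters
dd_abs <- as.dist(1 - abs(cor(a)))    # variant: sign of the correlation ignored

# A and C are strongly negatively correlated, so they are far apart
# under dd but close under dd_abs.
as.matrix(dd)["A", "C"]
as.matrix(dd_abs)["A", "C"]

hc <- hclust(dd_abs)
cutree(hc, k = 3)   # group labels; k = 3 is an arbitrary choice here
```

As before, this only hints at which series to test together; it does not give the coefficients.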

You can find linear dependence relations with MASS::Null (here d is the data frame from the question). They are equivalent to, but not as sparse as, those you found by visual inspection.

library(MASS)
Null(t(d)) # One relation per column
#             [,1]        [,2]        [,3]
# [1,]  0.41403998 -0.04178588  0.45582586
# [2,] -0.41403998  0.04178588 -0.45582586
# [3,] -0.02626794 -0.52439443  0.49812649
# [4,]  0.44030792  0.48260856 -0.04230063
# [5,]  0.62687195 -0.01159430 -0.36153375
# [6,] -0.18656403  0.49420285  0.31923312
# [7,] -0.18656403  0.49420285  0.31923312
as.matrix(d) %*% Null(t(d))  # zero, up to rounding error
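
To check how many independent relations to expect, and that a sparse relation found by inspection is covered by the dense basis, something like the following may help. It is a sketch using only base R (qr and svd instead of MASS::Null) on synthetic data built with the same structure as in the question; the data and the variable names are assumptions made so the example is self-contained.

```r
# Synthetic data (an assumption): G = H + I, C = F + G, A = B - C.
set.seed(1)
n <- 20
d <- data.frame(B = rnorm(n, 12, 4), F = rnorm(n, 12, 5),
                H = rnorm(n, 10, 5), I = rnorm(n, 12, 5))
d$G <- d$H + d$I
d$C <- d$F + d$G
d$A <- d$B - d$C
d <- d[c("A", "B", "C", "F", "G", "H", "I")]
M <- as.matrix(d)

r     <- qr(M)$rank      # numerical rank of the data matrix
n_rel <- ncol(M) - r     # number of independent linear relations

# Null-space basis: right singular vectors that belong to the
# (numerically) zero singular values.
sv <- svd(M, nu = 0, nv = ncol(M))
N  <- sv$v[, (r + 1):ncol(M), drop = FALSE]
max(abs(M %*% N))        # close to zero

# The sparse relation A - B + C = 0 as a coefficient vector.
v <- c(A = 1, B = -1, C = 1, F = 0, G = 0, H = 0, I = 0)
max(abs(M %*% v))                      # close to zero
# N has orthonormal columns, so v lies in its span if projecting
# v onto N gives v back.
max(abs(N %*% crossprod(N, v) - v))    # close to zero
```

The dense basis and the sparse relations span the same space; the sparse ones are simply a more readable choice of basis.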
Licensed under: CC-BY-SA with attribution