R Speed up vectorize for square matrix

https://stackoverflow.com/questions/21547421

06-10-2022
|

Question

Anyone able to help me speed up some code:

n = seq_len(ncol(mat)) # seq 1 to ncol(mat)
sym.pr<-outer(n,n,Vectorize(function(a,b) {
    return(adf.test(LinReg(mat[,c(a,b)]),k=0,alternative="stationary")$p.value)
}))

Where mat is an NxM matrix of N observation and M objects, e.g:

    Obj1 Obj2 Obj3
1      .    .    .
2      .    .    .    
3      .    .    .

LinReg is defined as:

# Performs linear regression via OLS
LinReg=function(vals) {  
  # regression analysis
  # force intercept c at y=0
  regline<-lm(vals[,1]~as.matrix(vals[,2:ncol(vals)])+0)

  # return spread (residuals)
  return(as.matrix(regline$residuals))
}

Basically I am performing a regression analysis (OLS) on every combination of Objects (i.e. Obj1, Obj2 and Obj2,Obj3 and Obj1, Obj3) in mat, then using the adf.test function from the tseries package and storing the p-value. The end result sym.pr is a symmetric matrix of all p-values (but actually it's not 100% symmetric, see here for more info), nevertheless it will suffice.

With the above code, on a 600x300 matrix (600 observations and 300 objects), it takes about 15 minutes..

I thought of maybe only calculating the upper triangle of the symmetric matrix, but not sure how to go about doing it.

Any ideas?

Thanks.

Solution

Starting with some dummy data

mdf <- data.frame( x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5) )

I would firstly determine the combinations of interest. So if I understood you right the result of your calculation should be the same for mdf[c(i,j)] and mdf[c(j,i)]. in this case you could use the combn function, to determine the relevant pairs.

pairs <- as.data.frame( t( combn( colnames( mdf  ),2 ) ) )
pairs
  V1 V2
1 x1 x2
2 x1 x3
3 x2 x3

Now you can just apply your function row-wise over the pairs (using a t.test here for simplicity):

pairs[["p.value"]] <- apply( pairs, 1, function( i ){
  t.test( mdf[i] )[["p.value"]]
})
pairs
  V1 V2   p.value
1 x1 x2 0.5943814
2 x1 x3 0.7833293
3 x2 x3 0.6760846

If you still need your p.values back in (upper triangular) matrix form you can cast them:

library(reshape2)
acast( pairs, V1 ~ V2 )
          x2        x3
x1 0.5943814 0.7833293
x2        NA 0.6760846

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow