Question

I am very green in R, so there is probably a very easy solution to this:

I want to calculate the average correlation between the column vectors in a square matrix:

x<-matrix(rnorm(10000),ncol=100)
aux<-matrix(seq(1,10000))
loop<-sapply(aux,function(i,j) cov(x[,i],x[,j]))
cor_x<-mean(loop)

When evaluating the sapply line, I get the error 'subscript out of bounds'. I know I could do this with a longer script, but is there a way to achieve this in one line of code?


Solution 2

The problem is due to aux. Since x has 100 columns, aux has to range from 1 to 100. Your aux, however, runs from 1 to 10000, the total number of elements in x, so x[, i] fails with 'subscript out of bounds' as soon as i exceeds 100. It works with the following code:

aux <- seq(1, 100)
loop <- sapply(aux, function(i, j) cov(x[, i], x[, j]))

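Because j is never supplied, x[, j] selects every column of x, so each call returns the covariances of column i with all 100 columns, and sapply combines them into the full 100 x 100 covariance matrix. Afterwards, you can calculate the mean covariance with: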

cor_x <- mean(loop)
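A small sketch to confirm that loop really is the covariance matrix. The seed is arbitrary and the helper f is hypothetical, added only to show what x[, j] returns when j is missing:

```r
set.seed(1)  # arbitrary seed, only so the check is reproducible
x <- matrix(rnorm(10000), ncol = 100)

# Hypothetical helper: with j left missing, x[, j] acts like x[, ]
f <- function(i, j) dim(x[, j])
f(1)  # 100 100

aux <- seq(1, 100)
loop <- sapply(aux, function(i, j) cov(x[, i], x[, j]))

dim(loop)                                          # 100 100
all.equal(loop, cov(x), check.attributes = FALSE)  # TRUE
all.equal(mean(loop), mean(cov(x)))                # TRUE
```

Relying on a missing argument this way works, but it is easy to misread; the loop-free cov(x) call gives the same matrix directly.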

If you want to exclude duplicate entries (cov(X,Y) is identical to cov(Y,X), so the matrix is symmetric), you can use:

cor_x <- mean(loop[upper.tri(loop, diag = TRUE)])

If you also want to exclude cov(X,X), i.e., the variances on the diagonal, you can use:

cor_x <- mean(loop[upper.tri(loop)])
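To see which entries each mask keeps, here is a sketch on a small hand-made symmetric matrix (m is a hypothetical stand-in for loop):

```r
# 3 x 3 symmetric matrix, filled column by column
m <- matrix(c(4, 1, 2,
              1, 5, 3,
              2, 3, 6), nrow = 3)

m[upper.tri(m, diag = TRUE)]  # 4 1 5 2 3 6  (diagonal plus upper triangle)
m[upper.tri(m)]               # 1 2 3        (strictly above the diagonal)
mean(m[upper.tri(m)])         # 2

# Same off-diagonal mean via symmetry: drop the diagonal, halve nothing,
# divide by the n * (n - 1) off-diagonal cells
n <- nrow(m)
(sum(m) - sum(diag(m))) / (n * (n - 1))  # 2
```

The strict mask keeps n * (n - 1) / 2 values, one per pair of distinct columns, which for the 100-column x is 4950.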

OTHER TIPS

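No need for any loops. Just use mean(cov(x)), which computes the whole covariance matrix in a single call. For the average correlation the question actually asks about, use mean(cor(x)) instead.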
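Combining cor with the upper.tri mask from the solution above gives a loop-free sketch for correlations (the seed is arbitrary):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- matrix(rnorm(10000), ncol = 100)

r <- cor(x)  # 100 x 100 correlation matrix with 1s on the diagonal

# Average over every entry, including the 100 diagonal 1s
mean(r)

# Average over each distinct pair of different columns exactly once
mean(r[upper.tri(r)])
```

Including the diagonal pulls the first average toward 1, because every column correlates perfectly with itself; the second is probably closer to "the average correlation between the columns".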

Licensed under: CC-BY-SA with attribution