Question

In R, I use cov2cor() to calculate a correlation matrix like:

  A,B,C,...
A 1,0.5,0.2,...
B 0.5,1,0.4,...
C 0.2,0.4,1,...
...

How can I reshape the matrix so that the columns are stacked in rows like:

X,Y,Correlation
A,B,0.5
A,C,0.2
...
B,C,0.4
...

Note that self-pairs such as A,A are excluded, and that A,B and B,A are treated as duplicates, so only one of them is kept.

Is there an easy way to implement this?


Solution

The functions that you need are:

lower.tri / upper.tri {base}: These return a logical matrix marking the lower or upper triangle of a matrix, excluding the diagonal by default. Use that mask to set one triangle and the diagonal of the correlation matrix to NA. This takes care of the duplicate correlation values, since cor(A,C) = cor(C,A) and only one of each pair is retained.

melt {reshape2}: This takes the remaining triangle and melts it into a table with only three columns. The 3rd column holds the correlation between the variables named in columns 1 and 2.

is.na {base}: Use this to remove rows where the 3rd column is NA.

Update: @KunRen has suggested na.omit {stats} as a better alternative to is.na, which I agree with.
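To make the masking step concrete, here is a minimal sketch using the three values for A, B and C from the question. The names m, mask and m_upper are made up for illustration only.

```r
# Small correlation matrix built from the values shown in the question
m <- matrix(c(1,   0.5, 0.2,
              0.5, 1,   0.4,
              0.2, 0.4, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

mask <- upper.tri(m, diag = FALSE)  # TRUE only strictly above the diagonal
m_upper <- m
m_upper[!mask] <- NA                # blank out the diagonal and the lower triangle
print(m_upper)
```

Only the cells for A,B, A,C and B,C keep their values. Everything else is NA and is dropped once the melted table is passed through na.omit.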

A sample solution would be like the following:

library(reshape2) # provides melt()
# mydata is assumed to be a numeric data frame or matrix
system.time(correlations <- cor(mydata, use = "pairwise.complete.obs")) # get the correlation matrix (system.time only reports how long it took)
upperTriangle <- upper.tri(correlations, diag = FALSE) # logical mask: TRUE strictly above the diagonal
correlations.upperTriangle <- correlations # take a copy of the original correlation matrix
correlations.upperTriangle[!upperTriangle] <- NA # set everything outside the upper triangle to NA
correlations_melted <- na.omit(melt(correlations.upperTriangle, value.name = "correlationCoef")) # melt the matrix into triplets; na.omit drops the NA rows
colnames(correlations_melted) <- c("X1", "X2", "correlation") # final column names
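
If you would rather not depend on reshape2, the same table can probably be built with base R alone. This is a sketch rather than part of the original answer: it swaps melt for which(..., arr.ind = TRUE), and it uses three columns of the built-in mtcars data set as a stand-in for mydata. The names idx and pair_table are hypothetical.

```r
# Stand-in data (an assumption): three numeric columns of the built-in mtcars set
mydata <- mtcars[, c("mpg", "hp", "wt")]
correlations <- cor(mydata, use = "pairwise.complete.obs")

# Row/column positions of every cell strictly above the diagonal
idx <- which(upper.tri(correlations, diag = FALSE), arr.ind = TRUE)

# One row per unordered pair of variables
pair_table <- data.frame(
  X1 = rownames(correlations)[idx[, "row"]],
  X2 = colnames(correlations)[idx[, "col"]],
  correlation = correlations[idx],  # matrix indexing picks one value per (row, col) pair
  stringsAsFactors = FALSE
)
print(pair_table)
```

With n variables this gives n(n-1)/2 rows, one per pair, with no self-pairs and no mirrored duplicates, so no NA filtering is needed.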
Licensed under: CC-BY-SA with attribution