Question

I have a question about zooming in on a cluster found in my dataset. I want to create one new matrix for each cluster that the clustering returns. Specifically, I am not sure how to go back to the data and pull out a subpopulation of interest. I know I can do:

mycl <- cutree(hr, 2);

But then what?

Here is what I have so far (complete code).

Say you have a matrix 'm'. You cluster its rows ('hr') and its columns ('hc') using distances derived from a correlation matrix:

m = matrix(0, 10, 5, dimnames = list(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), c(1, 2, 3, 4, 5)))
m[1,] = c(0,0,0,0,1)
m[2,] = c(0,0,0,1,1)
m[3,] = c(0,0,1,1,1)
m[4,] = c(0,0,1,1,0)
m[5,] = c(1,0,0,0,0)
m[6,] = c(1,1,1,0,0)
m[7,] = c(0,1,1,0,0)
m[8,] = c(0,1,1,0,0)
m[9,] = c(0,1,1,1,0)
m[10,] = c(1,1,1,0,1)
# Generates row and column dendrograms.
# Note: method="ward" was renamed "ward.D" in R 3.1.0
hr <- hclust(as.dist(1-cor(t(m), method="pearson")), method="ward.D")
hc <- hclust(as.dist(1-cor(m, method="spearman")), method="ward.D")

Now, I can do a heatmap of my data:

library(gplots)
mycl <- cutree(hr, 2)
mycolhc <- rainbow(length(unique(mycl)), start=0.1, end=0.9)
mycolhc <- mycolhc[as.vector(mycl)]
myheatcol <- redgreen(75)

# Creates heatmap for entire data set
heatmap.2(
           m, 
           Rowv=as.dendrogram(hr), 
           Colv=as.dendrogram(hc), 
           col=myheatcol, 
           scale="row", 
           density.info="none", 
           trace="none", 
           RowSideColors=mycolhc, 
           cexCol=0.6, 
           labRow=NA
           )

[Figure: heatmap of a custom toy matrix with clustering]


Solution

Two things come to mind:

Solution 1:

# Convert to a dendrogram object
hor.dendro <- as.dendrogram(hr)
# Get values for the first branch
m.1 <- m[unlist(hor.dendro[[1]]),]

Solution 2:

# Cut the tree in 2
tree.cut <- cutree(hr, 2)
# Get the ids for cluster #1
clust.1 <- which(tree.cut==1)
# Get the values from m
m.1 <- m[clust.1,]
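
As a hedged sketch of how the two solutions relate: with k = 2, cutree undoes only the final merge, so its two groups should coincide with the two top-level branches of the dendrogram, although the numbering need not match. The snippet below rebuilds the toy matrix from the question and checks this. It assumes method = "ward.D" (the current name of the old "ward" option), and the names branch.rows, cut.rows, same.partition and m.branch1 are made up for illustration.

```r
# Rebuild the toy matrix from the question
m <- matrix(c(0,0,0,0,1,
              0,0,0,1,1,
              0,0,1,1,1,
              0,0,1,1,0,
              1,0,0,0,0,
              1,1,1,0,0,
              0,1,1,0,0,
              0,1,1,0,0,
              0,1,1,1,0,
              1,1,1,0,1),
            nrow = 10, byrow = TRUE,
            dimnames = list(LETTERS[1:10], as.character(1:5)))

# Row dendrogram (assumption: "ward.D" replaces the older "ward")
hr <- hclust(as.dist(1 - cor(t(m), method = "pearson")), method = "ward.D")

# Solution 1: row names under each of the two top-level branches
hor.dendro  <- as.dendrogram(hr)
branch.rows <- list(labels(hor.dendro[[1]]), labels(hor.dendro[[2]]))

# Solution 2: row names in each cutree group
tree.cut <- cutree(hr, k = 2)
cut.rows <- split(names(tree.cut), tree.cut)

# Compare the two partitions, ignoring order and numbering
same.partition <-
  (setequal(branch.rows[[1]], cut.rows[[1]]) &&
   setequal(branch.rows[[2]], cut.rows[[2]])) ||
  (setequal(branch.rows[[1]], cut.rows[[2]]) &&
   setequal(branch.rows[[2]], cut.rows[[1]]))
print(same.partition)

# Extract a branch by row name rather than by position
m.branch1 <- m[branch.rows[[1]], , drop = FALSE]
```

For more than two groups the branches no longer map one-to-one onto cutree groups, so Solution 2 is probably the easier one to generalise.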

More generally, you can use one of the *apply functions to extract every cluster in one go.

For instance:

clusters <- lapply(unique(tree.cut), function(grp) {
  m[which(tree.cut==grp),]
})

With cutree called for 2 groups, this returns:

[[1]]
  1 2 3 4 5
A 0 0 0 0 1
B 0 0 0 1 1
C 0 0 1 1 1
D 0 0 1 1 0
I 0 1 1 1 0

[[2]]
  1 2 3 4 5
E 1 0 0 0 0
F 1 1 1 0 0
G 0 1 1 0 0
H 0 1 1 0 0
J 1 1 1 0 1

You can access the results with the [[ ]] operator, such as: clusters[[2]] to get the second cluster.
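
To get one matrix per cluster for any number of clusters, the lapply idea above can be wrapped in a small helper. This is only a minimal sketch: split_by_cluster is a hypothetical name, not a base R function, and the hclust call assumes method = "ward.D". Using split() names each list element after its cluster number, and drop = FALSE keeps a cluster with a single row as a one-row matrix instead of collapsing it to a vector.

```r
# Rebuild the toy matrix from the question
m <- matrix(c(0,0,0,0,1,
              0,0,0,1,1,
              0,0,1,1,1,
              0,0,1,1,0,
              1,0,0,0,0,
              1,1,1,0,0,
              0,1,1,0,0,
              0,1,1,0,0,
              0,1,1,1,0,
              1,1,1,0,1),
            nrow = 10, byrow = TRUE,
            dimnames = list(LETTERS[1:10], as.character(1:5)))

# Row dendrogram (assumption: "ward.D" replaces the older "ward")
hr <- hclust(as.dist(1 - cor(t(m), method = "pearson")), method = "ward.D")

# Hypothetical helper: returns one sub-matrix per cluster, for any k.
# It assumes 'tree' was built from the rows of 'mat' in the same order.
split_by_cluster <- function(mat, tree, k) {
  grp <- cutree(tree, k = k)              # cluster id for every row
  idx <- split(seq_len(nrow(mat)), grp)   # row indices grouped by cluster id
  lapply(idx, function(i) mat[i, , drop = FALSE])
}

clusters <- split_by_cluster(m, hr, k = 3)
print(sapply(clusters, nrow))   # number of rows in each cluster
print(clusters[["2"]])          # cluster 2, looked up by name
```

Because the list is named, clusters[["2"]] and clusters[[2]] refer to the same element here; the named form may be easier to read when k is large.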

Licensed under: CC-BY-SA with attribution