Question

I am trying to permute (column-wise only) my data matrix 1,000 times and then do hierarchical clustering in R, so that I end up with a final tree for my data after the 1,000 randomizations. This is where I am lost. I have this loop:

    for (i in 1:1000) {
        # resample my columns (replace = TRUE makes this a bootstrap, not a permutation)
        permuted <- test2_matrix[, sample(ncol(test2_matrix), 12, replace = TRUE)]
        d <- dist(permuted, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
        clust <- hclust(d, method = "complete", members = NULL)
    }
    png(filename = "cluster_dendrogram_bootstrap.png", width = 1024, height = 1024, pointsize = 10)
    plot(clust)
    dev.off()  # close the png device so the file is written

I am not sure whether the final tree is the product of all 1,000 randomizations or just the last tree calculated in the loop. Also, if I want to display the bootstrap values on the tree, how should I go about it?

Many thanks!!


Solution

The value of clust in your example is indeed just the last tree calculated in the loop: each iteration overwrites the previous one. Here's a way of making 1000 column resamplings of your matrix and saving the hclust tree from each one:

    make.permuted.clust <- function(i) { # i is the replicate number; it is not used
        # resample the columns with replacement
        permuted <- test2_matrix[, sample(ncol(test2_matrix), ncol(test2_matrix), replace = TRUE)]
        d <- dist(permuted, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
        hclust(d, method = "complete", members = NULL) # return value
    }

    all.clust <- lapply(1:1000, make.permuted.clust) # list of 1000 hclust trees
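If you would rather keep your original for loop, the fix is the same idea: store each tree in a preallocated list instead of overwriting clust. This is a minimal sketch; the 20 x 12 random matrix is a hypothetical stand-in for test2_matrix.

```r
set.seed(1)
# Hypothetical stand-in for test2_matrix: 20 rows (objects) x 12 columns (variables)
test2_matrix <- matrix(rnorm(20 * 12), nrow = 20,
                       dimnames = list(paste0("obj", 1:20), NULL))

B <- 1000
all.clust <- vector("list", B)   # preallocate one slot per replicate
for (i in seq_len(B)) {
    # resample the columns with replacement
    cols <- sample(ncol(test2_matrix), ncol(test2_matrix), replace = TRUE)
    d <- dist(test2_matrix[, cols], method = "euclidean")
    all.clust[[i]] <- hclust(d, method = "complete")   # store instead of overwrite
}
```

all.clust[[5]] is then the tree from the 5th replicate, so plot(all.clust[[5]]) draws it.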

The second part of your question (how to display bootstrap values on the tree) should be answered here.
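One way to compute such values yourself, sketched under assumptions: for each cluster in the tree built from the original matrix, count the percentage of resampled trees that contain exactly the same set of rows. Because only the columns are resampled, row indices stay comparable across trees. The matrix x is a simulated stand-in for test2_matrix, and cluster_rows, merge_clusters, cluster_keys and node_x are hypothetical helpers written for this example. The label placement assumes that plot.hclust draws leaves at their rank in $order and each merge midway between its two children.

```r
set.seed(1)
# Hypothetical stand-in for test2_matrix: 20 rows (objects) x 12 columns (variables)
x <- matrix(rnorm(20 * 12), nrow = 20,
            dimnames = list(paste0("obj", 1:20), NULL))

# Cluster the rows the same way as in the question
cluster_rows <- function(m) hclust(dist(m, method = "euclidean"), method = "complete")

# For each merge (row of h$merge), the sorted indices of the leaves it contains.
# Negative entries in h$merge are single leaves; positive ones refer to earlier merges.
merge_clusters <- function(h) {
    m <- h$merge
    members <- vector("list", nrow(m))
    for (k in seq_len(nrow(m))) {
        side <- function(j) if (j < 0) -j else members[[j]]
        members[[k]] <- sort(c(side(m[k, 1]), side(m[k, 2])))
    }
    members
}

# One string per cluster, e.g. "3,7,12", so clusters can be compared with %in%
cluster_keys <- function(h) vapply(merge_clusters(h), paste, character(1), collapse = ",")

original  <- cluster_rows(x)
orig_keys <- cluster_keys(original)

B <- 1000
hits <- numeric(length(orig_keys))
for (b in seq_len(B)) {
    cols <- sample(ncol(x), ncol(x), replace = TRUE)   # resample columns only
    hits <- hits + (orig_keys %in% cluster_keys(cluster_rows(x[, cols, drop = FALSE])))
}
support <- round(100 * hits / B)   # % of replicates containing each original cluster

# x position of each merge in plot.hclust (assumed: leaves at their rank in
# h$order, each merge drawn midway between its two children)
node_x <- function(h) {
    leaf_x <- order(h$order)   # leaf i is drawn at position leaf_x[i]
    m <- h$merge
    xs <- numeric(nrow(m))
    for (k in seq_len(nrow(m))) {
        child <- function(j) if (j < 0) leaf_x[-j] else xs[j]
        xs[k] <- (child(m[k, 1]) + child(m[k, 2])) / 2
    }
    xs
}

plot(original)
text(node_x(original), original$height, labels = support, pos = 3, cex = 0.8, col = "red")
```

Wrap the last two lines in png(...) and dev.off() as in the question to write the labelled tree to a file. The root cluster always gets 100, since every tree contains all rows. If you prefer a package, pvclust computes bootstrap support for hclust trees, but note that it clusters the columns of its input rather than the rows.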

OTHER TIPS

You may be interested in the random forest method implemented in the randomForest package, which bootstraps the observations, randomly samples the candidate splitting variables at each node, and lets you keep the individual trees (see getTree) for further analysis.

    library(randomForest)

The original random forest developers' site (Fortran 77 implementation)

The package manual

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow