Question

I have a list of leaves in a tree and the height at which I'd like them to merge, i.e. the height of their most recent common ancestor. All leaves are assumed to be at height 0. A toy example might look like:

as.data.frame(rbind(c("a","b",1),c("c","d",2),c("a","d",4)))
   V1 V2 V3
1  a  b  1
2  c  d  2
3  a  d  4

I want to plot a tree representing this data. I know that R can plot trees coming from hclust. How do I get my data into the format returned by hclust or into some other format that is easily plotted?

Edited to add diagram:

The tree for the above dataset looks like this:

   __|___
  |      |
  |     _|_
 _|_   |   |
|   |  |   |  
a   b  c   d   

Solution

What you have is a hierarchical clustering that is already fully specified, just in your own data format, and you would like to use R's plotting facilities on it. As far as I can tell there is no direct route. One option is to construct by hand an object like the one returned by hclust. It is a list with the components "merge", "height", "order", "labels", "method", "call" and "dist.method", all of which are fairly easy to understand. Someone already tried this: https://stat.ethz.ch/pipermail/r-help/2006-February/089170.html but apparently still had issues. Alternatively, you could fill in a distance matrix with dummy values that are consistent with your clustering and pass it to hclust. E.g.

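# Pairwise distances = height of the most recent common ancestor
a <- matrix(ncol=4, nrow=4, c(0,1,4,4,1,0,4,4,4,4,0,2,4,4,2,0),
            dimnames=list(c("a","b","c","d"), c("a","b","c","d")))
b <- hclust(as.dist(a), method="single")
# hang=-1 draws all leaf labels at height 0
plot(b, hang=-1)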

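Single linkage merges two clusters at the smallest distance between them, so the resulting tree joins a and b at height 1, c and d at height 2, and the two pairs at height 4, as required.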
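For more than a handful of leaves, filling in the matrix by hand gets tedious. The following is a minimal sketch of how the distance matrix could be derived from the merge table itself. The column names leaf1, leaf2 and height and the variables merges, cluster and tree are illustrative and not part of the original question. It assumes every row joins two leaves whose clusters have not yet been merged at a lower height, and that the rows together connect all leaves.

```r
# Merge table from the question: two leaves and the height of their
# most recent common ancestor (column names are illustrative).
merges <- data.frame(
  leaf1  = c("a", "c", "a"),
  leaf2  = c("b", "d", "d"),
  height = c(1, 2, 4),
  stringsAsFactors = FALSE
)

leaves <- sort(unique(c(merges$leaf1, merges$leaf2)))
d <- matrix(0, nrow = length(leaves), ncol = length(leaves),
            dimnames = list(leaves, leaves))

# cluster[leaf] is the id of the cluster that leaf currently belongs to.
cluster <- setNames(seq_along(leaves), leaves)

# Process merges from lowest to highest. When two clusters merge, every
# leaf in one is at the merge height from every leaf in the other.
for (i in order(merges$height)) {
  left  <- names(cluster)[cluster == cluster[[merges$leaf1[i]]]]
  right <- names(cluster)[cluster == cluster[[merges$leaf2[i]]]]
  if (identical(left, right)) next  # already joined at a lower height
  d[left, right] <- merges$height[i]
  d[right, left] <- merges$height[i]
  cluster[right] <- cluster[[merges$leaf1[i]]]
}

tree <- hclust(as.dist(d), method = "single")
plot(tree, hang = -1)
```

For the toy data this reproduces the hand-written matrix above, including the entries that the merge table only implies, such as the distance of 4 between b and c.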

Licensed under: CC-BY-SA with attribution