Question

When you do the following clustering in R:

> d <- dist(as.matrix(mtcars))
> hc <- hclust(d)   
> plot(hc)

You get a tree whose nodes have unequal branch lengths. In ordinary hierarchical clustering (UPGMA), all the lengths have to be equal. Can someone please explain the default behavior of hclust, and how it yields unequal lengths? Thanks.

Solution

I agree that this is odd behaviour, but it is caused by plot.hclust, not by hclust. If you look at the help (?plot.hclust), you'll find the hang parameter, which is set to 0.1 by default:

The fraction of the plot height by which labels should hang below the rest of the plot. A negative value will cause the labels to hang down from 0.

Accordingly, the behaviour known from upgma can be achieved with

plot( hc, hang = -1 )
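To see the difference directly, the two settings can be drawn side by side. This is only a small sketch using the same mtcars clustering as in the question; the panel titles are my own choice and nothing else is changed:

```r
# Same clustering as in the question; only the plotting call differs.
d  <- dist(as.matrix(mtcars))
hc <- hclust(d)

op <- par(mfrow = c(1, 2))                 # two panels side by side
plot(hc, main = "hang = 0.1 (default)")    # labels hang just below each merge
plot(hc, hang = -1, main = "hang = -1")    # labels all hang down from 0
par(op)                                    # restore the previous layout
```

The tree itself (merge order and merge heights) is identical in both panels; only the position of the leaf labels changes.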

My guess is that the two behaviours come from different definitions of the height that should be assigned to a cluster containing only one observation, i.e. one that has not been merged with another. The definition here is evidently that such clusters have no height at all. Formally, it would then be correct to plot them with hang = 0, but since this looks ugly, I guess that hang = 0.1 was chosen as the default.

In any case, you'll get branches with unequal lengths with hang >= 0.
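As a rough check that this is purely a plotting matter, you can inspect the hclust object itself: it stores one height per merge and nothing for the individual observations. The sketch below also shows two related facts about defaults: hclust uses method = "complete" unless told otherwise (UPGMA corresponds to method = "average"), and as.dendrogram defaults to hang = -1. The helper leaf_heights is a hypothetical function written just for this illustration, not part of R:

```r
d  <- dist(as.matrix(mtcars))
hc <- hclust(d)

hc$method            # linkage actually used; the default is "complete"
length(hc$height)    # one height per merge, i.e. nrow(mtcars) - 1 values
range(hc$height)     # merge heights only; single observations have none

# Hypothetical helper: collect the heights of all leaves of a dendrogram.
leaf_heights <- function(node) {
  if (is.leaf(node)) return(attr(node, "height"))
  unlist(lapply(seq_along(node), function(i) leaf_heights(node[[i]])))
}

dend <- as.dendrogram(hc)     # as.dendrogram() defaults to hang = -1
unique(leaf_heights(dend))    # every leaf is placed at the same height

# UPGMA proper has to be requested explicitly:
hc_upgma <- hclust(d, method = "average")
```

So if the goal is a true UPGMA tree with aligned leaves, both the method argument of hclust and the hang argument of plot need to be set explicitly.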

Licensed under: CC-BY-SA with attribution