Question

I have the following custom-made NxN distance matrix in numpy/scipy:

dist_matrix =    array([array([5, 4, 2, 3, 2, 3]),
                        array([4, 5, 2, 3, 2, 2]), 
                        array([2, 2, 5, 2, 2, 1]), 
                        array([3, 3, 2, 5, 4, 2]), 
                        array([2, 2, 2, 4, 5, 1]), 
                        array([3, 2, 1, 2, 1, 5])])

How can I use this matrix to do hierarchical clustering and plot dendrograms in R / ggplot2? If I pass this distance matrix to R via rpy2 like this:

r.hclust(dist_matrix)

I get the error:

   res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") : 
  missing value where TRUE/FALSE needed

Solution

The R function hclust() expects a "dist" object rather than a plain matrix, so convert the matrix with as.dist() first:

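from rpy2.robjects.packages import importr
stats = importr("stats")
# m: the square distance matrix, already converted to an R matrix
d = stats.as_dist(m)   # build a "dist" object from the square matrix
hc = stats.hclust(d)   # hierarchical clustering on the "dist" object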

[Note: the error message also hints at a possible conversion bug in rpy2. Could you file a bug report? Thanks.]
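
To illustrate what a "dist" object holds, here is a rough pure-Python sketch. R's "dist" class stores only the entries below the diagonal, column by column, so a 6x6 matrix becomes a vector of 6*5/2 = 15 values and the diagonal is ignored. The helper name lower_triangle_by_column is made up for this illustration and is not part of rpy2 or R; the matrix is the one from the question.

```python
def lower_triangle_by_column(matrix):
    """Return the entries strictly below the diagonal, column by column.

    Hypothetical helper: it mimics the layout of the values kept in an
    R "dist" object, and is not an rpy2 or R function.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    # For each column j, take the rows below the diagonal (i > j).
    return [matrix[i][j] for j in range(n) for i in range(j + 1, n)]


# The matrix from the question, written as plain nested lists.
dist_matrix = [
    [5, 4, 2, 3, 2, 3],
    [4, 5, 2, 3, 2, 2],
    [2, 2, 5, 2, 2, 1],
    [3, 3, 2, 5, 4, 2],
    [2, 2, 2, 4, 5, 1],
    [3, 2, 1, 2, 1, 5],
]

condensed = lower_triangle_by_column(dist_matrix)
print(len(condensed))  # number of pairwise distances kept
print(condensed)
```

Note that the diagonal values (5 in this matrix) do not appear in the result at all, which is also how as.dist() treats them.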

Licensed under: CC-BY-SA with attribution