Question

I developed a new document similarity measure (a method that calculates the amount of similarity or distance between two documents). I would like to know how well this measure performs.

Clustering is an application that relies on a distance/similarity measure, so I decided to evaluate the effectiveness of the proposed measure with different clustering algorithms.

I have read about different clustering algorithms in R. Suppose I have a document collection D that contains n documents organized in k clusters. I want to evaluate my similarity/distance measure with a variety of clustering algorithms (partitional, hierarchical and topic-based). The problem is that all the examples and tutorials start from a "data" matrix, whereas I have a distance/similarity matrix.
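As a minimal sketch of turning a pairwise measure into the n x n matrix that the clustering functions work from, the R code below applies a similarity function to every pair of documents. The function my_sim (here just cosine similarity) and the toy document-term matrix dtm are hypothetical stand-ins for the proposed measure and the real collection D.

```r
# Hypothetical similarity function standing in for the proposed measure:
# here, cosine similarity between two term-frequency vectors.
my_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical toy document-term matrix: 4 documents (rows) x 3 terms (columns).
dtm <- matrix(c(2, 1, 0,
                3, 1, 0,
                0, 1, 4,
                0, 2, 5),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4), c("t1", "t2", "t3")))

# Build the n x n similarity matrix by applying my_sim to every pair of rows.
n <- nrow(dtm)
S <- matrix(0, n, n, dimnames = list(rownames(dtm), rownames(dtm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    S[i, j] <- my_sim(dtm[i, ], dtm[j, ])
  }
}
round(S, 3)
```

The double loop is the simplest correct approach for an arbitrary pairwise function; for large n you would only compute the lower triangle, since a symmetric measure gives S[i, j] == S[j, i].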

Could you please give me some hints on how to do this in R?


Solution

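hclust() requires a dissimilarity structure, that is, a dist object. It treats larger values as more dissimilar, so a similarity matrix must first be converted to dissimilarities (for example 1 - S when the similarities lie in [0, 1]). If you start with a numeric matrix m of pairwise dissimilarities, you can create a dist object like so: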

d <- as.dist(m)
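A minimal sketch of going from a similarity matrix to a dist object, assuming the similarities lie in [0, 1] with 1 on the diagonal so that 1 - S is a reasonable dissimilarity. The toy matrix S is hypothetical; other conversions (such as sqrt(1 - S), or max(S) - S for unbounded scores) may suit a particular measure better.

```r
# Hypothetical 4 x 4 similarity matrix with values in [0, 1] and 1 on the diagonal.
S <- matrix(c(1.0, 0.9, 0.1, 0.2,
              0.9, 1.0, 0.2, 0.1,
              0.1, 0.2, 1.0, 0.8,
              0.2, 0.1, 0.8, 1.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), paste0("doc", 1:4)))

# One common conversion: dissimilarity = 1 - similarity.
m <- 1 - S

# as.dist() keeps only the lower triangle; the zero diagonal is dropped
# and the row names become the labels of the dist object.
d <- as.dist(m)
d
```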

You can then perform hierarchical clustering using hclust() like so:

hclust(d)
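The same dist object can drive both hierarchical and partitional algorithms, and the resulting assignments can be compared with the known k clusters. The sketch below uses cutree() to cut the hclust() tree into k groups, pam() (k-medoids) from the cluster package, which ships with R and accepts a dist object directly, and a small purity helper. The toy matrix, the labels in truth, and the purity() function are illustrative assumptions, not part of the original answer; any external index (for example the adjusted Rand index) could be used instead.

```r
library(cluster)  # recommended package shipped with R; provides pam()

# Hypothetical dissimilarity matrix for 6 documents (e.g. 1 - similarity).
m <- matrix(c(0.0, 0.1, 0.2, 0.9, 0.8, 0.9,
              0.1, 0.0, 0.1, 0.8, 0.9, 0.8,
              0.2, 0.1, 0.0, 0.9, 0.9, 0.9,
              0.9, 0.8, 0.9, 0.0, 0.1, 0.2,
              0.8, 0.9, 0.9, 0.1, 0.0, 0.1,
              0.9, 0.8, 0.9, 0.2, 0.1, 0.0),
            nrow = 6, byrow = TRUE)
d <- as.dist(m)

# Hypothetical known class labels (the k "true" clusters of the collection).
truth <- c(1, 1, 1, 2, 2, 2)
k <- 2

# Hierarchical: build the tree with average linkage, then cut it into k clusters.
hc <- hclust(d, method = "average")
hc_labels <- cutree(hc, k = k)

# Partitional: PAM (k-medoids) works on the dist object directly.
pm <- pam(d, k = k)
pam_labels <- pm$clustering

# Purity: fraction of documents that belong to the majority true class
# of their assigned cluster (1 means perfect agreement with truth).
purity <- function(pred, truth) {
  tab <- table(pred, truth)
  sum(apply(tab, 1, max)) / length(truth)
}
purity(hc_labels, truth)
purity(pam_labels, truth)
```

On this toy data both algorithms recover the two groups, so both purities are 1. Running the same pipeline with several linkage methods ("single", "complete", "average") and with different distance measures on the same collection gives a like-for-like comparison of the measures.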
Licensed under: CC-BY-SA with attribution