Question

I am using the following to perform a kmeans analysis:

km = kmeans(mat2, centers = 4)
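kmeans() picks its starting centers at random, so which group gets the number 2 (and occasionally the grouping itself) can change from one run to the next. A minimal sketch, assuming made-up data in place of mat2: calling set.seed() before kmeans() makes a run reproducible, and the nstart argument tries several random starts and keeps the best one.

```r
# Hypothetical stand-in for mat2: 100 random points in two dimensions
set.seed(7)
mat2 <- matrix(rnorm(200), ncol = 2)

# Same seed before each call, so both calls draw the same random starts
set.seed(1)
km_a <- kmeans(mat2, centers = 4, nstart = 25)
set.seed(1)
km_b <- kmeans(mat2, centers = 4, nstart = 25)

# Same assignments, so cluster "2" refers to the same rows in both runs
identical(km_a$cluster, km_b$cluster)
```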

I have also plotted the k-means result with the fpc package (library(fpc)) to get a visual, as follows:

plotcluster(mat2, km$cluster)

Here is the result: [image of the plotcluster output not available]

Each row of mat2 corresponds to a point in the plot. I gave each row in the matrix a name with the following:

rownames(mat2) = names      #names is a vector corresponding to the rows of mat2

I can find the cluster membership of each row in the matrix from the following component of the kmeans result:

km$cluster

This gives the name of each row in the matrix together with its cluster number, i.e. the integer drawn for that point in the plot. However, I would like to access more data.
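Because kmeans() carries the row names of its input over to the names of km$cluster, the row-to-cluster correspondence can be read straight off that vector. A minimal sketch, assuming a small made-up matrix standing in for mat2 (the row names "item1" to "item6" are hypothetical):

```r
# Hypothetical stand-in for mat2: two well-separated groups of 3 rows each
set.seed(42)
mat2 <- rbind(matrix(rnorm(6, mean = 0), ncol = 2),
              matrix(rnorm(6, mean = 10), ncol = 2))
rownames(mat2) <- paste0("item", 1:6)

km <- kmeans(mat2, centers = 2)

# Named integer vector: row name -> cluster number
km$cluster

# Names of the rows assigned to cluster 2
names(km$cluster)[km$cluster == 2]

# All clusters at once: a list of row names per cluster number
split(rownames(mat2), km$cluster)
```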

How do I access more data from these cluster points? For example, I would like to find the correspondence between the integers in the plot and the rows in the matrix. To clarify: answering this would tell me which row in the matrix corresponds to the highest 2 in the plot. Once I know which integers correspond to which rows in the matrix, I have the names of those rows and can give a meaningful interpretation.

I would also like to find the distance between a point in the plot and the center of the cluster to which it belongs. Can I get a correspondence between the (x, y) coordinates in the plot and the rows in the matrix? Can I get an interactive GUI so that when I click on a cluster point in the plot, I can see some of the data described above? I am open to using a different library for plotting. Summarizing into two questions:

  1. How can I get the correspondence between the integers in the plot and the rows in the matrix?
  2. Is there an existing package or tool that would make this a lot easier for me?
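For the distance part of the question, note that km$centers holds the cluster centers in the coordinates of the original matrix, not in the plot's coordinates (plotcluster draws a projection of the data). A minimal sketch, assuming Euclidean distance measured in the original space (the criterion kmeans minimizes) and made-up data with hypothetical row names:

```r
# Hypothetical data: 20 points in 3 dimensions with made-up row names
set.seed(1)
mat2 <- matrix(rnorm(60), ncol = 3)
rownames(mat2) <- paste0("row", 1:20)

km <- kmeans(mat2, centers = 4)

# Indexing km$centers by km$cluster gives each row its own cluster center
own_centers <- km$centers[km$cluster, , drop = FALSE]

# Euclidean distance from each row to the center of its cluster, named by row
dist_to_center <- sqrt(rowSums((mat2 - own_centers)^2))
names(dist_to_center) <- rownames(mat2)

# Rows farthest from their own center first
head(sort(dist_to_center, decreasing = TRUE))
```

As a sanity check, summing the squared distances within each cluster reproduces km$withinss.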

All help is greatly appreciated!


Solution

This answers part of your question, but there is a lot in there. If you want to click on your plot to identify points, look at ?identify. Below is an answer for working with the specific rows you're after. If you want to ask about interactive GUIs, it may be better to post a separate, specific question about that.
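A rough sketch of how identify() could be used, assuming made-up two-column data and plotting the raw columns with base graphics rather than through plotcluster (which draws projected coordinates, so its axes are not the matrix columns). identify() needs an interactive session, so the call is guarded by interactive().

```r
# Hypothetical data: 30 points in two dimensions with made-up row names
set.seed(3)
mat2 <- matrix(rnorm(60), ncol = 2)
rownames(mat2) <- paste0("pt", 1:30)
km <- kmeans(mat2, centers = 3)

# Plot the raw columns, coloured by cluster, with the cluster number as the symbol
plot(mat2[, 1], mat2[, 2], col = km$cluster, pch = as.character(km$cluster),
     xlab = "column 1", ylab = "column 2")

picked <- integer(0)
if (interactive()) {
  # Click on points, then finish (Esc or right-click, depending on the device).
  # identify() labels the clicked points and returns their row indices.
  picked <- identify(mat2[, 1], mat2[, 2], labels = rownames(mat2))
}

# Row names and cluster numbers of the clicked points
clicked <- data.frame(row = rownames(mat2)[picked], cluster = km$cluster[picked])
print(clicked)
```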

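# Example data: 80 random points in two dimensions
mat <- matrix(rnorm(160), ncol=2)
km <- kmeans(mat, centers=4)
# Put the data and each row's cluster number side by side
df <- as.data.frame(cbind(mat, km$cluster))
names(df) <- c("Var1", "Var2", "cluster")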

#Get the row of df with the highest Var1 among the points in cluster 2
which(df$cluster == 2 & df$Var1 == max(df$Var1[df$cluster == 2]))
# e.g. 76 (the row number will differ between runs: the data are random and no seed is set)

#Use this to extract the row
df[which(df$cluster == 2 & df$Var1 == max(df$Var1[df$cluster == 2])), ]

#You can subset your data based on one of the variables
#Get the rows with cluster == 2
df.2 <- df[df$cluster == 2, ]
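A variant of the same lookups, sketched with the same kind of random data (the seed is added only so the run is reproducible): which.max() over the row indices of one cluster maps straight back to the original row number, split() gives every cluster's rows in one call, and aggregate() summarizes each cluster.

```r
# Example data: 80 random points in two dimensions
set.seed(123)
mat <- matrix(rnorm(160), ncol = 2)
km <- kmeans(mat, centers = 4)
df <- data.frame(Var1 = mat[, 1], Var2 = mat[, 2], cluster = km$cluster)

# Row numbers (in mat and df) of the points in cluster 2
rows2 <- which(df$cluster == 2)

# The point in cluster 2 with the largest Var1, as a row number of the original data
top2 <- rows2[which.max(df$Var1[rows2])]
df[top2, ]

# One data frame per cluster, keeping the original row names
by_cluster <- split(df, df$cluster)
by_cluster[["2"]]

# Cluster sizes and per-cluster means of the variables
table(df$cluster)
aggregate(cbind(Var1, Var2) ~ cluster, data = df, FUN = mean)
```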
Licensed under: CC-BY-SA with attribution