You could use the stringdist package to calculate the distance matrix:
# Example strings to cluster
str <- c("patio furniture",
         "living room furniture",
         "used chairs",
         "new chairs")

library(stringdist)
# Pairwise distances between all strings (4 x 4 matrix)
d <- stringdistmatrix(str, str)
stringdist supports a number of distance functions, selected with the method argument. The default is the restricted Damerau-Levenshtein distance, also known as optimal string alignment (method = "osa"). You can then use this distance matrix in hclust to perform hierarchical clustering:
cl <- hclust(as.dist(d))
plot(cl)
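Since the choice of distance function affects the clustering, here is a minimal sketch of swapping in a different one. The methods "jw" (Jaro-Winkler) and "cosine" are ones stringdist provides; the parameter values p = 0.1 and q = 2 and the variable names d_jw, d_cos and cl_jw are only illustrative assumptions, not recommendations:

```r
library(stringdist)

str <- c("patio furniture",
         "living room furniture",
         "used chairs",
         "new chairs")

# Jaro-Winkler distance, scaled to [0, 1]; p is the prefix weight (assumed value)
d_jw <- stringdistmatrix(str, str, method = "jw", p = 0.1)

# Cosine distance between character bigram profiles (q = 2 is an assumed value)
d_cos <- stringdistmatrix(str, str, method = "cosine", q = 2)

# Label rows and columns so the dendrogram shows the strings instead of indices
dimnames(d_jw) <- list(str, str)

cl_jw <- hclust(as.dist(d_jw))
plot(cl_jw)
```

Which distance works best depends on the data: edit distances tend to be sensitive to string length, while the normalised ones are less so.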
hclust supports several agglomeration (linkage) methods; see ?hclust. To cut the tree into a fixed number of groups (here 2), use cutree:
cutree(cl, 2)
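To see which string ended up in which group, or to try a different linkage, something along these lines might help. This is a sketch only: "average" is just one of the methods listed in ?hclust, and the cut height h = 10 is an arbitrary value chosen for illustration:

```r
library(stringdist)

str <- c("patio furniture",
         "living room furniture",
         "used chairs",
         "new chairs")

d <- stringdistmatrix(str, str)

# hclust defaults to complete linkage; average linkage is one alternative
cl_avg <- hclust(as.dist(d), method = "average")

# Cut into a fixed number of groups and pair each string with its group
groups <- cutree(cl_avg, k = 2)
result <- data.frame(string = str, group = groups)
print(result)

# Alternatively, cut at a distance threshold instead of a group count
# (h = 10 is an arbitrary, assumed value)
groups_h <- cutree(cl_avg, h = 10)
```

Cutting by height (h) rather than by count (k) can be handy when you do not know the number of groups in advance.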
However, this is probably only one of many possible solutions.