Determining the ideal number of clusters for sequence (distance) based clustering

https://stackoverflow.com//questions/22046436

21-12-2019

Question

I have written a function for clustering sequence-based data.

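library(TraMineR)
library(cluster)

clustering <- function(data){
  # Build a state sequence object, dropping missing values at the left, in gaps, and at the right
  data <- seqdef(data, left = "DEL", gaps = "DEL", right = "DEL")
  # Constant substitution costs
  couts <- seqsubm(data, method = "CONSTANT")
  # Optimal matching distances between all pairs of sequences
  data.om <- seqdist(data, method = "OM", indel = 3, sm = couts)
  # Agglomerative hierarchical clustering with Ward's method
  clusterward <- agnes(data.om, diss = TRUE, method = "ward")
  clusterward
}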

rc <- clustering(rubinius_sequences)

cluster_cut <- function(data, clusterward, n_clusters, name_clusters){
  data <- seqdef(data, left = "DEL", gaps = "DEL", right = "DEL")
  # Cut the tree into n_clusters groups and label them "Type 1", "Type 2", ...
  clusters <- cutree(clusterward, k = n_clusters)
  clusters <- factor(clusters, labels = paste("Type", seq_len(n_clusters)))
  # Return only the sequences that belong to the requested cluster
  data[clusters == name_clusters, ]
}

rc1 <- cluster_cut(project_sequences, rc, 4, "Type 1")

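However, the number of clusters here is chosen arbitrarily. Is there a way to show that the amount of variance (or some similar measure) captured by the clusters reaches a point of diminishing returns at a certain number of clusters? I am imagining something similar to a scree plot in factor analysis.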

Answer

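library(WeightedCluster)
# rubinius.dist is presumably the dissimilarity matrix, e.g. the seqdist() result computed inside clustering()
# Run k-medoids for k = 2..10 and print the cluster quality statistics
(agnesRange <- wcKMedRange(rubinius.dist, 2:10))
# Plot average silhouette width (ASW), Hubert's Gamma (HG) and point biserial correlation (PBC)
plot(agnesRange, stat = c("ASW", "HG", "PBC"), lwd = 5)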

This gives you a graph as well as several indicators for finding the ideal number of clusters. More information about the indicators (cluster quality) can be found here: http://mepisto.unige.ch/weightedcluster/
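Note that wcKMedRange runs a k-medoids clustering for each k, whereas the question builds a Ward tree with agnes. As a rough sketch of the same idea applied to a hierarchical tree, the code below cuts the tree at each candidate k and records the average silhouette width (the ASW statistic mentioned above), which gives a scree-like curve. The toy data, the use of hclust with "ward.D2" as a stand-in for agnes, and the helper name asw_by_k are assumptions made for illustration only; with sequence data, the distances would come from seqdist instead.

```r
library(cluster)  # silhouette(); the question already loads this package

# Toy data (assumption): three well-separated groups of 2-D points.
set.seed(42)
centers <- c(0, 10, 20)
pts <- do.call(rbind, lapply(centers, function(m) {
  matrix(rnorm(40, mean = m, sd = 0.5), ncol = 2)
}))
d <- dist(pts)  # with sequence data: as.dist() applied to the seqdist() output

# Ward clustering on the dissimilarities (hclust stands in for agnes here).
tree <- hclust(d, method = "ward.D2")

# Hypothetical helper: average silhouette width for each candidate k.
asw_by_k <- function(tree, d, ks) {
  sapply(ks, function(k) {
    sil <- silhouette(cutree(tree, k = k), d)
    mean(sil[, "sil_width"])
  })
}

ks <- 2:8
asw <- asw_by_k(tree, d, ks)
names(asw) <- ks

# Scree-like plot: look for the peak, or the point where the curve levels off.
plot(ks, asw, type = "b",
     xlab = "Number of clusters", ylab = "Average silhouette width")
best_k <- ks[which.max(asw)]
```

Here best_k holds the k with the highest average silhouette width. On real data the curve is rarely this clear-cut, so it is worth comparing several indicators, as the plot in the answer does, rather than relying on a single one.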

License: CC-BY-SA with attribution

Not affiliated with StackOverflow