Here's one way. The `kmeans(...)` function in base R (the `stats` package) has an option to specify initial cluster centers, so you could calculate the centers from the groupings implied in `seedgroup`. Calling your dataset `df`:
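# Mean of every species column within each seed group (one row per group)
centers <- aggregate(df[,-1], by=list(df$seedgroup), mean)
# Cluster the species columns, starting from the seed-group means
km <- kmeans(df[,2:6], centers=centers[,2:6])
# Prepend the new assignments, shifted to 0-based to match seedgroup
df <- data.frame(cluster=km$cluster-1, df)
df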
#       cluster seedgroup RhodDec VaccVit VaccOxy RubuCam ChamCal
# SiteA       1         1    0.00    0.01    0.01    0.00    0.00
# SiteB       1         2    0.00    0.01    0.00    0.00    0.00
# SiteC       1         0    0.00    0.01    0.01    0.01    0.00
# SiteD       1         1    0.00    0.01    0.00    0.00    0.00
# SiteE       2         2    0.09    0.02    0.01    0.01    0.02
# SiteF       1         1    0.00    0.00    0.01    0.03    0.02
# SiteG       0         0    0.00    0.01    0.06    0.02    0.01
# SiteH       1         1    0.00    0.01    0.00    0.00    0.00
Note that `kmeans(...)` returns 1-based cluster numbers, whereas yours are 0-based, which is why 1 is subtracted from `km$cluster` above. In this limited example, SiteB was moved from cluster 2 -> 1 and SiteC was moved from 0 -> 1, which looks reasonable based on the data.
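
If you want to check which sites changed group, something along these lines may help. It is only a sketch: the data frame is rebuilt from the rounded values printed above (your real data will have more precision), and the names `species`, `cluster` and `moved` are made up for the illustration. It also selects columns by name and maps the cluster index back through the sorted seed labels, rather than relying on column positions and on subtracting 1.

```r
# Example data rebuilt from the printed table above (rounded values, an assumption)
df <- data.frame(
  seedgroup = c(1, 2, 0, 1, 2, 1, 0, 1),
  RhodDec   = c(0.00, 0.00, 0.00, 0.00, 0.09, 0.00, 0.00, 0.00),
  VaccVit   = c(0.01, 0.01, 0.01, 0.01, 0.02, 0.00, 0.01, 0.01),
  VaccOxy   = c(0.01, 0.00, 0.01, 0.00, 0.01, 0.01, 0.06, 0.00),
  RubuCam   = c(0.00, 0.00, 0.01, 0.00, 0.01, 0.03, 0.02, 0.00),
  ChamCal   = c(0.00, 0.00, 0.00, 0.00, 0.02, 0.02, 0.01, 0.00),
  row.names = paste0("Site", LETTERS[1:8])
)

# Pick the species columns by name instead of by position
species <- setdiff(names(df), "seedgroup")

# One row of starting centers per seed group, sorted by seed label
centers <- aggregate(df[species], by = list(seedgroup = df$seedgroup), FUN = mean)

# k-means seeded with those centers (deterministic, no random start)
km <- kmeans(df[species], centers = centers[species])

# Row i of `centers` belongs to the i-th sorted seed label, so translate
# the 1-based cluster index back into the original labels
cluster <- centers$seedgroup[km$cluster]
names(cluster) <- rownames(df)

# Cross-tabulate seed group against final cluster and list the sites that moved
print(table(seed = df$seedgroup, cluster = cluster))
moved <- names(cluster)[cluster != df$seedgroup]
print(moved)
```

With the rounded values this should flag the same two sites mentioned above, SiteB and SiteC. Mapping through `centers$seedgroup` gives the same result as subtracting 1 here, but it keeps working if your seed labels are not exactly 0, 1, 2, ...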