It sounds like you want to retain the first column (even though 10062 levels for 14634 observations is quite high). One way to convert a factor to numeric values is with the model.matrix function. Before converting your factor:
data(iris)
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
After model.matrix:
# ~ . + 0 uses every column and drops the intercept, so every Species level gets its own column
head(model.matrix(~.+0, data=iris))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Speciessetosa Speciesversicolor Speciesvirginica
# 1 5.1 3.5 1.4 0.2 1 0 0
# 2 4.9 3.0 1.4 0.2 1 0 0
# 3 4.7 3.2 1.3 0.2 1 0 0
# 4 4.6 3.1 1.5 0.2 1 0 0
# 5 5.0 3.6 1.4 0.2 1 0 0
# 6 5.4 3.9 1.7 0.4 1 0 0
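To make it concrete what model.matrix does to the factor, here is a minimal sketch that builds the same indicator columns by hand (one 0/1 column per level of Species) and checks that they match. It also shows why a factor with many levels, like the 10062-level column in the question, turns into that many columns. The names mm, manual and species_cols are just illustrative.

```r
data(iris)

# Design matrix without an intercept: numeric columns are kept as-is,
# Species is expanded into one indicator column per level
mm <- model.matrix(~ . + 0, data = iris)

# The same indicator columns built by hand: 1 where the row has that level, else 0
manual <- sapply(levels(iris$Species),
                 function(lev) as.numeric(iris$Species == lev))
colnames(manual) <- paste0("Species", levels(iris$Species))

# The hand-built dummies should match the Species* columns of model.matrix
species_cols <- colnames(manual)
all.equal(unname(mm[, species_cols]), unname(manual))

# Column count: 4 numeric columns plus one column per factor level
ncol(mm) == 4 + nlevels(iris$Species)
```

With a 10062-level factor, the same expansion would add 10062 columns, which is worth keeping in mind before clustering.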
As you can see, it expands the factor into one 0/1 indicator column per level. You could then run k-means clustering on this expanded version of your data:
kmeans(model.matrix(~.+0, data=iris), centers=3)
# K-means clustering with 3 clusters of sizes 49, 50, 51
#
# Cluster means:
# Sepal.Length Sepal.Width Petal.Length Petal.Width Speciessetosa Speciesversicolor Speciesvirginica
# 1 6.622449 2.983673 5.573469 2.032653 0 0.0000000 1.00000000
# 2 5.006000 3.428000 1.462000 0.246000 1 0.0000000 0.00000000
# 3 5.915686 2.764706 4.264706 1.333333 0 0.9803922 0.01960784
# ...
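Because kmeans picks its starting centers at random, the cluster numbering and sizes can differ from run to run; the output above is one such run. A minimal sketch of making a run reproducible with set.seed and cross-tabulating the clusters against the original Species labels follows. The seed value 42 and nstart = 10 are arbitrary illustrative choices, not part of the original answer.

```r
data(iris)
x <- model.matrix(~ . + 0, data = iris)

# Fix the RNG so the random initial centers (and therefore the result) are repeatable
set.seed(42)
# nstart = 10 tries 10 random starts and keeps the one with the lowest total within-cluster SS
km <- kmeans(x, centers = 3, nstart = 10)

# Cluster sizes, then cluster id vs. the original species label
km$size
table(cluster = km$cluster, species = iris$Species)
```

Re-running with the same seed gives the same assignment, which makes it easier to compare results as you change the input columns.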