Basic clustering with R

Question 1

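It sounds like you want to retain the first column (even though 10062 levels for 14634 observations is quite high, and each level becomes its own column once expanded). One way to convert a factor into numeric values is the model.matrix function, which replaces the factor with one 0/1 indicator column per level. Using the built-in iris data as an example, here is the data before converting the Species factor: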

data(iris)
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

After model.matrix:

head(model.matrix(~.+0, data=iris))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Speciessetosa Speciesversicolor Speciesvirginica
# 1          5.1         3.5          1.4         0.2             1                 0                0
# 2          4.9         3.0          1.4         0.2             1                 0                0
# 3          4.7         3.2          1.3         0.2             1                 0                0
# 4          4.6         3.1          1.5         0.2             1                 0                0
# 5          5.0         3.6          1.4         0.2             1                 0                0
# 6          5.4         3.9          1.7         0.4             1                 0                0
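The `+ 0` in the formula matters. A minimal comparison on the Species column alone: with the default intercept, model.matrix uses treatment contrasts and drops the first level (setosa) as the baseline, while `+ 0` removes the intercept and keeps one indicator column per level.

```r
data(iris)

# Default formula keeps an intercept, so the first level (setosa) becomes the baseline
with_intercept <- model.matrix(~ Species, data = iris)
print(colnames(with_intercept))
# gives: (Intercept), Speciesversicolor, Speciesvirginica

# "+ 0" drops the intercept, giving one 0/1 column per level
no_intercept <- model.matrix(~ Species + 0, data = iris)
print(colnames(no_intercept))
# gives: Speciessetosa, Speciesversicolor, Speciesvirginica

# Each row has exactly one 1 across the level columns
print(all(rowSums(no_intercept) == 1))
```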

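As the model.matrix output shows, the Species factor has been expanded into three indicator columns (Speciessetosa, Speciesversicolor, Speciesvirginica). You could then run k-means clustering on this expanded, all-numeric version of your data: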

kmeans(model.matrix(~.+0, data=iris), centers=3)
# K-means clustering with 3 clusters of sizes 49, 50, 51
# 
# Cluster means:
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Speciessetosa Speciesversicolor Speciesvirginica
# 1     6.622449    2.983673     5.573469    2.032653             0         0.0000000       1.00000000
# 2     5.006000    3.428000     1.462000    0.246000             1         0.0000000       0.00000000
# 3     5.915686    2.764706     4.264706    1.333333             0         0.9803922       0.01960784
# ...
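k-means starts from randomly chosen centers, so repeated calls can give different clusterings, and the cluster sizes shown above may differ on your machine. A minimal sketch of a reproducible run, assuming a seed of 42 and `nstart = 25` (both values are arbitrary choices, not part of the original answer):

```r
data(iris)

# Expand the Species factor into indicator columns, as above
X <- model.matrix(~ . + 0, data = iris)

# Fix the random seed so the result is reproducible
set.seed(42)

# nstart = 25: try 25 random sets of starting centers and keep the best fit
km <- kmeans(X, centers = 3, nstart = 25)

# Cluster sizes and a cross-tabulation against the original species labels
print(km$size)
print(table(cluster = km$cluster, species = iris$Species))
```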

Question 2

Try dat[,1] = factor(dat[,1]). I think the NAs come from the session id (first column), which is not numeric. Converting it with factor() gives each distinct session id an integer index.
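A small sketch of where the NAs come from, using a hypothetical data frame `dat` with made-up session ids (the real column names and values are not known here):

```r
# Hypothetical data: a character session-id column plus a numeric column
dat <- data.frame(session = c("a1f", "b7c", "a1f", "z9q"),
                  clicks  = c(3, 5, 2, 8),
                  stringsAsFactors = FALSE)

# Coercing non-numeric ids directly to numbers gives NA (R also warns)
bad <- suppressWarnings(as.numeric(dat[, 1]))
print(bad)   # NA NA NA NA

# Converting to a factor first maps each distinct id to an integer level index
# (levels are sorted alphabetically: a1f = 1, b7c = 2, z9q = 3)
dat[, 1] <- factor(dat[, 1])
idx <- as.integer(dat[, 1])
print(idx)   # 1 2 1 3
```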

Question 3

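k-means only works for continuous data: it computes cluster means and Euclidean distances, which are not meaningful for categorical values such as ids.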
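To see why turning a category into integer codes doesn't fix this, here is a tiny example with a hypothetical `colors` factor. The integer codes follow the alphabetical order of the labels, so the distances k-means would use reflect that arbitrary ordering rather than any real similarity:

```r
# Hypothetical categorical variable
colors <- factor(c("red", "green", "blue"))

# Levels are sorted alphabetically: blue = 1, green = 2, red = 3
codes <- as.integer(colors)
print(codes)   # 3 2 1

# Pairwise Euclidean distances between the three observations
d <- as.matrix(dist(codes))
print(d)
# red is "twice as far" from blue as from green only because of the alphabetical coding
```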

You have two id columns that must not be used for clustering; they will make your result meaningless.

But even then I doubt that k-means is the appropriate algorithm for your problem. You first need to understand your data, then preprocess and transform it into an appropriate representation.
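A minimal sketch of excluding id columns before clustering, on a hypothetical data frame whose columns `session_id`, `user_id`, `duration` and `pages` are invented for illustration. A numeric id such as `user_id` would be silently accepted by kmeans, which is why it has to be dropped explicitly. Standardizing with scale() is shown only as one common preprocessing step, not necessarily the right transformation for your data:

```r
set.seed(1)

# Hypothetical data: two id columns plus two continuous features
dat <- data.frame(session_id = sprintf("s%03d", 1:60),
                  user_id    = sample(1:20, 60, replace = TRUE),
                  duration   = c(rnorm(30, 10), rnorm(30, 40)),
                  pages      = c(rnorm(30, 3),  rnorm(30, 12)))

# Drop the id columns: they identify rows and carry no similarity information
features <- dat[, setdiff(names(dat), c("session_id", "user_id"))]

# Assumed preprocessing: standardize so both features are on the same scale
km <- kmeans(scale(features), centers = 2, nstart = 10)
print(table(km$cluster))
```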

Don't expect a push-button solution. Those don't exist, or don't work.

Question 4

Don't use the Species column; cluster only on the four numeric measurements (columns 1 to 4):

km <- kmeans(iris[, 1:4], centers = 3)
km
# K-means clustering with 3 clusters of sizes 50, 38, 62
# 
# Cluster means:
#   Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1     5.006000    3.428000     1.462000    0.246000
# 2     6.850000    3.073684     5.742105    2.071053
# 3     5.901613    2.748387     4.393548    1.433871
# 
# Clustering vector:
#   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3
#  [59] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 2 3 2 2 2 2 2 2 3 3 2
# [117] 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2 2 3
# 
# Within cluster sum of squares by cluster:
# [1] 15.15100 23.87947 39.82097
#  (between_SS / total_SS =  88.4 %)
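The last line of the printout can be recomputed from the fitted object: kmeans returns `totss`, `tot.withinss` and `betweenss`, and the percentage is `betweenss / totss`. Because the species labels were left out of the fit, they can be used afterwards to check how the clusters line up. A sketch, assuming an arbitrary seed and `nstart = 25` (neither was used in the run above, so the cluster numbering and sizes may differ):

```r
data(iris)

set.seed(123)   # arbitrary seed, added only for reproducibility
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# Total SS splits into total within-cluster SS plus between-cluster SS
ratio <- km$betweenss / km$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))

# Compare clusters with the held-out species labels
print(table(cluster = km$cluster, species = iris$Species))
```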