Question

I have generated a dataset of pairwise distances as follows:

id_1 id_2 dist_12
id_2 id_3 dist_23

I want to cluster this data to identify patterns in it. I have been looking at spectral clustering and DBSCAN, but I have not reached a conclusion and am unsure how to use the existing implementations of these algorithms. I have looked at Python and Java implementations so far.

Could anyone point me to a tutorial or demo on how to use these clustering algorithms to handle the situation at hand?

Solution

In the scikit-learn implementations of spectral clustering and DBSCAN, you do not need to precompute the distances; instead, pass the sample coordinates for all id_1 ... id_n as input. Here is a simplified version of the documented example that compares clustering algorithms:

import numpy as np
from sklearn import cluster
from sklearn.preprocessing import StandardScaler

## Prepare the data
X = np.random.rand(1500, 2)
# When reading from a file of the form: `id_n coord_x coord_y`
# you will need this call instead:
# X = np.loadtxt('coords.csv', usecols=(1, 2))
X = StandardScaler().fit_transform(X)

## Instantiate the algorithms
spectral = cluster.SpectralClustering(n_clusters=2,
                                      eigen_solver='arpack',
                                      affinity="nearest_neighbors")
dbscan = cluster.DBSCAN(eps=.2)

## Use the algorithms
spectral_labels = spectral.fit_predict(X)
dbscan_labels = dbscan.fit_predict(X)
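
If only the pairwise distances from the question are available and no coordinates, both estimators can also take a precomputed matrix. The following is a minimal sketch under stated assumptions, not a definitive recipe: the `pairs` list is hypothetical toy data standing in for the `id_1 id_2 dist_12` file, every pair is assumed to be listed (missing pairs would need separate handling), and the values of `eps`, `min_samples`, and `sigma` are illustrative guesses that would need tuning on real data. DBSCAN takes the distance matrix directly via `metric="precomputed"`, while SpectralClustering with `affinity="precomputed"` expects similarities, so the distances are first converted with a Gaussian kernel.

```python
import numpy as np
from sklearn.cluster import DBSCAN, SpectralClustering

# Hypothetical (id_a, id_b, distance) triples; two tight groups far apart.
pairs = [
    ("a", "b", 0.10), ("a", "c", 0.20), ("b", "c", 0.15),
    ("d", "e", 0.10), ("d", "f", 0.20), ("e", "f", 0.12),
    ("a", "d", 5.0), ("a", "e", 5.1), ("a", "f", 5.2),
    ("b", "d", 4.9), ("b", "e", 5.0), ("b", "f", 5.3),
    ("c", "d", 5.1), ("c", "e", 4.8), ("c", "f", 5.0),
]

# Map each id to a row/column index.
ids = sorted({name for a, b, _ in pairs for name in (a, b)})
index = {name: k for k, name in enumerate(ids)}

# Build a symmetric distance matrix with zeros on the diagonal.
n = len(ids)
D = np.zeros((n, n))
for a, b, d in pairs:
    D[index[a], index[b]] = d
    D[index[b], index[a]] = d

# DBSCAN accepts distances directly (eps and min_samples are assumed values).
dbscan = DBSCAN(eps=0.5, min_samples=2, metric="precomputed")
dbscan_labels = dbscan.fit_predict(D)

# Spectral clustering needs similarities, so apply a Gaussian kernel.
sigma = 2.0  # assumed kernel width, tune for real data
A = np.exp(-D ** 2 / (2 * sigma ** 2))
spectral = SpectralClustering(n_clusters=2, affinity="precomputed",
                              random_state=0)
spectral_labels = spectral.fit_predict(A)

# Report the cluster label assigned to each id.
for name in ids:
    print(name, dbscan_labels[index[name]], spectral_labels[index[name]])
```

The label order follows the sorted `ids`, so `index` can be used to map cluster labels back to the original identifiers.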
License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange