Question

Once I have collected and organized data in a SOM, how do I identify clusters?

(Items are aggregated and clustered using many traits - upwards of 10)

Specifically, I want to find the 'center' of each cluster, thereby giving me the 'center' node(s).

Solution

You could use a relatively small map and consider each node a cluster, but this is far from optimal. If you want to apply an automated cluster detection method, you should definitely read

Clustering of the Self-Organizing Map

and look for related literature.

You could also use more sophisticated versions of the SOM algorithm (multi-level, self-growing, etc.).

In any case, keep in mind that the problem of finding the "correct" number of clusters doesn't have a definitive solution.
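The simplest option mentioned above, treating each node of a small map as its own cluster, could be sketched roughly as follows. This assumes the SOM is already trained and its codebook is available as a NumPy array; the names `weights` and `assign_to_nodes` are made up for illustration and do not come from any particular SOM library.

```python
import numpy as np

def assign_to_nodes(data, weights):
    """Return the flat index of the best-matching unit (BMU) for every sample.

    data:    array of shape (n_samples, n_features)
    weights: array of shape (rows, cols, n_features), the trained SOM
             codebook (assumed to be given, not trained here)
    """
    flat = weights.reshape(-1, weights.shape[-1])
    # Squared Euclidean distance from every sample to every node
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy 1x2 map whose two nodes sit at (0, 0) and (10, 10)
weights = np.array([[[0.0, 0.0], [10.0, 10.0]]])
data = np.array([[0.5, -0.2], [9.0, 11.0], [0.1, 0.3]])

labels = assign_to_nodes(data, weights)   # one "cluster" label per sample
hits = np.bincount(labels, minlength=weights.shape[0] * weights.shape[1])
```

Here each node's weight vector doubles as the 'center' of its cluster, and `hits` counts how many samples each node represents.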

Other suggestions

As far as I can tell, SOM is primarily a data-driven dimensionality reduction and data compression method. So it won't cluster the data for you; it may actually tend to spread clusters in the projection (i.e. split them into multiple cells).

However, it may work well for some data sets to either:

  • Instead of processing the full data set, work only on the SOM nodes (weighted by the number of elements assigned to them), which form a significantly smaller set
  • Instead of working in the original space, work in the lower-dimensional space that the SOM represents

And then run a regular clustering algorithm on the transformed data.
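As a rough sketch of the first option, the node weight vectors can be clustered with each node weighted by its hit count. Plain weighted k-means is used here only as one example of a "regular clustering algorithm"; the choice of k-means, the value of `k`, and the names `weighted_kmeans`, `center_nodes`, `codebook`, and `hits` are all assumptions made for illustration. The second helper picks, for each cluster, the node closest to the cluster centroid, which is one possible reading of the 'center' node asked about.

```python
import numpy as np

def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd's k-means where each point counts `weights[i]` times."""
    rng = np.random.default_rng(seed)
    # Start from k distinct nodes, drawn with probability proportional to hits
    # (requires at least k nodes with a non-zero hit count)
    p = weights / weights.sum()
    centers = points[rng.choice(len(points), size=k, replace=False, p=p)]
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            mask = (labels == j) & (weights > 0)
            if mask.any():  # keep the old center if a cluster is empty
                new_centers[j] = np.average(points[mask], axis=0,
                                            weights=weights[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers

def center_nodes(points, labels, centers):
    """For each cluster, the index of the member node nearest its centroid."""
    result = {}
    for j, c in enumerate(centers):
        members = np.flatnonzero(labels == j)
        if members.size:
            d2 = ((points[members] - c) ** 2).sum(axis=1)
            result[j] = int(members[d2.argmin()])
    return result

# Toy codebook: six nodes (a flattened 2x3 map) forming two obvious groups
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                     [10.0, 10.0], [11.0, 10.0], [12.0, 10.0]])
hits = np.array([1.0, 5.0, 1.0, 2.0, 6.0, 2.0])  # samples mapped to each node

node_labels, centroids = weighted_kmeans(codebook, hits, k=2)
centers_by_cluster = center_nodes(codebook, node_labels, centroids)
```

For the second option, the same functions could be fed the nodes' grid coordinates instead of their weight vectors.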

Though this is an old question, I encountered the same issue and had some success implementing "Estimating the Number of Clusters in Multivariate Data by Self-Organizing Maps", so I thought I'd share.

The linked algorithm uses the U-matrix to highlight the boundaries of the individual clusters and then uses an image processing algorithm called watershedding to identify the components. For this to work correctly, the regions in the U-matrix are required to be concave within the resolution of your quantization (when the U-matrix is converted to a binary image, this simply amounts to using a flood fill to identify the regions).
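A minimal sketch of the binary-image variant described above might look like this. It is not the procedure from the paper: the U-matrix here is simply the mean distance to the 4-connected grid neighbours, and the cut-off `threshold` is an arbitrary assumption that would need tuning for real data. The names `u_matrix` and `flood_fill_regions` are illustrative only.

```python
import numpy as np
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def u_matrix(weights):
    """Mean distance from each node's weight vector to its grid neighbours."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = [np.linalg.norm(weights[r, c] - weights[r + dr, c + dc])
                     for dr, dc in NEIGHBOURS
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            u[r, c] = np.mean(dists) if dists else 0.0
    return u

def flood_fill_regions(u, threshold):
    """Label connected low-distance regions; 0 marks boundary nodes."""
    low = u < threshold                      # binary image: True = inside a cluster
    labels = np.zeros(u.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(low)):
        if labels[start]:
            continue                         # already reached by an earlier fill
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if (0 <= rr < u.shape[0] and 0 <= cc < u.shape[1]
                        and low[rr, cc] and labels[rr, cc] == 0):
                    labels[rr, cc] = current
                    queue.append((rr, cc))
    return labels, current

# Toy 3x5 map with one feature: two flat plateaus separated by a ridge column
column_values = np.array([0.0, 0.0, 5.0, 10.0, 10.0])
weights = np.tile(column_values, (3, 1))[:, :, None]

u = u_matrix(weights)
regions, n_regions = flood_fill_regions(u, threshold=2.0)
```

In this toy map the middle column has the largest U-matrix values, so it ends up as the boundary and the two plateaus on either side become separate regions. The number of regions found then serves as the estimated number of clusters.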

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow