Dimensionality reduction issues in the self-organizing map (SOM)

https://stackoverflow.com/questions/8130861

01-03-2021
Question

The self-organizing map is claimed to be able to visualize/cluster high-dimensional data in a lower-dimensional space. I have some difficulty understanding this statement.

Consider a six-dimensional data set: each codebook vector (reference vector) is also six-dimensional. According to the SOM algorithm, the updates of these reference vectors are also carried out in the six-dimensional vector space. If we are considering a two-dimensional map, how should I understand the mapping between the six-dimensional data space and the two-dimensional map space?

Solution

The map between the N-dimensional input space and the 2D SOM space is a non-linear projection preserving as much of the topology as possible.
This means that information about distance and angle is lost in the process, but the proximity relationship between points is preserved (i.e. two points that are close to each other in the input space should be close in the SOM space).
I got my best insight into "what does a SOM do?" by using it on the 3D RGB color space: the work of the SOM can easily be visualized in this case, which should help you grasp the concept.
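
To make the two spaces concrete, here is a minimal NumPy sketch of a SOM trained on random RGB colors. It is an illustration under assumptions, not a reference implementation: the names `train_som` and `project`, the 10x10 grid, the linear decay schedules, and the Gaussian neighborhood are all choices made for this example. The point to notice is that the best-matching unit is found in the 3D input space, while the neighborhood that decides which neurons get updated is measured on the 2D grid.

```python
import numpy as np


def train_som(data, rows=10, cols=10, n_iter=2000, lr0=0.5, seed=0):
    """Train a small SOM; returns a codebook of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    sigma0 = max(rows, cols) / 2.0           # initial neighborhood radius (assumed)
    weights = rng.random((rows, cols, dim))  # codebook vectors live in input space
    # 2D grid coordinates of every neuron, shape (rows, cols, 2)
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighborhood radius
        x = data[rng.integers(len(data))]          # pick one random sample
        # Best-matching unit: nearest codebook vector in the INPUT space
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Neighborhood strength: Gaussian of the distance on the 2D MAP
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its map neighbors toward the sample
        weights += lr * h[..., None] * (x - weights)
    return weights


def project(weights, x):
    """Map an input vector to the (row, col) of its best-matching unit."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


rng = np.random.default_rng(0)
colors = rng.random((500, 3))                 # random RGB colors in [0, 1)
som = train_som(colors)
red_cell = project(som, np.array([1.0, 0.0, 0.0]))
print(som.shape, red_cell)
```

The projection from the input space to the map is simply `project`: each input vector is assigned the grid position of its nearest codebook vector. For the six-dimensional case in the question, only `dim` changes; the grid stays two-dimensional.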

Other tips

A 2D self-organizing map (SOM) distributes the input vectors over a 2D plane. Mathematically, the SOM is a 3D array: two dimensions index the neurons on the map, and the length of the third dimension equals the dimensionality of your input vectors. To visualize the SOM, it is usual to compute the U-matrix, which gives, for each neuron of the SOM, the mean Euclidean distance between that neuron and its neighbors. The resulting 2D matrix lets you visualize the high-dimensional space on a 2D plane. High values mark barriers between clusters, represented as deep blue valleys in the following figure:

[Figure: U-matrix example]

This U-matrix comes from training on this 3D data set:

[Figure: 3D data set, image not preserved]

And here is the U-matrix in the original 3D space:

[Figure: image not preserved]
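
The U-matrix computation described above can be sketched as follows. This is a minimal version under assumptions: the function name `u_matrix` is made up for this example, neighbors are taken to be the 4-connected cells of a rectangular grid (hexagonal grids and 8-connected neighborhoods are also common), and the map is assumed to have at least two neurons. The toy codebook has two flat regions so that the barrier between them is easy to see.

```python
import numpy as np


def u_matrix(weights):
    """Mean Euclidean distance from each neuron to its 4-connected neighbors.

    weights: codebook array of shape (rows, cols, dim).
    Returns an array of shape (rows, cols).
    """
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:  # skip cells off the map
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            u[i, j] = np.mean(dists)
    return u


# Toy codebook: the left half of the map holds (0, 0, 0), the right half (1, 1, 1)
toy = np.zeros((4, 4, 3))
toy[:, 2:, :] = 1.0
u = u_matrix(toy)
print(np.round(u, 3))
```

In the printed matrix, the two columns next to the boundary carry the only non-zero values: they form the ridge that separates the two clusters, while the outer columns stay at zero.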

You may not be able to picture it directly, but you can still use it: try to think of it as a discrete function that maps, for example, a 4D vector space to a 1D one. The most important point is that your function is some sort of recursion. An L-system, for example, uses recursion or repetition a lot. A better description of monster curves can be found at Nick's spatial index Hilbert curve blog.
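
As a rough illustration of such a discrete mapping, the sketch below converts a 2D grid cell to its position along a Hilbert curve, using the well-known iterative bit-manipulation formulation. It is a simplification of the 4D-to-1D case mentioned above, and it is not a SOM: the curve is fixed in advance rather than learned from data. The function name `hilbert_index` is chosen for this example, and the grid side `n` is assumed to be a power of two.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two; 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant at this scale
        # Rotate/flip the quadrant so the same pattern repeats at the next scale
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


n = 8
# Order all cells of the grid by their position on the curve
order = sorted(
    ((x, y) for x in range(n) for y in range(n)),
    key=lambda p: hilbert_index(n, p[0], p[1]),
)
print(order[:4])
```

Consecutive positions on the curve are always adjacent cells of the grid, which is the sense in which this 1D ordering keeps some of the 2D neighborhood structure. The converse does not hold: two adjacent cells can end up far apart on the curve.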

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow