Question

From the sklearn docs:

Note that the purpose of the MDS is to find a low-dimensional representation of the data (here 2D) in which the distances respect well the distances in the original high-dimensional space; unlike other manifold-learning algorithms, it does not seek an isotropic representation of the data in the low-dimensional space.

Can someone elaborate, in layman's terms, what the distinction is?


Solution

The images in the link you provide, of the severed sphere and its lower-dimensional representations, go some way towards explaining the difference.

The severed sphere is a set of points in a three-dimensional space, but we want a two-dimensional representation of it. The objective of manifold-learning is (shockingly) to find a manifold: a subset of that three-dimensional space which (a) closely fits all the points that make up the severed sphere, and (b) can be described with a two-dimensional coordinate system.

If you look at some of the other lower-dimensional representations of the severed sphere, it's as if they take it and flatten it out into a rectangle so it will fit in two dimensions. Each of those methods works out a new coordinate system that maps as closely as possible onto all the points that make up the severed sphere.
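
The "rectangle" is easiest to see if you build the severed sphere yourself. The sketch below (numpy assumed) generates points from two angles, roughly following the construction in scikit-learn's severed-sphere example (the exact cut sizes here are illustrative), and then reads those two angles back off the 3D coordinates. It is not a manifold-learning algorithm, since it cheats by knowing how the sphere was parametrised; it only shows the 2D coordinate system that methods such as Isomap or LLE try to discover from the points alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two angles are enough to locate any point on the sphere's surface.
# A slice is cut out of the longitude range and both polar caps are removed,
# which is what makes the sphere "severed" (cut sizes are illustrative).
p = rng.uniform(0.0, 2 * np.pi - 0.55, n)       # longitude
t = rng.uniform(np.pi / 8, 7 * np.pi / 8, n)    # polar angle (0 = north pole)

# The same points as they sit in three-dimensional space.
X = np.column_stack([np.sin(t) * np.cos(p),
                     np.sin(t) * np.sin(p),
                     np.cos(t)])

# "Flattening it out": read the two angles back off the 3D coordinates.
t_rec = np.arccos(np.clip(X[:, 2], -1.0, 1.0))
p_rec = np.mod(np.arctan2(X[:, 1], X[:, 0]), 2 * np.pi)
flat = np.column_stack([p_rec, t_rec])

# Every point lands inside a rectangle in the (longitude, polar angle) plane.
print(flat.min(axis=0), flat.max(axis=0))
```

The 3D cloud needs three numbers per point, but the two recovered angles describe it completely, which is exactly the property (b) above.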

The MDS lower-dimensional representation, though, is more like a shadow that the severed sphere casts on a wall. Rather than finding a new coordinate system that closely fits the sphere, it just "forgets" whichever dimension it thinks it can most afford to lose, while keeping the distances between every pair of points as close as possible to the original ones.
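
The "shadow" picture is literally true for classical (Torgerson) MDS: given Euclidean distances, it produces the same projection as PCA, keeping the directions of greatest spread and dropping the rest. scikit-learn's `MDS` class instead minimises a stress function with the SMACOF algorithm, so the hand-written sketch below (numpy assumed, not sklearn's code) illustrates the idea rather than what sklearn actually runs. Because the result is a projection, distances in the 2D output can shrink but never grow.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed points in k dims from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                 # Gram matrix of the centred points
    vals, vecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]            # indices of the k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


rng = np.random.default_rng(1)
# A flattish 3D cloud: lots of spread along x and y, very little along z.
X = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 0.3])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y = classical_mds(D, k=2)
D2 = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)

# The "shadow" property: every 2D distance is at most the original 3D distance.
print(np.all(D2 <= D + 1e-9))
```

Because the cloud is thin along z, the dropped dimension costs little and the shadow preserves distances well; a genuinely curved shape like the severed sphere would lose more.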

A good analogy would be maps of the earth. A good map of the earth makes a new coordinate system that fits a sphere onto a 2D surface. To do this it has to distort the relative distances between places, but you end up with effective 2D coordinates that relate well to places on the globe.

Instead of doing this, you could just take two photos of the earth, one from above the North Pole and one from above the South Pole, and glue them back to back. You'd still have a 2D representation of the earth, but it wouldn't work nearly as well as a coordinate system.
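
The analogy can be put into numbers. The sketch below (numpy assumed) uses the equirectangular projection, where longitude and latitude become x and y directly, as a stand-in for "a good map", and models each "photo" as an orthographic view from far above a pole, which simply drops the height coordinate. Both choices are illustrative assumptions. It measures how far a point moves on the page when it moves one degree north: on the map that distance is the same everywhere, while in the photo it shrinks towards zero at the equator, where the edge of the photo squashes everything together.

```python
import numpy as np

deg = np.pi / 180


def equirectangular(lat, lon):
    """A simple map projection: longitude and latitude used directly as x and y."""
    return np.array([lon, lat])


def photo_from_above_pole(lat, lon):
    """Orthographic 'photo' from far above the North Pole: the height (z) is dropped."""
    r = np.cos(lat)
    return np.array([r * np.cos(lon), r * np.sin(lon)])


def step(project, lat):
    """How far a point moves on the 2D surface when it moves 1 degree north."""
    return np.linalg.norm(project(lat + 1 * deg, 0.0) - project(lat, 0.0))


for lat in (0, 45, 85):
    print(lat, step(equirectangular, lat * deg), step(photo_from_above_pole, lat * deg))

# Dropping the height also sends 30°N and 30°S to the same spot in one photo,
# which is why the photo trick needs two separate pictures.
print(np.allclose(photo_from_above_pole(30 * deg, 1.0),
                  photo_from_above_pole(-30 * deg, 1.0)))
```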

This isn't to say that MDS is "bad". It's just doing something different. You probably wouldn't use MDS for dimensionality reduction prior to carrying out some sort of statistical procedure, but if you're trying to produce a graphic that gives some idea of how close multidimensional points are to one another, it might be a good choice.
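
If you do use MDS for a plot like that, you can check how faithful the picture is by measuring how far the 2D distances stray from the originals. The sketch below (numpy assumed) computes raw stress, the sum of squared differences between original and plotted pairwise distances, which is the kind of quantity metric MDS tries to make small. scikit-learn's `MDS` reports its own version as the `stress_` attribute, whose normalisation may differ from this hand-written one. The helper names and the toy points are made up for illustration.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distance between every pair of rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(D_orig, Y):
    """Sum over pairs i < j of (original distance - plotted distance) squared."""
    D_plot = pairwise_dist(Y)
    i, j = np.triu_indices(len(Y), k=1)
    return float(((D_orig[i, j] - D_plot[i, j]) ** 2).sum())


# Four made-up points in 3D: a unit square with one corner lifted slightly.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.1]])
D = pairwise_dist(X)

good_plot = X[:, :2]        # drop the nearly flat third axis
bad_plot = X[:, :2] * 0.5   # same layout, but every distance halved
print(raw_stress(D, good_plot), raw_stress(D, bad_plot))
```

A low stress means the distances you see on the plot are close to the real ones, which is exactly what you want from a graphic of this kind.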

Licensed under: CC-BY-SA with attribution