Question

Why is it good to concentrate on variance/co-variance structure of the multidimensional data?


Solution

This is because very often your data are described by a Gaussian distribution, which is parameterized by a covariance matrix as well as the mean values (don't forget the means!). For a d-dimensional Gaussian there are d means and d*(d+1)/2 = d*d/2 + d/2 independent covariance values, so there may be a lot of parameters to learn. And the Gaussian is one of the simplest models; try a more complicated model and you will be swarmed by parameters.
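The parameter count above is easy to check: the covariance matrix is symmetric, so only its upper triangle (including the diagonal) is free. A minimal sketch:

```python
# Parameter count for a d-dimensional Gaussian:
# d means + d*(d+1)/2 independent covariance entries
# (the covariance matrix is symmetric, so the lower
# triangle duplicates the upper one).
def gaussian_param_count(d):
    return d + d * (d + 1) // 2

print(gaussian_param_count(3))   # 9  (the skin-color example below)
print(gaussian_param_count(10))  # 65
```

Note how quickly the count grows: quadratically in the dimension, which is exactly why high-dimensional Gaussians already have "a lot of parameters".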

For example, suppose you are given a set of images containing skin and background regions and you want to model skin color vs. background color with a simple Gaussian distribution (maybe you want to build a skin detector). It is not as simple as it sounds: color is 3-dimensional, so you have 3 means (r, g, b) and a 3x3 symmetric covariance matrix with 6 independent parameters. So a first counterintuitive conclusion is that skin is described by 9 parameters in RGB color space. I bet most people would just go with 3 (the means).
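A minimal sketch of such a detector, using synthetic stand-ins for real skin and background pixels (the means and spreads here are made up for illustration): fit one Gaussian per class (3 means + 6 covariance entries each) and classify a pixel by comparing log-likelihoods.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical RGB samples (n x 3) standing in for real labeled pixels.
skin = rng.normal([200.0, 150.0, 130.0], 20.0, size=(500, 3))
bg = rng.normal([90.0, 110.0, 100.0], 40.0, size=(500, 3))

def fit_gaussian(x):
    # 3 means and a 3x3 symmetric covariance: 9 parameters in total.
    return x.mean(axis=0), np.cov(x, rowvar=False)

def log_likelihood(rgb, mean, cov):
    # Gaussian log-density, dropping the constant shared by both classes.
    diff = rgb - mean
    return -0.5 * (diff @ np.linalg.solve(cov, diff)
                   + np.log(np.linalg.det(cov)))

skin_mean, skin_cov = fit_gaussian(skin)
bg_mean, bg_cov = fit_gaussian(bg)

def is_skin(rgb):
    return (log_likelihood(rgb, skin_mean, skin_cov)
            > log_likelihood(rgb, bg_mean, bg_cov))
```

A real detector would be trained on labeled pixels and often works in a chroma space rather than raw RGB, but the structure is the same: two 9-parameter Gaussians and a likelihood comparison.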

In fact, if you calculate the covariance matrix you can discover more counterintuitive facts, such as that the red-green covariance of skin is especially low, and that the red-blue covariance of skin differs markedly from the red-blue covariance of the background. Finally, covariance is very easy to compute with a matrix representation: cov = sum(v*vT)/n, where v = data - mean.
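That formula translates directly into numpy: center the data, then the sum of outer products v*vT over all n samples is just a single matrix product. A quick sketch on toy data:

```python
import numpy as np

data = np.array([[1.0, 2.0],
                 [2.0, 4.1],
                 [3.0, 6.2],
                 [4.0, 7.9]])              # n x d matrix of samples

v = data - data.mean(axis=0)               # v = data - mean
cov = v.T @ v / len(data)                  # sum(v * vT) / n in one product

# Matches numpy's built-in (bias=True divides by n, as above).
print(np.allclose(cov, np.cov(data, rowvar=False, bias=True)))  # True
```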

To reduce the number of parameters you can consider dimensionality reduction methods such as PCA or factor analysis (or, for a different kind of simplification, cluster the data with k-means).
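PCA in particular is built directly on the covariance matrix from above: its eigenvectors are the principal directions, and keeping only the top ones reduces the dimension. A minimal sketch on synthetic data whose variance is concentrated along one direction (the data here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 3-D data: mostly one direction of variation plus small noise.
data = (rng.normal(size=(200, 1)) @ np.array([[2.0, 1.0, 0.5]])
        + rng.normal(scale=0.1, size=(200, 3)))

v = data - data.mean(axis=0)               # center, as before
cov = v.T @ v / len(v)                     # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order

top = eigvecs[:, -1]                       # principal direction
projected = v @ top                        # 1-D representation of the data
```

Here the largest eigenvalue dwarfs the other two, so one coordinate captures almost all of the variance: 3 dimensions reduced to 1.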

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow