Why is it good to concentrate on variance/covariance structure of the multidimensional data?

StackOverflow https://stackoverflow.com/questions/22340108

  •  13-06-2023

Question

Why is it good to concentrate on the variance/covariance structure of multidimensional data?

Solution

This is because very often your data are described by a Gaussian distribution, which is parameterized by a covariance matrix and also by mean values (don't forget the means!). A d-dimensional Gaussian has d means and d*d/2 + d/2 = d(d+1)/2 independent covariance values, so there may be a lot of parameters to learn. And the Gaussian distribution is one of the simplest models. Try a more complicated model and you will be swamped by parameters.
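To make the parameter count concrete, here is a minimal sketch that tabulates it for a few dimensions. The function name `gaussian_param_count` is made up for this illustration; it only encodes the d means plus d(d+1)/2 covariance values mentioned above.

```python
def gaussian_param_count(d):
    """Return (means, independent covariance entries, total) for a d-dimensional Gaussian."""
    means = d
    # A symmetric d x d matrix has d diagonal entries plus d*(d-1)/2
    # entries above the diagonal, i.e. d*(d+1)/2 independent values.
    cov_entries = d * (d + 1) // 2
    return means, cov_entries, means + cov_entries


for d in (1, 3, 10, 100):
    means, cov_entries, total = gaussian_param_count(d)
    print(f"d={d}: {means} means + {cov_entries} covariance values = {total} parameters")
```

The covariance part grows quadratically with d, which is why it dominates the parameter count in higher dimensions.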

For example, you may be given a set of images containing skin and background objects, and you want to model skin color vs. background color with a simple Gaussian distribution (maybe you want to create a skin detector). Well, it is not that simple: color is 3-dimensional, so you would have 3 means (r, g, b) and a 3x3 symmetric covariance matrix with 6 independent parameters. So a first counterintuitive conclusion is that skin is described by 9 parameters in RGB color space. I bet most people would just go with 3 (the means).

In fact, if you calculate the covariance matrix you can discover more counterintuitive facts, such as that the red-green covariance of skin is especially low and that the red-blue covariance of skin has a much different value from the red-blue covariance of the background. It is also very easy to calculate the covariance with vector and matrix representations: cov = sum(v*vT)/n, where v = data - mean.
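The formula cov = sum(v*vT)/n can be sketched in a few lines of NumPy. This is only an illustration: the `covariance` helper and the random `data` array (standing in for n pixels with r, g, b channels) are assumptions, not real skin measurements.

```python
import numpy as np


def covariance(data):
    """Covariance of samples stored one per row, normalized by n."""
    mean = data.mean(axis=0)
    v = data - mean              # centered samples, one per row
    # v.T @ v sums the outer products v_i * v_i^T over all samples.
    return v.T @ v / len(data)


# Synthetic stand-in for pixel data: 500 samples, 3 correlated channels.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
data = rng.normal(size=(500, 3)) @ mixing

cov = covariance(data)
print(cov)

# np.cov with bias=True uses the same 1/n normalization.
print(np.allclose(cov, np.cov(data, rowvar=False, bias=True)))
```

Note that `np.cov` divides by n - 1 by default; `bias=True` is needed to match the 1/n form used here.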

Finally, to reduce the number of parameters you can consider dimensionality reduction methods such as PCA and factor analysis, or a clustering method such as K-means.
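As one possible illustration of PCA, the sketch below projects the data onto the eigenvectors of the covariance matrix with the largest eigenvalues. The function name `pca`, the choice of k, and the random test data are assumptions made for this example; a library implementation would normally be preferable in practice.

```python
import numpy as np


def pca(data, k):
    """Project row-wise samples onto the k directions of largest variance."""
    mean = data.mean(axis=0)
    v = data - mean
    cov = v.T @ v / len(data)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    components = eigvecs[:, order]           # shape (d, k)
    return v @ components, components, eigvals[order]


rng = np.random.default_rng(1)
samples = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
projected, components, variances = pca(samples, k=2)
print(projected.shape)   # (200, 2)
print(variances)         # variance captured along each kept direction
```

After the projection, each sample is described by k numbers instead of d, so a Gaussian fitted in the reduced space needs only k means and k(k+1)/2 covariance values.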

License: CC-BY-SA with attribution
Not affiliated with StackOverflow