Why is the variance going down so much in this weight initialization problem (using PyTorch)?

https://datascience.stackexchange.com/questions/62195

02-11-2019

Question

First, look at this example:

>>> import torch as t
>>> x = t.randn(512)
>>> w = t.randn(512, 500000)
>>> (x @ w).var()
tensor(513.9548)

It makes sense that the variance is close to 512: each of the 500000 outputs is a dot product of two 512-element vectors, both sampled from a distribution with mean 0 and standard deviation 1.
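
That intuition can be written out as a short derivation. This is only a sketch, assuming the entries of w are independent with mean 0 and variance 1, and treating x as fixed once it has been sampled. The symbols y_j, n, and \lVert x \rVert are introduced here for illustration and do not appear in the snippets.

```latex
\begin{align}
y_j &= \sum_{i=1}^{n} x_i \, w_{ij}, \qquad n = 512 \\
\operatorname{Var}(y_j \mid x) &= \sum_{i=1}^{n} x_i^2 \, \operatorname{Var}(w_{ij})
  = \sum_{i=1}^{n} x_i^2 = \lVert x \rVert^2 \\
\mathbb{E}\,\lVert x \rVert^2 &= \sum_{i=1}^{n} \mathbb{E}\!\left[x_i^2\right] = n = 512
\end{align}
```

Under these assumptions, the sample variance over the 500000 outputs estimates \lVert x \rVert^2, which fluctuates around 512 from one draw of x to the next.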

However, I wanted the variance to be 1, and consequently the standard deviation to be 1 as well, since the standard deviation is the square root of the variance.

To do this, I tried the following:

>>> x = t.randn(512)
>>> w = t.randn(512, 500000) * (1/512)
>>> (x @ w).var()
tensor(0.0021)

However, the variance is now 512 / 512 / 512 (about 0.002) instead of 512 / 512 = 1.
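
The arithmetic behind that number follows from how variance responds to a constant factor. This is a sketch, assuming the unscaled outputs have a variance of about n = 512 as observed in the first example. The names c (the scale factor) and y (an unscaled output) are introduced here for illustration.

```latex
\begin{align}
\operatorname{Var}(c \, y) &= c^2 \, \operatorname{Var}(y) \\
c = \tfrac{1}{512}: \quad
\operatorname{Var}(c \, y) &\approx \left(\tfrac{1}{512}\right)^{2} \cdot 512
  = \tfrac{1}{512} \approx 0.00195
\end{align}
```

The observed value of 0.0021 is consistent with this estimate, given that the unscaled variance is only approximately 512 for any particular x.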

To get a variance of 1, I had to scale by 1 / sqrt(512) instead:

>>> x = t.randn(512)
>>> w = t.randn(512, 500000) * (1 / (512 ** .5))
>>> (x @ w).var()
tensor(1.0216)
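
The three scalings can also be compared side by side. The following is a minimal sketch in plain Python (standard library only, with much smaller sizes than the PyTorch snippets so that it runs quickly). `output_variance` is a hypothetical helper written for this illustration, not a PyTorch function, and the sizes `n = 64` and `m = 4000` are arbitrary choices.

```python
import random
import statistics


def output_variance(n, m, scale, seed=0):
    """Return (sample variance of the m entries of x @ w, squared norm of x).

    x has n entries drawn from N(0, 1); w is n-by-m with N(0, 1) entries
    multiplied by `scale`. Pure-Python stand-in for the PyTorch snippets.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sq_norm = sum(v * v for v in x)
    outputs = []
    for _ in range(m):
        # One column of w: n independent N(0, 1) draws, each times `scale`.
        col = [rng.gauss(0.0, 1.0) * scale for _ in range(n)]
        outputs.append(sum(xi * wi for xi, wi in zip(x, col)))
    return statistics.variance(outputs), sq_norm


n, m = 64, 4000
for label, scale in [("1", 1.0), ("1/n", 1.0 / n), ("1/sqrt(n)", n ** -0.5)]:
    var, sq_norm = output_variance(n, m, scale)
    # Expected under the assumptions above: var is close to scale**2 * ||x||^2,
    # and ||x||^2 is itself close to n.
    print(f"scale={label:10s} var={var:.5f} predicted={scale ** 2 * sq_norm:.5f}")
```

Because the same seed is reused, all three runs share the same x and the same underlying draws for w, so the only thing that changes between rows is the scale factor. The printed variance should track `scale ** 2 * ||x||^2` in each row.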

Why is that the case? Why does the scale factor need to be 1 / sqrt(512) rather than 1 / 512?

No accepted answer

License: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange