Problem

I implemented a fully connected MLP of shape [784 (input), 128 (hidden), 10 (output)]. The hidden layer uses a sigmoid activation function and the output layer a softmax.
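For context, here is a minimal NumPy sketch of that architecture (the function and variable names are illustrative, not my actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # shift by the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x, W1, b1, W2, b2):
    # x: (batch, 784), W1: (784, 128), b1: (128,), W2: (128, 10), b2: (10,)
    h = sigmoid(x @ W1 + b1)       # hidden layer, sigmoid activation
    return softmax(h @ W2 + b2)    # output layer, softmax over the 10 classes
```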

I tested it on the Keras "Classify images of clothing" tutorial dataset (Fashion-MNIST).

At first, the output was 0.1 at every output unit, no matter the input. I then read this, and because someone asked about weight initialization, I changed my initialization from a uniform distribution over [0, 1) to one over [-1, 1). After that my network started working.
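A minimal sketch of that initialization change, assuming uniform draws (which is how I read the interval notation) and NumPy-style sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

# What I had at first: uniform over [0, 1) -- every weight positive.
W1 = rng.uniform(0.0, 1.0, size=(784, 128))

# What made the network start working: uniform over [-1, 1) -- zero-centered.
W1 = rng.uniform(-1.0, 1.0, size=(784, 128))
```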

Why did this happen? I believe the prediction of 0.1 is some kind of local minimum, since the network assigns the same probability to every class, which is what you would expect if it knew nothing about the data. But why? I would love to be referred to a paper that discusses this issue and how to prevent it, because I am now trying another dataset and hit the same problem (but this time I could not make it work; I even tried Xavier initialization and still got no good result).
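For reference, the Xavier (Glorot) uniform initialization I tried looks roughly like this in NumPy (the helper name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    # Glorot & Bengio (2010): uniform over [-limit, limit]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W1 = xavier_uniform(784, 128)   # input -> hidden
W2 = xavier_uniform(128, 10)    # hidden -> output
```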

No correct solution
