Question

I'm currently training a deep neural network for a binary classification problem, with a feature set of winrates. Each winrate is greater than or equal to 0 and less than 100.

I was getting promising results without normalizing the input data, but when I normalized it, accuracy got staggeringly worse.
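By normalization I mean something along these lines, i.e. scaling the winrates into [0, 1) by dividing by their upper bound (a simplified sketch, not my exact preprocessing code):

```python
import numpy as np

# Winrates lie in [0, 100), so dividing by 100 maps them into [0, 1).
# (Sketch only; standardization to zero mean / unit variance would be another option.)
def normalize_winrates(X):
    return np.asarray(X, dtype=np.float32) / 100.0

# Example: a batch of two samples, each a flattened 20-value winrate vector.
X_raw = np.random.uniform(0, 100, size=(2, 20))
X_norm = normalize_winrates(X_raw)
print(X_norm.min(), X_norm.max())  # values now lie in [0, 1)
```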

The input is a 2D feature matrix of size 20, and the network has four layers with a different number of nodes in each. I'm using the SGD optimizer, ReLU activations for the hidden layers, and a softmax activation for the output layer.
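Roughly, the setup looks like the following Keras sketch. The layer widths, learning rate, and the assumption that the 20-value input is flattened into a vector are placeholders for illustration, not my exact values:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# Sketch of the architecture described above; layer sizes and the
# learning rate are placeholder values, not the exact ones I'm using.
model = models.Sequential([
    layers.Input(shape=(20,)),             # flattened 20-value winrate feature vector
    layers.Dense(64, activation="relu"),   # hidden layers use ReLU
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(2, activation="softmax"), # softmax output for the two classes
])

model.compile(
    optimizer=optimizers.SGD(learning_rate=0.01),  # plain SGD
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```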

What I'm wondering is: why am I getting better results without normalization? Is it because the optimal hyperparameters for the network differ between normalized and unnormalized inputs?
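If hyperparameters are the issue, I'd expect something like a simple learning-rate sweep on the normalized data to recover the original accuracy. A self-contained sketch of what I'd try (the `build_model` helper and the synthetic data are just stand-ins to make it runnable):

```python
import numpy as np
from tensorflow.keras import layers, models, optimizers

def build_model():
    # Same architecture sketch as above.
    return models.Sequential([
        layers.Input(shape=(20,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(2, activation="softmax"),
    ])

# Synthetic stand-in data, just so the sketch runs end to end.
X_norm = np.random.uniform(0, 1, size=(500, 20)).astype("float32")
y = np.random.randint(0, 2, size=(500,))

for lr in (1.0, 0.1, 0.01, 0.001):
    model = build_model()
    model.compile(
        optimizer=optimizers.SGD(learning_rate=lr),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    history = model.fit(X_norm, y, epochs=20, validation_split=0.2, verbose=0)
    print(f"lr={lr}: best val acc = {max(history.history['val_accuracy']):.3f}")
```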
