How much can bias decrease the performance of a network at the beginning of training?

datascience.stackexchange https://datascience.stackexchange.com/questions/69423

  •  09-12-2020

Question

I am writing a custom framework and in it I'm trying to train a simple network to predict the addition function.

The network:

  • 1 hidden layer of 3 neurons
  • 1 output layer
  • the cost function is squared error (not MSE, to avoid precision problems)
  • identity transfer function, to keep things simple at first
  • no special updaters, just a fixed step size
  • no learning rate decay
  • no regularization
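
The architecture described above is small enough to write out directly. Below is a minimal NumPy sketch of such a network; it is not the asker's framework code, and the initialization scale and all names are assumptions made for illustration:

```python
import numpy as np

# Minimal sketch of the described architecture (hypothetical, not the asker's code):
# 2 inputs -> 3 hidden neurons (identity activation) -> 1 output, squared-error cost.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3, 2))   # hidden-layer weights (assumed init scale)
b1 = np.zeros(3)                          # hidden-layer biases
W2 = rng.normal(scale=0.1, size=(1, 3))   # output-layer weights
b2 = np.zeros(1)                          # output-layer bias

def forward(x):
    """Forward pass; with identity transfer the whole network is a linear map."""
    h = W1 @ x + b1        # hidden activations (identity, so pre-activation == activation)
    y = W2 @ h + b2        # scalar prediction of n1 + n2
    return h, y

def cost(y_pred, y_true):
    """Squared error, summed rather than averaged, as in the question."""
    return float(np.sum((y_pred - y_true) ** 2))
```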

The training set:

  • ~500 samples
  • inputs: [n1][n2]; labels: [n1 + n2]
  • Every element is between 0 and 1. e.g.: [0.5][0.3] => [0.8]
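
Generating such a training set is straightforward; the snippet below is one hypothetical way to do it (it assumes the inputs are drawn from [0, 0.5] so that the label n1 + n2 also stays within [0, 1]):

```python
import numpy as np

# Hypothetical training-set generation for the described task.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 0.5, size=(500, 2))   # inputs [n1][n2], kept small so labels stay <= 1
Y = X.sum(axis=1, keepdims=True)           # labels [n1 + n2]
```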

The algorithm I'm using to optimize:

  • samples 64 elements for an epoch
  • for each sample: it evaluates the error
    • then propagates the error back
    • and then based on the error values calculates the gradients
    • the gradients for each element are added up into one vector, then normalized by dividing by the number of samples evaluated
  • After the gradients are calculated, a step size of 1e-2 is used to modify the weights.
  • The training stops when the sum of the errors over the 500 data elements is below 1e-2
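
Put together, the described procedure amounts to mini-batch gradient descent with averaged gradients. The loop below is only a sketch of that description; it reuses the hypothetical forward/cost helpers, parameters and data from the earlier snippets, and the backpropagation is written out explicitly for identity activations:

```python
# Sketch of the described optimization loop (assumes W1, b1, W2, b2, forward,
# cost, X and Y from the sketches above; step size and thresholds as stated).
step_size = 1e-2
sample_rng = np.random.default_rng(2)

def gradients(x, y_true):
    """Backpropagation for the identity network: gradients for one sample."""
    h, y = forward(x)
    d_y = 2.0 * (y - y_true)          # d(squared error) / d(output)
    gW2 = np.outer(d_y, h)
    gb2 = d_y
    d_h = W2.T @ d_y                  # error propagated back to the hidden layer
    gW1 = np.outer(d_h, x)
    gb1 = d_h
    return gW1, gb1, gW2, gb2

while True:
    # sample 64 elements for this epoch and accumulate their gradients
    idx = sample_rng.choice(len(X), size=64, replace=False)
    sums = [np.zeros_like(p) for p in (W1, b1, W2, b2)]
    for i in idx:
        for acc, g in zip(sums, gradients(X[i], Y[i])):
            acc += g
    # normalize by the number of samples evaluated, then take one step
    for p, acc in zip((W1, b1, W2, b2), sums):
        p -= step_size * acc / len(idx)
    # stop when the summed error over the whole training set drops below 1e-2
    total = sum(cost(forward(x)[1], y) for x, y in zip(X, Y))
    if total < 1e-2:
        break
```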

I don't have a test dataset yet, as I first want to overfit the training set to see whether the network can do that at all. Without bias, training converges to an optimum in about 4k epochs.

When I include the tuning of bias in the training, performance is much worse: the network does not converge to the optimum; instead the biases and the weights oscillate around one another.

Is this a normal effect of introducing a bias?

Here is a chart of the weight values throughout the training: [image: chart of weight values during training]


Solution 2

It should not decrease performance that badly (as shown in the question). Biases help with generalization and learning; they add some complexity to the training, but not much in this example.

Here the problem was with the implementation. After weeks of poring over it, I finally resorted to ancient methods (pen + paper) to verify the gradients, and that is where I found a bug in the cost function:

some network outputs were compared to the wrong label values, hence the gradient calculation was faulty.
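
The pen-and-paper verification can also be automated: comparing analytic gradients against central finite differences of the cost catches exactly this class of bug. The sketch below is generic, not the answerer's code, and it assumes the hypothetical forward/cost/gradients helpers from the snippets in the question:

```python
def grad_check(param, analytic_grad, x, y_true, eps=1e-6):
    """Compare an analytic gradient to central finite differences of the cost.

    `param` must be one of the live parameter arrays (e.g. W1) so that the
    in-place perturbation is seen by forward(); a large discrepancy points
    at bugs such as comparing outputs to the wrong labels.
    """
    num_grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        c_plus = cost(forward(x)[1], y_true)
        param[idx] = old - eps
        c_minus = cost(forward(x)[1], y_true)
        param[idx] = old                  # restore the original value
        num_grad[idx] = (c_plus - c_minus) / (2 * eps)
    return np.max(np.abs(num_grad - analytic_grad))

# e.g. grad_check(W1, gradients(X[0], Y[0])[0], X[0], Y[0]) should be ~1e-7 or smaller
```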

After the bug was fixed the network now converges as it should:

[image: weight values converging after the fix]

Other tips

Bias takes care of latent variables, i.e. variables you did not include in the training set.

So you are trying to overfit the training set, but now you introduce a perturbation. Of course it will have worse performance.
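
To make that concrete (my own illustration, not part of the original tip): for the addition target the exact solution needs no bias at all, so the extra bias parameters only add components that training has to drive to zero, while a bias only becomes essential when the target contains a constant offset.

```python
import numpy as np

x = np.array([0.5, 0.3])
w = np.array([1.0, 1.0])     # exact weights for the target y = n1 + n2
print(w @ x + 0.0)           # 0.8 -> zero bias already reproduces the label
print(w @ x + 0.5)           # 1.3 -> a bias would only be needed for a shifted
                             #        target such as y = n1 + n2 + 0.5
```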

Licensed under: CC-BY-SA with attribution