How much can bias decrease the performance of a network at the beginning of training?

datascience.stackexchange https://datascience.stackexchange.com/questions/69423

  •  09-12-2020

Question

I am writing a custom framework and in it I'm trying to train a simple network to predict the addition function.

The network:

  • 1 hidden layer of 3 neurons
  • 1 output layer
  • the cost function is squared error (not MSE, to avoid precision problems)
  • identity transfer function, to keep things simple at first
  • no special updaters, just a fixed step size
  • no learning rate decay
  • no regularization
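
The architecture described above is small enough to write out directly. Below is a minimal NumPy sketch of such a network; it is not the asker's framework code, and the initialization scale and all names are assumptions made for illustration:

```python
import numpy as np

# Minimal sketch of the described architecture (hypothetical, not the asker's code):
# 2 inputs -> 3 hidden neurons (identity activation) -> 1 output, squared-error cost.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3, 2))   # hidden-layer weights (assumed init scale)
b1 = np.zeros(3)                          # hidden-layer biases
W2 = rng.normal(scale=0.1, size=(1, 3))   # output-layer weights
b2 = np.zeros(1)                          # output-layer bias

def forward(x):
    """Forward pass; with identity transfer the whole network is a linear map."""
    h = W1 @ x + b1        # hidden activations (identity, so pre-activation == activation)
    y = W2 @ h + b2        # scalar prediction of n1 + n2
    return h, y

def cost(y_pred, y_true):
    """Squared error, summed rather than averaged, as in the question."""
    return float(np.sum((y_pred - y_true) ** 2))
```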

The training set:

  • ~500 samples
  • inputs: [n1][n2]; labels: [n1 + n2]
  • Every element is between 0 and 1. e.g.: [0.5][0.3] => [0.8]
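
Generating such a training set is straightforward; the snippet below is one hypothetical way to do it (it assumes the inputs are drawn from [0, 0.5] so that the label n1 + n2 also stays within [0, 1]):

```python
import numpy as np

# Hypothetical training-set generation for the described task.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 0.5, size=(500, 2))   # inputs [n1][n2], kept small so labels stay <= 1
Y = X.sum(axis=1, keepdims=True)           # labels [n1 + n2]
```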

The algorithm I'm using to optimize:

  • samples 64 elements for an epoch
  • for each sample: it evaluates the error
    • then propagates the error back
    • and then based on the error values calculates the gradients
    • the gradients for each element are added up into one vector, then normalized by dividing by the number of samples evaluated
  • After the gradients are calculated, a step size of 1e-2 is used to modify the weights.
  • The training stops when the sum of the errors over the 500 data elements is below 1e-2
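
Put together, the described procedure amounts to mini-batch gradient descent with averaged gradients. The loop below is only a sketch of that description; it reuses the hypothetical forward/cost helpers, parameters and data from the earlier snippets, and the backpropagation is written out explicitly for identity activations:

```python
# Sketch of the described optimization loop (assumes W1, b1, W2, b2, forward,
# cost, X and Y from the sketches above; step size and thresholds as stated).
step_size = 1e-2
sample_rng = np.random.default_rng(2)

def gradients(x, y_true):
    """Backpropagation for the identity network: gradients for one sample."""
    h, y = forward(x)
    d_y = 2.0 * (y - y_true)          # d(squared error) / d(output)
    gW2 = np.outer(d_y, h)
    gb2 = d_y
    d_h = W2.T @ d_y                  # error propagated back to the hidden layer
    gW1 = np.outer(d_h, x)
    gb1 = d_h
    return gW1, gb1, gW2, gb2

while True:
    # sample 64 elements for this epoch and accumulate their gradients
    idx = sample_rng.choice(len(X), size=64, replace=False)
    sums = [np.zeros_like(p) for p in (W1, b1, W2, b2)]
    for i in idx:
        for acc, g in zip(sums, gradients(X[i], Y[i])):
            acc += g
    # normalize by the number of samples evaluated, then take one step
    for p, acc in zip((W1, b1, W2, b2), sums):
        p -= step_size * acc / len(idx)
    # stop when the summed error over the whole training set drops below 1e-2
    total = sum(cost(forward(x)[1], y) for x, y in zip(X, Y))
    if total < 1e-2:
        break
```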

I don't have a test dataset yet, as I first want to overfit the training set to see whether the network can do that at all. Without bias, training converges to an optimum in about 4k epochs.

When I include the tuning of bias in the training, performance is much worse: the network does not converge to the optimum; instead the biases and the weights oscillate around one another.

Is this a normal effect of introducing a bias?

Here is a chart of the weight values throughout the training: [image: chart of weight values during training]


Solution 2

It should not decrease performance that badly (as shown in the question). Biases help with generalization and learning; they add some complexity to the training, but not much in this example.

Here the problem was with the implementation. After weeks of poring over it, I finally resorted to ancient methods (pen + paper) to verify the gradients, and that is where I found a bug in the cost function:

some network outputs were compared to the wrong label values, hence the gradient calculation was faulty.
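
The pen-and-paper verification can also be automated: comparing analytic gradients against central finite differences of the cost catches exactly this class of bug. The sketch below is generic, not the answerer's code, and it assumes the hypothetical forward/cost/gradients helpers from the snippets in the question:

```python
def grad_check(param, analytic_grad, x, y_true, eps=1e-6):
    """Compare an analytic gradient to central finite differences of the cost.

    `param` must be one of the live parameter arrays (e.g. W1) so that the
    in-place perturbation is seen by forward(); a large discrepancy points
    at bugs such as comparing outputs to the wrong labels.
    """
    num_grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        c_plus = cost(forward(x)[1], y_true)
        param[idx] = old - eps
        c_minus = cost(forward(x)[1], y_true)
        param[idx] = old                  # restore the original value
        num_grad[idx] = (c_plus - c_minus) / (2 * eps)
    return np.max(np.abs(num_grad - analytic_grad))

# e.g. grad_check(W1, gradients(X[0], Y[0])[0], X[0], Y[0]) should be ~1e-7 or smaller
```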

After the bug was fixed the network now converges as it should:

[image: weight values converging after the fix]

Other tips

Bias takes care of latent variables, i.e. variables you did not include in the training set.

So you are trying to overfit the training set, but now you introduce a perturbation. Of course it will have worse performance.
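
To make that concrete (my own illustration, not part of the original tip): for the addition target the exact solution needs no bias at all, so the extra bias parameters only add components that training has to drive to zero, while a bias only becomes essential when the target contains a constant offset.

```python
import numpy as np

x = np.array([0.5, 0.3])
w = np.array([1.0, 1.0])     # exact weights for the target y = n1 + n2
print(w @ x + 0.0)           # 0.8 -> zero bias already reproduces the label
print(w @ x + 0.5)           # 1.3 -> a bias would only be needed for a shifted
                             #        target such as y = n1 + n2 + 0.5
```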

Licensed under: CC-BY-SA with attribution