Question

Why can't we just use a step function, and then when calculating the weights use:

weightChange = n * (t-o) * i

Where:
n: learning rate
t: target output
o: actual output
i: input

This works for single-layer networks. I've heard a sigmoid is needed to deal with non-linear problems, but why?
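For context, here is a rough sketch of the single-layer rule I mean (the names `step` and `train_perceptron` are just illustrative, not from any library):

```python
# Sketch of the single-layer perceptron rule described above:
# weightChange = n * (t - o) * i, with a hard step activation.
import numpy as np

def step(x):
    """Step activation: 1 if the net input is >= 0, else 0."""
    return np.where(x >= 0, 1, 0)

def train_perceptron(X, targets, n=0.1, epochs=20):
    """Train a single-layer perceptron with the update from the question."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i_vec, t in zip(X, targets):
            o = step(np.dot(w, i_vec) + b)   # actual output
            w += n * (t - o) * i_vec         # weightChange = n * (t - o) * i
            b += n * (t - o)
    return w, b

# Works for a linearly separable problem such as AND:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_perceptron(X, np.array([0, 0, 0, 1])))
```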


Solution 2

Strictly speaking, you don't need a sigmoid activation function. What you need is a differentiable function that serves as an approximation to the step function. As an alternative to the sigmoid, you could instead use a hyperbolic tangent function.

For multi-layer perceptron networks, the simple perceptron learning rule does not provide a means for determining how a weight several layers from the output should be adjusted, based on a given output error. The backpropagation learning rule relies on the fact that the sigmoid function is differentiable, which makes it possible to characterize the rate of change in the output layer error with respect to a change in a particular weight (even if the weight is multiple layers away from the output). Note that as the k parameter of the sigmoid tends toward infinity, the sigmoid approaches the step function, which is the activation function used in the basic perceptron.
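To make that concrete, here is a minimal sketch (illustrative names, assuming a squared-error loss) of a sigmoid with steepness parameter k, its derivative, and the gradient-based update that backpropagation builds on for an output-layer weight:

```python
# Sketch: sigmoid with steepness parameter k, its derivative, and a
# gradient-descent update for one output-layer weight. Names are illustrative.
import numpy as np

def sigmoid(x, k=1.0):
    """As k tends toward infinity, this approaches the step function."""
    return 1.0 / (1.0 + np.exp(-k * x))

def sigmoid_deriv(x, k=1.0):
    """Derivative of the sigmoid; this is exactly what the step function lacks."""
    s = sigmoid(x, k)
    return k * s * (1.0 - s)

# Update for a single weight w feeding an output neuron, minimizing the
# squared error E = 0.5 * (t - o)^2 with o = sigmoid(w * i):
n, w, i, t = 0.1, 0.5, 1.0, 1.0
net = w * i
o = sigmoid(net)
# dE/dw = -(t - o) * sigmoid'(net) * i, so gradient descent gives:
w += n * (t - o) * sigmoid_deriv(net) * i
print(w)
```

The `sigmoid_deriv(net)` factor is the piece the chain rule passes backwards through the hidden layers; with a step function that derivative is zero everywhere (and undefined at zero), so no useful gradient ever reaches the weights behind the output layer.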

OTHER TIPS

Sigmoid activation produces a smooth curve of real-valued outputs in [0, 1]. This way, the errors can be calculated and the weights tuned so that the next time you perform a feed-forward pass, the network outputs not just 0s and 1s, but graded predictions in [0, 1], letting you choose which to accept and which to ignore.

What you described would be a binary neuron, which is completely acceptable as well. But sigmoid-activated neurons give you that spectrum of values in [0, 1].
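For instance, a quick sketch (illustrative only) of how the same net inputs look under each activation:

```python
# A step neuron commits to 0 or 1, while a sigmoid neuron reports a graded
# value in (0, 1) that can be read as a degree of confidence.
import numpy as np

def step(x):
    return np.where(x >= 0, 1, 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

nets = np.array([-2.0, -0.1, 0.1, 2.0])
print(step(nets))                   # [0 0 1 1]               -- hard decisions only
print(np.round(sigmoid(nets), 3))   # [0.119 0.475 0.525 0.881] -- graded outputs
```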

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow