Question

Learning a perceptron can be easily accomplished using the update rule w_i = w_i + \eta (y - \hat{y}) x_i.

All the resources I have read so far say that the learning rate \eta can be set to 1 without loss of generality.

My question is the following: is there any proof that the speed of convergence will always be the same, given that the data is linearly separable? Shouldn't this also depend on the initial weight vector \mathbf{w}?


Solution

Citing Wikipedia:

The decision boundary of a perceptron is invariant with respect to scaling of the weight vector; that is, a perceptron trained with initial weight vector \mathbf{w} and learning rate \alpha behaves identically to a perceptron trained with initial weight vector \mathbf{w}/\alpha and learning rate 1. Thus, since the initial weights become irrelevant with increasing number of iterations, the learning rate does not matter in the case of the perceptron and is usually just set to 1.
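A minimal sketch of this invariance, assuming 0/1 labels, a bias term folded into the features, and a hypothetical toy dataset: training with learning rate alpha from initial weights w0 visits exactly the same sequence of predictions as training with rate 1 from w0/alpha, so the final weight vectors differ only by the factor alpha.

import numpy as np

def train_perceptron(X, y, w0, lr, epochs=20):
    """Threshold perceptron with update w <- w + lr * (y - y_hat) * x.
    Labels y are 0/1; prediction is 1 if w.x >= 0, else 0."""
    w = w0.astype(float).copy()
    for _ in range(epochs):
        for x, t in zip(X, y):
            y_hat = 1 if np.dot(w, x) >= 0 else 0
            w += lr * (t - y_hat) * x
    return w

# Hypothetical linearly separable data, with a constant bias feature appended.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(50, 2)), np.ones((50, 1))])
y = (X[:, 0] + X[:, 1] > 0).astype(int)

w0 = rng.normal(size=3)
alpha = 0.1

w_a = train_perceptron(X, y, w0, lr=alpha)        # rate alpha, weights w0
w_b = train_perceptron(X, y, w0 / alpha, lr=1.0)  # rate 1, weights w0/alpha

# Both runs make identical predictions at every step, so the learned
# weight vectors agree up to the scale factor alpha.
print(np.allclose(w_a, alpha * w_b))  # True

Because the prediction depends only on the sign of w.x, scaling w by a positive constant never changes which examples are misclassified, which is exactly the argument in the quoted passage.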

Licensed under: CC-BY-SA with attribution