Question

Here is my understanding of those 2 terms:

Hyper-parameter: A variable that is set by a human before the training process starts. Examples are the number of hidden layers in a Neural Network, the number of neurons in each layer, etc. Some models, such as a plain linear model, don't have any hyper-parameters.

Parameter: A variable that the training process updates. For instance, the weights of a Neural Network are parameters: they are updated as we train the network, with no human intervention in the process. Another example would be the slope and the y-intercept of a simple linear model.

Having said that, what would the learning rate ($\eta$) be?

$$ \Theta_{i+1} = \Theta_{i} - \eta \nabla J(\Theta_{i}) $$

My understanding is that $\eta$ is set to a fairly large value before training starts and is then decreased as training progresses and the function gets closer and closer to a local minimum (see the sketch below). In that case, doesn't the learning rate satisfy both the definition of a parameter and that of a hyper-parameter?
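
For concreteness, here is a minimal sketch of what I mean, assuming plain gradient descent on a 1-D quadratic with a hypothetical inverse-time decay schedule (the objective, `eta0`, and `decay` are just illustrative choices, not from any particular library):

```python
import numpy as np

def grad_J(theta):
    # Gradient of the toy objective J(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = 0.0   # parameter: updated by the training loop itself
eta0 = 0.5    # hyper-parameter: initial learning rate, chosen by a human
decay = 0.05  # hyper-parameter: decay rate of the schedule

for i in range(100):
    # The learning rate shrinks as training progresses...
    eta = eta0 / (1.0 + decay * i)
    # ...yet it is the schedule, not the data, that changes it.
    theta = theta - eta * grad_J(theta)  # gradient *descent*: note the minus sign

print(theta)  # approaches the minimizer theta = 3
```

So $\eta$ changes during training like a parameter, but its initial value and its schedule are fixed by a human beforehand like a hyper-parameter.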

