Question

When papers talk about the "number of parameters" of a neural network, do they usually mean the weight matrices for each layer plus a bias for each unit with an activation function? Are there no other parameters needed for a NN to work?

Well, there are so-called hyperparameters that define the number of layers, the number of units per layer, and the activation functions for those units, but let's put them aside.

I mean, if we have a non-convolutional neural net with one million parameters, does that mean the author counted the number of weights plus the number of units (I assume each unit usually has a bias, correct?) used in the net?

Solution

You need to separate network-inherent parameters from training parameters. For a given topology, activation function, and accumulation function (these are what you call hyperparameters), the only remaining network-inherent parameters are indeed the weights, with the biases usually counted among them (a bias can be viewed as the weight of a connection to a constant input of 1). However, note that there is no definite reason to strictly divide these two kinds of parameters ("weights" and "hyperparameters").
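To illustrate the usual counting convention, here is a minimal sketch in plain Python (the layer sizes are hypothetical, not from any particular paper): each fully connected layer contributes one weight matrix plus one bias vector, and the reported "number of parameters" is the sum of both.

```python
# Minimal sketch: counting parameters of a fully connected (dense) network.
# Layer sizes below are hypothetical; substitute your own architecture.

def count_parameters(layer_sizes):
    """Count weights and biases for a dense net with the given layer sizes."""
    weights = 0
    biases = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        weights += fan_in * fan_out  # one weight per connection
        biases += fan_out            # one bias per unit in the next layer
    return weights, biases

# Example: 784 inputs, two hidden layers of 1000 units, 10 outputs.
w, b = count_parameters([784, 1000, 1000, 10])
print(w + b)  # 1796010 total (1794000 weights + 2010 biases)
```

Note how the biases are a small fraction of the total here, which is one reason some authors are casual about whether they are included in the count.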

Further, there are also training parameters, like the step size (learning rate) of gradient descent, the momentum, the regularization (or weight decay) coefficient, the objective function (usually least squares), and possibly more. Some people might also include these in the parameter set.
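To make the distinction concrete, here is a hypothetical way this split often appears in practice: the training parameters live in a configuration object, separate from the model, and none of them contribute to the reported parameter count.

```python
# Hypothetical training configuration; the names and values are examples only.
training_config = {
    "learning_rate": 0.01,        # step size of gradient descent
    "momentum": 0.9,
    "weight_decay": 1e-4,         # regularization coefficient
    "loss": "mean_squared_error", # objective function (least squares)
}
# None of these values add to the network's "number of parameters".
```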

Thus, I suggest you look in detail at what each paper means by "number of parameters", because it is hard to believe that it is used consistently throughout the literature.

Licensed under: CC-BY-SA with attribution