You need to separate network-inherent parameters from training parameters. For a given topology, activation function, and accumulation function (these are what you call hyperparameters), the only remaining network-inherent parameters are indeed the weights. However, note that there is no definitive reason to strictly divide these two kinds of parameters ("weights" and "hyperparameters").
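To make the first point concrete, here is a minimal sketch (the function name and topology representation are my own, not from any library) showing that once the topology is fixed, the number of network-inherent parameters of a fully connected feed-forward network follows directly from it:

```python
def count_parameters(topology):
    """Count weights and biases of a dense feed-forward net.

    topology -- list of layer sizes, e.g. [4, 8, 3] for 4 inputs,
    one hidden layer of 8 units, and 3 outputs.
    """
    total = 0
    for n_in, n_out in zip(topology, topology[1:]):
        total += n_in * n_out  # weight matrix between consecutive layers
        total += n_out         # one bias per unit in the following layer
    return total

print(count_parameters([4, 8, 3]))  # 4*8 + 8 + 8*3 + 3 = 67
```

So for a given topology the "number of parameters" is determined entirely by the weights (and biases, if the convention counts them), which is one common reading of the phrase.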
Further, there are also training parameters, such as the step size of the gradient descent update, the momentum, the regularization (or weight decay) parameter, the objective function (usually least squares), and possibly more. Some people might also include these in the parameter set.
Thus, I suggest you look in detail at what each paper means by "number of parameters", because it is hard to believe that it is used consistently throughout the literature.