Question

I'm trying to teach myself data science, with a particular interest in decision trees. A few steps in, I've come across a term, 'parameter convergence', that I can't find a definition for (because, after all, I'm learning on my own and have no access to teachers or peers):

However, even in studies with much lower numbers of predictor variables, the combination of all main and interaction effects of interest – especially in the case of categorical predictor variables – may well lead to cell counts too sparse for parameter convergence. (from Strobl et al., 2009)

A web search isn't overly helpful, because convergence is such a common term and I'm not sure which results specifically apply in the context of decision trees. Also, the results don't provide an entry-level definition.

So, while a definition or explanation of parameter convergence (in the context of recursive partitioning) would be great, it would also be handy to be directed to a resource (academic or otherwise) that might have a 'glossary' of this and similar terms...


Solution

A naive definition of parameter convergence: the weights, or parameter values, asymptotically approach a fixed point. In other words, when further training no longer alters the parameter values (or alters them by less than some epsilon-small amount), the model might be a good fit. For decision trees, I found this paper, which explains the rate of convergence and more. It might be a good read if you want more details.
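To make the "epsilon-small" idea concrete, here is a minimal sketch (my own illustration, not from the paper) of a gradient-descent loop on a toy quadratic objective that declares parameter convergence once the update step falls below a threshold:

    import numpy as np

    def gradient(w):
        # Gradient of a toy objective f(w) = ||w - 3||^2
        return 2.0 * (w - 3.0)

    w = np.zeros(2)     # initial parameter values
    lr = 0.1            # learning rate
    epsilon = 1e-6      # change small enough to count as "converged"

    for step in range(10_000):
        w_new = w - lr * gradient(w)
        # Parameter convergence: training is no longer altering the values
        if np.linalg.norm(w_new - w) < epsilon:
            print(f"converged after {step + 1} steps: w = {w_new}")
            break
        w = w_new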

OTHER TIPS

Many ML and minimization tasks make use of an objective function. At each iteration, a candidate parameter set is defined, and the objective function returns a score that reflects how good, or bad, that parameter set is. Then the parameter set is altered, and the process repeats.

So, when do you stop this process? You want to stop when the changes in fit (getting ever closer to a local or global minimum) become acceptably small. What counts as acceptable? It's up to you, but it is partly influenced by the form of the gradient, or error space. There are guidelines, and many algorithms build them in by default.
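As a rough sketch of that loop (again my own illustration, with a placeholder objective and update rule), the stopping test here watches the change in the objective value itself rather than the parameters:

    import numpy as np

    def objective(w):
        # Toy error surface; in practice this is your model's loss
        return np.sum((w - 3.0) ** 2)

    w = np.zeros(2)
    lr = 0.1
    ftol = 1e-8          # "acceptably small" change in fit -- your call

    prev = objective(w)
    for step in range(10_000):
        w = w - lr * 2.0 * (w - 3.0)   # alter the parameter set
        curr = objective(w)            # re-score it
        if abs(prev - curr) < ftol:    # improvement has levelled off
            break
        prev = curr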

As a set, these stopping rules comprise the convergence criteria. When an algorithm converges, it has found a parameter set meeting your requirements. There are many ways for convergence to fail, particularly if a suitable parameter set can't be found within some maximum number of iterations. Conversely, you can set up unreasonably loose stopping criteria that will lead to convergence with very poor parameters.
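Many libraries expose these criteria directly. For example, scipy.optimize.minimize (used here purely as an illustration on a toy objective) takes tolerances on the parameters and on the objective value, plus an iteration cap, and reports whether it converged:

    import numpy as np
    from scipy.optimize import minimize

    def objective(w):
        return np.sum((w - 3.0) ** 2)

    # xatol/fatol are tolerances on the parameters and objective value;
    # maxiter caps the number of iterations before giving up.
    res = minimize(objective, x0=np.zeros(2), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 500})

    print(res.success)   # False if, e.g., maxiter was hit first
    print(res.message)   # human-readable reason the algorithm stopped
    print(res.x)         # the parameter set it settled on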

Licensed under: CC-BY-SA with attribution