Is there a quantitative way to determine if a class of algorithms tends to produce low bias or low variance models?

datascience.stackexchange https://datascience.stackexchange.com/questions/86196

Problem

I understand that some machine learning models tend to be low bias, whereas others tend to be low variance (source). As an example, a linear regression will tend to have low variance error and high bias error. In contrast, a decision tree will tend to have high variance error and low bias error. Intuitively this makes sense because a decision tree is prone to overfitting the data, whereas a linear regression is not. However, is there a more quantitative way to determine if a class of algorithms tends to produce low bias or low variance models?


Solution

It's more a matter of the model's complexity than of the class of algorithms. Of course, some classes of algorithms produce more complex models than others by construction, but this is not always the case. For example, the complexity of a decision tree usually depends on its options/hyperparameters: maximum depth, pruning, minimum number of instances per branch. If these parameters are set to produce a small tree, the risk is bias (underfitting); if they are set to produce a large tree, the risk is variance (overfitting). A rough empirical check is sketched below.
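A minimal sketch of that empirical check, assuming scikit-learn and a synthetic 1-D regression task (the function `true_f` and all constants are illustrative, not from the original answer): refit the same tree on many bootstrapped training sets and decompose the test error into squared bias and variance, once for a shallow tree and once for a deep one.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical ground-truth function for the synthetic task
    return np.sin(3 * x)

def bias_variance(max_depth, n_rounds=200, n_train=100, noise=0.3):
    """Estimate squared bias and variance of a tree of given max_depth
    by refitting it on n_rounds freshly sampled training sets."""
    x_test = np.linspace(0, 2, 50).reshape(-1, 1)
    preds = np.empty((n_rounds, len(x_test)))
    for i in range(n_rounds):
        x_train = rng.uniform(0, 2, size=(n_train, 1))
        y_train = true_f(x_train).ravel() + rng.normal(0, noise, size=n_train)
        model = DecisionTreeRegressor(max_depth=max_depth).fit(x_train, y_train)
        preds[i] = model.predict(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_f(x_test).ravel()) ** 2)
    variance = preds.var(axis=0).mean()
    return bias_sq, variance

# Shallow trees should show high bias / low variance, deep trees the opposite.
for depth in (1, 3, None):
    b, v = bias_variance(depth)
    print(f"max_depth={depth}: bias^2={b:.3f}, variance={v:.3f}")
```

With settings like these, the depth-1 stump typically shows the largest squared bias, while the unrestricted tree (`max_depth=None`) shows the largest variance, which is exactly the trade-off controlled by the hyperparameters mentioned above.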

The complexity of a model depends mostly on the number and nature of its parameters, so as a first approximation the number of parameters is a reasonable quantitative measure of complexity (see this closely related question). Also keep in mind that most models try to use all the features they are provided with, so the number of features also has a high impact on the model complexity.
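A quick sketch of the "count the parameters" heuristic, assuming scikit-learn and purely illustrative synthetic data: a linear regression has one coefficient per feature plus an intercept, whereas an unrestricted tree's effective parameter count grows with the number of nodes it ends up with.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(0, 0.5, size=500)

lin = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor().fit(X, y)  # unrestricted: grows until leaves are pure

# Linear model: 10 coefficients + 1 intercept, fixed by the number of features.
print("linear regression parameters:", lin.coef_.size + 1)
# Tree: node count depends on the data and hyperparameters, typically far larger here.
print("decision tree nodes:", tree.tree_.node_count)
```

The exact numbers depend on the data, but the comparison illustrates the point: the linear model's parameter count is fixed by the feature count, while the tree's grows with how far it is allowed to fit the training data.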

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange