Question

I am a university student studying machine learning, and I have just come across the concepts of bias and variance. In my professor's slides, bias is defined as:

$\text{bias} = E[\text{error}_s(h)] - \text{error}_d(h)$

where $h$ is the hypothesis, $\text{error}_s(h)$ is the sample error, and $\text{error}_d(h)$ is the true error. In particular, the slides say that we have bias when the training set and the test set are not independent.

After reading this, I tried to dig a little deeper into the concept, so I searched the internet and found this video: https://www.youtube.com/watch?v=EuBBz3bI-aA , which defines bias as the inability of a machine learning model to capture the true relationship.

I don't understand: are the two definitions equivalent, or are these two types of bias different?

Together with this, I am also studying the concept of variance. My professor's slides say that if I consider two different samples, the sample error may vary even if the model is unbiased, but the video I posted says that variance is the difference in fits between the training set and the test set.

In this case, too, the definitions are different. Why?


Solution

What are Bias and Variance?

Let's start with some basic definitions:

  • Bias: the difference between your model's average prediction and the true value.
  • Variance: the variability of your model's predictions, i.e. how spread out they are.

They can be understood from this image:

[image: illustration of bias and variance] (source)

What to do about bias and variance?

If your model suffers from a bias problem, you should increase its power. For example, if your neural network's predictions are not good enough, add more parameters, add a new layer to make it deeper, and so on.

If your model suffers from a variance problem instead, the best solution usually comes from ensembling. Ensembles of Machine Learning models can significantly reduce the variance of your predictions.

The Bias-Variance tradeoff

If your model is underfitting, you have a bias problem, and you should make it more powerful. Once you have made it more powerful, though, it will likely start overfitting, a phenomenon associated with high variance. For that reason, you must always find the right tradeoff between fighting the bias and the variance of your Machine Learning models.

[image: the bias-variance tradeoff] (source)

Learning how to do that is more an art than a science!

Other tips

Well, this image explains it all: in ML you face a bias/variance dilemma. You want a model precise enough to learn something from your data, but not so precise that it memorizes the exact values of your training set instead of learning the underlying tendency.

Variance and bias must be considered together: for a given model, when you tune it to lower variance, you automatically increase bias.

Your job is then to find a good compromise, as shown in the image: variance high enough (i.e. bias low enough) to make good predictions and learn something from your training set, but variance not so high (i.e. bias not so low) that you overfit.

[image: the bias/variance compromise]

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange