Overfitting results with Random Forest Regression

https://datascience.stackexchange.com/questions/76399

12-12-2020

Question

I have one image in which each pixel holds 4 different values. I used a random forest (RF) to see whether I can predict the 4th value from the other 3 values of each pixel, using Python and scikit-learn. I first fit the model and, after validating it, used it to predict this image. I was both very happy and worried to see that my model had very high accuracy: 99.95%! But when I looked at the resulting image, it was clearly not 99.95% accurate:

original image:

result image:

(I have marked the biggest and most visible difference.)

My question is: why would I get such high accuracy when the visualization clearly shows that the actual accuracy is much lower? I understand it might come from overfitting, but then why is this difference not detected?

Edit:

Mean Absolute Error: 0.048246606512422616
Mean Squared Error: 0.00670919112477127
Root Mean Squared Error: 0.0819096522076078
Accuracy: 99.95175339348758
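A quick arithmetic check on the figures in the edit may help here. The reported accuracy is numerically consistent with having been computed as 100 minus the mean absolute error. This is an inference from the numbers alone; the question does not say how the accuracy was calculated. If that is what happened, the value is not a percentage of correctly predicted pixels, and it would stay close to 100 whenever the target is measured on a small scale. The sketch below only uses the four numbers quoted above.

```python
# Figures quoted in the question's edit.
mae = 0.048246606512422616
mse = 0.00670919112477127
rmse = 0.0819096522076078
reported_accuracy = 99.95175339348758

# Sanity check: RMSE should be the square root of MSE.
rmse_matches = abs(rmse - mse ** 0.5) < 1e-6
print("RMSE equals sqrt(MSE):", rmse_matches)

# Assumption (not stated in the question): accuracy = 100 - MAE.
candidate_accuracy = 100 - mae
accuracy_matches = abs(candidate_accuracy - reported_accuracy) < 1e-9
print("100 - MAE reproduces the reported accuracy:", accuracy_matches)

# RMSE is noticeably larger than MAE, which suggests that some errors
# are much larger than the typical one.
print("RMSE / MAE ratio:", rmse / mae)
```

Under that assumption, the error should be judged against the range of the 4th value in the image. An MAE of about 0.048 can be large if the target only spans a narrow interval, and a few strongly mispredicted regions would be visible in the image while barely moving the average.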

Solution

Where are you evaluating the performance of your algorithm?

Are you making a train/test split and evaluating on the test split? It might be that you overfitted the training data and are only measuring accuracy there.
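A minimal sketch of that check with scikit-learn follows. The data here is synthetic and only stands in for the pixel table (3 input values and 1 target per pixel); the array shapes, the target formula, and the hyperparameters are all assumptions made for illustration. The point is to report the same metrics on both splits, so that a large gap between them becomes visible.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the pixel table: 3 input values, 1 noisy target.
X = rng.uniform(0.0, 1.0, size=(2000, 3))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=2000)

# Hold out 25% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Compute the same metrics on both splits and compare them.
scores = {}
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(X_part)
    scores[name] = {
        "mae": mean_absolute_error(y_part, pred),
        "r2": r2_score(y_part, pred),
    }
    print(name, scores[name])
```

If the training scores are much better than the test scores, the model is fitting noise in the training data. Note that a random split of pixels from a single image can still look optimistic, because neighbouring pixels tend to be similar; holding out a separate region or a separate image is a stricter test.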

If you have made the train/test split and the evaluation correctly, it could be that the images you are predicting do not have the same properties/configuration/topology as the ones you are training with.
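One simple way to probe this is to check how many pixels of the image being predicted have input values outside the range seen during training. A random forest averages training targets, so it cannot extrapolate beyond what it has seen. The sketch below uses synthetic arrays, and the helper name `fraction_outside_training_range` is hypothetical; it is a rough first check, not a full test for distribution shift.

```python
import numpy as np

def fraction_outside_training_range(X_train, X_new):
    """Per feature, the fraction of new samples outside the training min/max."""
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    outside = (X_new < lo) | (X_new > hi)
    return outside.mean(axis=0)

rng = np.random.default_rng(0)

# Synthetic stand-ins: rows are pixels, columns are the 3 input values.
X_train = rng.uniform(0.0, 1.0, size=(1000, 3))
X_new = rng.uniform(0.0, 1.0, size=(1000, 3))
X_new[:, 2] += 0.5  # simulate a shift in the third input value

fractions = fraction_outside_training_range(X_train, X_new)
print(fractions)
```

A high fraction for any input suggests that the model is being asked to predict in a region it was never trained on, which a test split drawn from the training image would not reveal.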

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange