Question

[Figure: validation accuracy plotted against the number of features]

The graph above shows how accuracy stops increasing after reaching a certain number of features. There are also sudden drops in accuracy at some points. Can this be attributed to overfitting? I am training a decision tree, by the way.


Solution

I can tell from your screenshot that you are plotting the validation accuracy. When you overfit, your training accuracy should be very high, but your validation accuracy should get lower and lower. Or, if you think in terms of error rather than accuracy, you should see the following plot in the case of overfitting. In the figure below the x-axis shows training progress, i.e. the number of training iterations: the training error (blue) keeps decreasing, while the validation error (red) starts increasing at the point where overfitting begins.

[Figure: training error (blue) decreasing while validation error (red) rises past the overfitting point]

This picture is from the Wikipedia article on overfitting, by the way: https://en.wikipedia.org/wiki/Overfitting. Have a look.

So, to answer your question: no, I don't think you are overfitting. If increasing the number of features were making the overfitting more and more significant, the validation accuracy should be falling, not staying constant. In your case it seems that additional features simply no longer add any benefit for the classification.
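You can check this directly by plotting both curves yourself. Here is a minimal sketch using scikit-learn (assumed available); the synthetic dataset and the "first k columns" feature order are placeholders for your actual data and whatever feature-selection order produced your plot:

```python
# Sketch: training vs. validation accuracy as the feature count grows.
# Assumes scikit-learn; the dataset here is synthetic, not the asker's.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 20 features, but only 5 carry signal, so extra features add no benefit.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.33,
                                          random_state=0)

train_accs, val_accs = [], []
for k in range(1, X.shape[1] + 1):
    # Train on the first k feature columns (stand-in for your
    # feature-selection order).
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[:, :k], y_tr)
    train_accs.append(clf.score(X_tr[:, :k], y_tr))
    val_accs.append(clf.score(X_va[:, :k], y_va))

# Overfitting would show as train_accs staying high while val_accs
# keeps dropping; a plateau in val_accs instead means the extra
# features simply stopped helping.
```

If the validation curve plateaus rather than declines as `k` grows, that matches the "no added benefit" interpretation rather than overfitting.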

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange