Question

I am entering several Kaggle machine learning competitions at the moment and I have a quick question. Why do we use cross-validation to assess our algorithms' effectiveness in these competitions?

Surely in these competitions your score on the public leaderboard, where your algorithm is tested against actual live data, would give you a more accurate representation of your algorithm's efficacy?

Solution

Cross-validation is a necessary step in model construction. If cross-validation gives you poor results, there is no point in even trying the model on live data. The set you train and validate on is also live data, isn't it? So the results should be similar. Without validating your model, you have no insight into its performance whatsoever. A model that gives 100% accuracy on the training set can give random results on the validation set.
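To make the gap between training accuracy and validation accuracy concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration and not something from the competitions themselves: the data is synthetic with purely random labels, the model is a hand-written 1-nearest-neighbour classifier, and the helper names (one_nn_predict, accuracy, k_fold_accuracy) are hypothetical. Because 1-nearest-neighbour memorises its training points, it scores perfectly on the data it was fitted on, while k-fold cross-validation shows that it has learned nothing.

```python
import random


def one_nn_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(one_nn_predict(train_X, train_y, x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_X)


def k_fold_accuracy(X, y, k=5, seed=0):
    """Average held-out accuracy over k folds (hypothetical helper for this sketch)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        scores.append(
            accuracy(
                [X[i] for i in train], [y[i] for i in train],
                [X[i] for i in fold], [y[i] for i in fold],
            )
        )
    return sum(scores) / k


# Synthetic data (assumption): 200 random 2-D points with labels that are pure noise,
# so no model can genuinely do better than chance on unseen points.
rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [rng.randint(0, 1) for _ in range(200)]

train_acc = accuracy(X, y, X, y)     # evaluated on the same points it memorised
cv_acc = k_fold_accuracy(X, y, k=5)  # evaluated on points held out from training

print(f"training accuracy:  {train_acc:.2f}")
print(f"5-fold CV accuracy: {cv_acc:.2f}")
```

The training accuracy comes out perfect because each point is its own nearest neighbour, while the cross-validated accuracy hovers around chance. That is the kind of problem cross-validation catches before a submission is spent on the leaderboard.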

Let me reiterate: cross-validation is not a replacement for a live data test; it is part of the model construction process.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow