Problem

I am implementing two classifiers, both of which are non-deterministic in the sense that each can give different results (FPR and TPR) when run multiple times. I would like to compare these two algorithms to evaluate their performance. How do I go about this? What people often do is run the classifier until they get the best FPR and TPR values and then publish those results, but the problem with this approach is that it may not be a good representation of the classifier's actual performance. This is what I have planned so far, but I don't know whether it is correct:

  1. Split my evaluation data into train and test sets; after training, predict on the test data to get the FPR and TPR, repeat this prediction 99 more times to obtain 100 FPR and TPR readings, and take the average. To get an ROC curve, use the mean FPR and TPR. OR
  2. Use k-fold cross-validation with, say, k = 3 or 10 on the data, which returns 3 or 10 different values for TPR and FPR; take the mean to get the mean FPR and TPR, and use this mean to plot the mean ROC curve (a sketch of this option follows the list).
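A minimal sketch of option 2, assuming a binary classification problem and scikit-learn; the data (`make_classification`) and the `RandomForestClassifier` are stand-ins for your own dataset and classifier:

```python
# Option 2 sketch: k-fold CV, computing TPR and FPR per fold, then averaging.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
clf = RandomForestClassifier()  # stand-in for one of the two classifiers

tprs, fprs = [], []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True).split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    tn, fp, fn, tp = confusion_matrix(y[test_idx], y_pred).ravel()
    tprs.append(tp / (tp + fn))  # true positive rate for this fold
    fprs.append(fp / (fp + tn))  # false positive rate for this fold

print(f"mean TPR = {np.mean(tprs):.3f}, mean FPR = {np.mean(fprs):.3f}")
```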

Which of the two methods stated above is OK? And if they are both wrong, what do you suggest I do? Thanks.


Solution

A good strategy is to do n-times-repeated k-fold cross-validation, which should give a pretty good estimate of the average performance of both algorithms. This means you perform your k-fold cross-validation n times, with different random folds each time, and average the results. People commonly use k = n = 10, but the higher the values of k and n, the better.
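A minimal sketch of this with scikit-learn, assuming a binary problem; the data and the two classifiers (`LogisticRegression`, `RandomForestClassifier`) are placeholders for your own algorithms, and ROC AUC is used here as a single-number summary of the TPR/FPR trade-off per split:

```python
# n-times-repeated k-fold cross-validation comparing two classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
classifiers = {
    "classifier A": LogisticRegression(max_iter=1000),  # placeholders for
    "classifier B": RandomForestClassifier(),           # your two algorithms
}

# k = 10 folds, repeated n = 10 times with different random splits each time.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, scoring="roc_auc", cv=cv)
    print(f"{name}: mean AUC = {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean and spread over all n × k splits, rather than a single best run, gives a fairer picture of each non-deterministic classifier's typical behaviour.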

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange