Question

I'm using an experimental design to test the robustness of different classification methods, and I'm looking for the correct name for this kind of design.

I create several subsets of the full dataset by removing some samples; each subset is drawn independently of the others. I then run each classification method on every subset. Finally, I estimate the accuracy of each method, per sample, as the fraction of subset classifications that agree with the classification obtained on the full dataset. For example:

Classification-full     1    2    3    2    1    1    2

Classification-subset1  1    2    -    2    3    1    -
Classification-subset2  -    2    3    -    1    1    2
...

Accuracy                1    1    1    1  0.5    1    1

(- : sample not included in that subset)
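
The per-sample agreement in the example can be reproduced with a few lines of Python. This is only a sketch under assumptions: the function name `agreement_per_sample` and the representation of each subset as a dict mapping sample index to predicted label are illustrative choices, not part of any library, and the classifier runs themselves are taken as given.

```python
def agreement_per_sample(full_labels, subset_labels_list):
    """Fraction of subset classifications that match the full-data label.

    full_labels: labels obtained by classifying the full dataset.
    subset_labels_list: one dict per subset, {sample_index: label},
        containing only the samples kept in that subset (assumed format).
    Samples that appear in no subset get None.
    """
    scores = []
    for i, full_label in enumerate(full_labels):
        # Collect the labels this sample received in the subsets that contain it.
        votes = [labels[i] for labels in subset_labels_list if i in labels]
        if not votes:
            scores.append(None)
        else:
            scores.append(sum(v == full_label for v in votes) / len(votes))
    return scores


# Data taken from the example: 7 samples, two subsets with samples removed.
full = [1, 2, 3, 2, 1, 1, 2]
subset1 = {0: 1, 1: 2, 3: 2, 4: 3, 5: 1}
subset2 = {1: 2, 2: 3, 4: 1, 5: 1, 6: 2}

scores = agreement_per_sample(full, [subset1, subset2])
print(scores)  # [1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0]
```

The fifth sample scores 0.5 because only one of the two subsets that contain it agrees with the full-data label.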

Is there a correct name for this methodology? I thought it might fall under bootstrapping, but I'm not sure.


Solution

Random subsampling seems the most appropriate name. Bootstrapping is a somewhat more generic term, but it is also correct.

Here are some references and synonyms: http://www.frank-dieterle.com/phd/2_4_3.html
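
In common usage the two terms differ mainly in how samples are drawn: random subsampling draws without replacement, so each subset is smaller than the full dataset, while the bootstrap in its strict sense draws with replacement. A minimal sketch of that difference, using only Python's standard library (the function names and the subset size of 5 are illustrative assumptions):

```python
import random


def random_subsample(indices, size, rng):
    """Draw `size` distinct indices, i.e. sampling without replacement."""
    return sorted(rng.sample(indices, size))


def bootstrap_sample(indices, rng):
    """Draw len(indices) indices with replacement; duplicates are expected."""
    return sorted(rng.choices(indices, k=len(indices)))


rng = random.Random(0)      # fixed seed so the run is repeatable
indices = list(range(7))    # 7 samples, as in the question's example

sub = random_subsample(indices, 5, rng)
boot = bootstrap_sample(indices, rng)

print("subsample:", sub)    # 5 distinct indices
print("bootstrap:", boot)   # 7 indices, possibly with repeats
```

The design in the question, where samples are cut away and each one appears at most once per subset, corresponds to the first function.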

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange