Question

I have a large dataset with 40k records and 2 classes. 76% of the records belong to the first class.

I used a 70-30 train/test split with stratified sampling, and k-NN gives the best accuracy at k = 20.

1) Is that too large a value for k?

2) Is it possible that this large value of k is due to the imbalance between the 2 classes in the dataset, even though I used stratified sampling?
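
For reference, the setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (two Gaussian blobs with a 76/24 class ratio, 400 points instead of 40k), and `make_data`, `stratified_split`, `knn_predict`, and `accuracy` are hypothetical helper names, not taken from any library. It uses only the standard library; in practice a package such as scikit-learn would normally be used. It also reports the accuracy of always predicting the majority class, since with 76% of records in one class that baseline is already about 0.76 and any k should be judged against it.

```python
import random
from collections import Counter

def make_data(n=400, majority_frac=0.76, seed=0):
    """Synthetic 2-D points (assumption): class 0 near (0, 0), class 1 near (1.5, 1.5)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = 0 if i < int(n * majority_frac) else 1
        center = 0.0 if label == 0 else 1.5
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data

def stratified_split(data, test_frac=0.3, seed=0):
    """Split each class separately so train and test keep the same class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({y for _, y in data}):
        rows = [r for r in data if r[1] == label]
        rng.shuffle(rows)
        cut = round(len(rows) * test_frac)
        test.extend(rows[:cut])
        train.extend(rows[cut:])
    return train, test

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2,
    )[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

data = make_data()
train, test = stratified_split(data)

# Baseline: always predict the most frequent class in the training set.
majority_label = Counter(y for _, y in train).most_common(1)[0][0]
baseline = sum(y == majority_label for _, y in test) / len(test)

# Odd values of k avoid tied votes in a 2-class problem.
scores = {k: accuracy(train, test, k) for k in (1, 5, 11, 21, 41)}
best_k = max(scores, key=scores.get)

print(f"majority-class baseline: {baseline:.3f}")
for k, score in scores.items():
    print(f"k={k:>2}: accuracy={score:.3f}")
print("best k:", best_k)
```

Note that this sketch, like the procedure in the question, picks k on the test set; a separate validation set or cross-validation on the training portion would give a less optimistic estimate.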

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange