Why do we need to handle data imbalance?
31-10-2019
Question
I need to know why we need to deal with data imbalance. I know how to deal with it and the different methods for solving the issue: upsampling, downsampling, or SMOTE.
For example, suppose I have a rare disease that affects 1 percent of patients (1 in 100), and I decide to use a balanced training set with a 50/50 split. Won't that make the model think 50% of patients have the disease, even though the true ratio is 1 in 100? So:
- Why do we need to deal with data imbalance?
- What is the recommended ratio for a balanced set?
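
To make the 50/50 concern concrete, here is a minimal sketch of the arithmetic involved. It assumes a classifier trained on the balanced set outputs a well-calibrated probability for that balanced set, and it applies the standard Bayes prior-shift correction to map that probability back to the 1-in-100 population. The function name `adjust_for_prior` and its parameters are hypothetical, chosen only for illustration; this is not a method taken from the question.

```python
def adjust_for_prior(p_balanced, train_prior=0.5, true_prior=0.01):
    """Rescale a probability learned under train_prior to true_prior.

    Assumption: p_balanced is a calibrated estimate of P(disease | x)
    on the resampled (balanced) training distribution.
    """
    # Weight each class by how much its prior changed between the
    # resampled training set and the real population.
    pos = p_balanced * (true_prior / train_prior)
    neg = (1.0 - p_balanced) * ((1.0 - true_prior) / (1.0 - train_prior))
    return pos / (pos + neg)


if __name__ == "__main__":
    # A score of 0.5 on balanced data carries no evidence either way,
    # so it maps back to roughly the base rate of 0.01.
    for score in (0.5, 0.9, 0.99):
        print(score, round(adjust_for_prior(score), 4))
```

Under these assumptions, a score of 0.5 from the balanced model corresponds to about a 1% probability in the real population, which is the sense in which a model trained on 50/50 data overstates prevalence unless its outputs are corrected or only used for ranking.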
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange