Question

I want to understand why we need to deal with data imbalance. I know how to deal with it and the different methods for doing so: upsampling, downsampling, or SMOTE.

For example, suppose a rare disease affects 1 in 100 patients (1%), and I decide to balance my training set to a 50/50 split. Won't that make the model think that 50% of patients have the disease, even though the true rate is 1 in 100? So my questions are:

  1. Why do we need to deal with data imbalance?
  2. What is the recommended class ratio for a balanced training set?
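
To make the 50/50 concern concrete, here is a minimal sketch of the standard prior-shift correction, which maps a probability from a model trained on resampled data back to the original prevalence. It assumes the model's outputs are calibrated for the resampled training distribution and that resampling changed only the class ratio, not how each class looks. The function name `correct_for_resampling` and its parameters are hypothetical, chosen only for illustration.

```python
def correct_for_resampling(p_balanced, train_prevalence, true_prevalence):
    """Rescale a predicted probability from the training prior to the true prior.

    Assumption: p_balanced is a calibrated probability under the resampled
    (e.g. 50/50) training distribution.
    """
    # Ratio of true prior to training prior, for each class
    pos_weight = true_prevalence / train_prevalence
    neg_weight = (1.0 - true_prevalence) / (1.0 - train_prevalence)
    numerator = p_balanced * pos_weight
    return numerator / (numerator + (1.0 - p_balanced) * neg_weight)


# Training set balanced to 50/50, true disease rate 1 in 100
for p in (0.1, 0.5, 0.9):
    adjusted = correct_for_resampling(p, train_prevalence=0.5, true_prevalence=0.01)
    print(f"balanced-model p={p:.2f} -> adjusted p={adjusted:.4f}")
```

Under these assumptions, a score of 0.5 from the balanced model corresponds to a 1% probability at the true prevalence, which is the inflation described in the example above.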

No correct solution

Licensed under: CC-BY-SA with attribution