Question

Could someone please explain how the EasyEnsemble algorithm works? I'm using it for a prediction model on an imbalanced data set with a minority class.

Please don't refer me to this paper, as it makes no sense to me: EasyEnsemble and Feature Selection for Imbalance Data Sets

I'm using the algorithm in Pandas, with the UnbalancedDataset library, which is on GitHub (UnbalancedDataset).

I get an array of matrices as output, and I don't know how to use it in the end to train random forests.

Thanks


Solution

The toolbox only manages the sampling, so it is slightly different from the algorithm in the paper.

What it does is the following: it creates several subsets of the data, each of which is balanced. These subsets are created by randomly under-sampling the majority class. That is what you are getting from the toolbox.
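To make the sampling step concrete, here is a minimal NumPy sketch of the idea. It is not the toolbox's own code: the function name `balanced_subsets`, its parameters, and the toy data are hypothetical, and it assumes a binary problem where the minority class has label 1.

```python
import numpy as np

def balanced_subsets(X, y, n_subsets=10, minority_label=1, seed=0):
    """Return a list of (X_sub, y_sub) pairs, each with equal class counts."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    subsets = []
    for _ in range(n_subsets):
        # Draw as many majority samples as there are minority samples.
        # Sampling is without replacement inside one subset, but each
        # subset is drawn independently of the others.
        sampled = rng.choice(majority_idx, size=minority_idx.size, replace=False)
        idx = np.concatenate([minority_idx, sampled])
        subsets.append((X[idx], y[idx]))
    return subsets

# Toy data: 95 majority samples (label 0) and 5 minority samples (label 1).
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 95 + [1] * 5)
subsets = balanced_subsets(X, y, n_subsets=4)
```

Each element of `subsets` plays the role of one of the matrices you get back: all minority samples plus an equally sized random draw from the majority class.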

To obtain what is described in the paper, you need to train an AdaBoost classifier on each subset. Thus, what you get is an ensemble of ensembles.
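As a rough sketch of that last step, the following trains one scikit-learn `AdaBoostClassifier` per balanced subset and then combines them. The toy data, the number of subsets, and the helper `predict_proba_ensemble` are assumptions for illustration. Averaging the predicted probabilities is a simplification I am using here; the paper combines the weak learners of all the AdaBoost models in its own way.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Toy imbalanced data: 200 majority samples (label 0), 20 minority samples (label 1).
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(2.0, 1.0, size=(20, 2))])
y = np.array([0] * 200 + [1] * 20)

minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Train one AdaBoost model per balanced subset.
models = []
for i in range(5):
    sampled = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    idx = np.concatenate([minority_idx, sampled])
    clf = AdaBoostClassifier(n_estimators=10, random_state=i)
    clf.fit(X[idx], y[idx])
    models.append(clf)

def predict_proba_ensemble(models, X_new):
    """Average the class probabilities of all models (a simplification)."""
    return np.mean([m.predict_proba(X_new) for m in models], axis=0)

proba = predict_proba_ensemble(models, X)
pred = proba.argmax(axis=1)  # column order follows clf.classes_, here [0, 1]
```

The same loop should work with any scikit-learn classifier, so you could swap in `RandomForestClassifier` if you prefer random forests, although that is then no longer the method from the paper.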

Hope that helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange