Question

I'm using Spark with Scala to implement majority voting of decision trees and a random forest (both configured the same way: same depth, same number of base classifiers, etc.). For majority voting, the dataset is split equally among the base classifiers. A Nemenyi test shows that majority voting is significantly better (over 11 benchmarking datasets from KEEL).

From what I understand, the difference between the two methods is that a random forest trains each base classifier on a bootstrap sample, so the training sets may overlap and might not sum up to the whole dataset, whereas my disjoint split covers every instance exactly once. Is my understanding correct? If so, what might be the reason for the observed difference?
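To make that coverage difference concrete, here is a small plain-Scala simulation (not Spark; the object name and sizes are illustrative). It compares how many distinct instances a bootstrap sample of size n touches versus a disjoint k-way partition of the same data:

```scala
import scala.util.Random

object CoverageSketch {
  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val n   = 100000 // number of training instances

    // Random forest style: one bootstrap sample = n draws WITH replacement.
    // Distinct instances covered approaches 1 - e^(-1) ≈ 0.632 for large n.
    val bootstrapCoverage = Set.fill(n)(rng.nextInt(n)).size.toDouble / n
    println(f"bootstrap sample covers $bootstrapCoverage%.3f of the data")

    // Disjoint split style: k equal, non-overlapping parts cover everything.
    val k = 10
    val partitionCoverage =
      (0 until n).groupBy(_ % k).values.flatten.toSet.size.toDouble / n
    println(s"disjoint partitions cover $partitionCoverage of the data")
  }
}
```

So each individual tree in the disjoint scheme sees fewer (n/k) but unique instances, while each random-forest tree sees about 63% distinct instances with duplicates, and the union across trees still need not be exhaustive.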

Also, could you point me to any articles comparing these two methods?

Edit: In case anyone is interested in this topic, here's an article comparing bagging with horizontal partitioning, in favor of the latter.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange