Question

I want to implement the ID3/C4.5 decision tree algorithms on Hadoop. Can anyone suggest how to go about it?

I am clear about the algorithms but I need to know how to parallelize them.

No correct solution

Other tips

I would consider running each iteration of attribute selection as one MapReduce job. Following this idea, you can assign one attribute to each mapper, which computes the information gain for that attribute, and in the reduce phase (with a single reducer) you can select the best attribute.
I would consider this approach practical if computing a single iteration on one machine (over all attributes) takes noticeably longer than the job startup overhead, which is about 20-40 seconds.
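
To make the idea concrete, here is a minimal sketch of one such iteration, simulated in plain Python rather than written against the Hadoop API. The function names (`map_attribute`, `reduce_best`, `select_split`) and the toy data are made up for illustration, and the sketch uses ID3-style information gain; for C4.5 you would swap in the gain ratio.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def map_attribute(attr_index, rows, labels):
    """Mapper stand-in: compute the information gain of one attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr_index]].append(label)
    # Weighted entropy of the subsets produced by splitting on this attribute.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return attr_index, entropy(labels) - remainder


def reduce_best(mapper_outputs):
    """Single-reducer stand-in: keep the (attribute, gain) pair with the highest gain."""
    return max(mapper_outputs, key=lambda pair: pair[1])


def select_split(rows, labels):
    """One iteration (one simulated job): one mapper call per attribute, then one reducer."""
    outputs = [map_attribute(i, rows, labels) for i in range(len(rows[0]))]
    return reduce_best(outputs)


# Toy data, invented for this example: columns are (outlook, windy).
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["play", "play", "stay", "stay"]
best_index, best_gain = select_split(rows, labels)
```

To grow the tree, you would partition the data on the chosen attribute and run the same job again for each child node, so the per-job startup overhead mentioned above is paid once per iteration.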

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow