Question

I want to implement the ID3/C4.5 decision tree algorithms on Hadoop. Can anyone suggest how to go about it?

I am clear about the algorithms, but I need to know how to parallelize them.

No correct solution

Other tips

I would consider running each iteration of attribute selection as one MapReduce job. Following this idea, you can assign each mapper one attribute for which to compute the information gain and, in the reduce phase (with a single reducer), select the best attribute.
I would consider this approach practical only if computing a single iteration on one machine (over all attributes) takes somewhat longer than the job start overhead, which is about 20-40 seconds.
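
To make the idea concrete, here is a minimal sketch of one such iteration, simulated in a single Python process rather than written against the Hadoop API. The function names (`map_attribute`, `reduce_best`, `select_best_attribute`) are hypothetical, and it assumes categorical attributes and the plain information gain criterion of ID3 (C4.5 would use gain ratio instead).

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def map_attribute(attr_index, rows, labels):
    """Mapper stand-in: compute the information gain of one attribute.

    In a real job each mapper would be handed one attribute; here it is
    an ordinary function call (an assumption made for illustration).
    """
    # Group class labels by the value this attribute takes in each row.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr_index]].append(label)
    # Weighted entropy remaining after splitting on this attribute.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return attr_index, entropy(labels) - remainder


def reduce_best(pairs):
    """Single-reducer stand-in: keep the (attribute, gain) pair with highest gain."""
    return max(pairs, key=lambda pair: pair[1])


def select_best_attribute(rows, labels):
    """One iteration: map over all attributes, then reduce to the best one."""
    mapped = [map_attribute(i, rows, labels) for i in range(len(rows[0]))]
    return reduce_best(mapped)
```

For example, `select_best_attribute([("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")], ["no", "no", "yes", "yes"])` picks attribute 0, since it separates the two classes perfectly while attribute 1 gives no gain. Building the full tree would repeat this step once per node on the subset of rows reaching that node, which is where the per-job start overhead adds up.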

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow