I want to implement the ID3/C4.5 decision tree algorithms on Hadoop. Can anyone suggest how to go about it?

I am clear about the algorithms but I need to know how to parallelize them.

No accepted solution

Other answers

I would consider running each iteration of attribute selection as one MapReduce job. Following this idea, you can assign each mapper one attribute and have it compute that attribute's information gain; then, in the reduce phase (with a single reducer), you select the best attribute.
I would consider this approach practical only if computing a single iteration on one machine (over all attributes) takes noticeably longer than the job start overhead, which is about 20-40 seconds.
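
To make the idea concrete, here is a minimal local sketch of one such iteration in plain Python. It only simulates the map and reduce phases in a single process and does not use any Hadoop API; the function names (`map_attribute`, `reduce_best`, `select_split`) and the toy data are made up for illustration. It scores attributes by information gain as in ID3; C4.5 would use gain ratio instead.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def map_attribute(attr_index, rows):
    """Simulated mapper: compute the information gain of one attribute.

    rows is a list of (features, label) pairs; returns (attr_index, gain).
    """
    labels = [label for _, label in rows]
    # Group the class labels by the value this attribute takes.
    partitions = defaultdict(list)
    for features, label in rows:
        partitions[features[attr_index]].append(label)
    # Weighted entropy that remains after splitting on this attribute.
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return attr_index, entropy(labels) - remainder

def reduce_best(pairs):
    """Simulated single reducer: keep the attribute with the highest gain."""
    return max(pairs, key=lambda kv: kv[1])

def select_split(rows, candidate_attrs):
    """One iteration of attribute selection, i.e. one simulated job."""
    mapped = [map_attribute(a, rows) for a in candidate_attrs]
    return reduce_best(mapped)

# Hypothetical toy data: (outlook, temperature) -> play?
rows = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "hot"), "yes"),
    (("rainy", "mild"), "yes"),
]
best_attr, best_gain = select_split(rows, [0, 1])
print(best_attr, best_gain)
```

In this sketch the driver would then split the data on `best_attr` and launch the next job for each child node, which is where the per-job start overhead mentioned above adds up.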

Licensed under: CC-BY-SA with attribution