Question

I'm new to Hadoop and have to process an input file. I want to process each line, and the output should be one file per line.

I searched the internet and found MultipleOutputFormat and generateFileNameForKeyValue.

But most examples are written with the JobConf class. As I'm using Hadoop 0.20.1, I think the Job class takes its place, and I don't know how to use the Job class to generate multiple output files by key.

Could anyone help me?


Solution

The Eclipse plugin is mainly used to submit and monitor jobs, as well as to interact with HDFS, against a real or 'pseudo' cluster.

If you're running in local mode, then I don't think the plugin gains you anything, since your job will run in a single JVM. With this in mind, I would say include the most recent 1.x hadoop-core in your Eclipse project's classpath.

Either way, MultipleOutputFormat has not been ported to the new mapreduce package (in neither 1.1.2 nor 2.0.4-alpha), so you'll need to either port it yourself or find another way (maybe MultipleOutputs; its Javadoc page has some usage examples).
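As a rough illustration of the MultipleOutputs route, here is a minimal sketch of the naming step only, written in plain Java so it runs without Hadoop on the classpath. The class `PerLineOutputName` and the method `baseOutputPath` are hypothetical names, not part of Hadoop. It assumes your Hadoop version ships `org.apache.hadoop.mapreduce.lib.output.MultipleOutputs` and that the input is read with TextInputFormat, whose mapper key is the byte offset of each line, so the offset can serve as a unique per-line file name.

```java
import java.util.Locale;

public class PerLineOutputName {

    // Hypothetical helper: builds a distinct base output path for one input line.
    // lineOffset stands for the byte offset of the line, i.e. the LongWritable
    // key that TextInputFormat passes to a mapper. prefix is an arbitrary label.
    static String baseOutputPath(String prefix, long lineOffset) {
        if (lineOffset < 0) {
            throw new IllegalArgumentException("lineOffset must be >= 0");
        }
        // Replace anything that is not clearly safe in a file name.
        String safePrefix = prefix.replaceAll("[^A-Za-z0-9_.-]", "_");
        // Zero-pad the offset so the names sort in input order.
        return String.format(Locale.ROOT, "%s-%012d", safePrefix, lineOffset);
    }

    public static void main(String[] args) {
        // Simulates the offsets a mapper would see for a small ASCII input.
        long offset = 0;
        for (String line : new String[] {"first line", "second line"}) {
            System.out.println(baseOutputPath("line", offset) + " <- " + line);
            offset += line.length() + 1; // +1 for the newline (ASCII assumption)
        }
    }
}
```

Inside a mapper (or reducer) that holds a `MultipleOutputs` instance, here called `mos` as an assumption, the returned string would be passed as the third argument of `mos.write(key, value, baseOutputPath)`, and `mos.close()` would be called in `cleanup`. Hadoop then appends its own task suffix to each base path.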

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow