how to get multipleOutput in hadoop

https://stackoverflow.com/questions/16637402

29-05-2022

Question

I'm new to Hadoop and have to process an input file. I want to process each line, and the output should be one file per line.

I searched the internet and found MultipleOutputFormat and generateFileNameForKeyValue.

But most people write it with the JobConf class. Since I'm using Hadoop 0.20.1, I think the Job class takes its place, and I don't know how to use the Job class to generate multiple output files by key.

Could anyone help me?

Solution

The Eclipse plugin is mainly used to submit and monitor jobs, as well as to interact with HDFS, against a real or 'pseudo' cluster.

If you're running in local mode, then I don't think the plugin gains you anything, seeing as your job will run in a single JVM. With this in mind, I would say include the most recent 1.x hadoop-core in your Eclipse project's classpath.

Either way, MultipleOutputFormat has not been ported to the new mapreduce package (in either 1.1.2 or 2.0.4-alpha), so you'll need to port it yourself or find another way (maybe MultipleOutputs; its Javadoc page has some usage examples).
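As a rough illustration of the MultipleOutputs route, here is a minimal sketch of a mapper for a map-only job that writes each input line to its own file. It assumes a Hadoop release whose new API includes org.apache.hadoop.mapreduce.lib.output.MultipleOutputs (a recent 1.x or later, not 0.20.1), the default TextInputFormat (so the map key is the line's byte offset), and TextOutputFormat for output. The class name LinePerFileMapper and the "line-" file prefix are made up for this example.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical mapper: one output file per input line, named after the
// line's byte offset in the input file.
public class LinePerFileMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> outputs;

    @Override
    protected void setup(Context context) {
        // Bind MultipleOutputs to this task's context.
        outputs = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // The third argument is a base output path relative to the job's
        // output directory; Hadoop appends a task suffix such as -m-00000.
        String baseName = "line-" + offset.get();
        outputs.write(NullWritable.get(), line, baseName);
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Closing flushes and finalizes every file opened by this task.
        outputs.close();
    }
}
```

In the driver, this would be wired up with job.setMapperClass(LinePerFileMapper.class) and job.setNumReduceTasks(0). To name files by key after a shuffle instead, the same three calls (construct in setup, write with a key-derived base path, close in cleanup) can be placed in a Reducer.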

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow