Question

I need my reduce function to output two values for one key, and I'm having trouble doing it. The reduce function receives an Id and a list of integers associated with that Id. It needs to output that Id, the average of the integers in the list, and the length of the list.

However, the reduce function is supposed to take an OutputCollector<Text, IntWritable> argument, which seems to limit each key to a single output value.

Any help in this regard would be greatly appreciated. Thanks in advance.

Hadoop version: 2.0.0


Solution

Use MultipleOutputs (org.apache.hadoop.mapred.lib.MultipleOutputs in the old API). In the job configuration:

MultipleOutputs.addMultiNamedOutput(job,   // job is the JobConf
            "Name",
            SequenceFileOutputFormat.class, Text.class,
            IntWritable.class);  // SequenceFile needs the concrete value class

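In the reducer, get the collector for the named output and write through it:

// multipleOutputs = new MultipleOutputs(conf), created in configure(JobConf conf)
multipleOutputs.getCollector("Name", "mean", reporter).collect(key, mean);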
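A fuller sketch of the reducer side, assuming the old org.apache.hadoop.mapred API (where addMultiNamedOutput and getCollector live) and the multi named output "Name" registered with IntWritable values. The class name MeanLengthReducer and the multi names "mean" and "length" are illustrative assumptions, and the mean uses truncating integer division so that both values fit the IntWritable value class.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// Hypothetical old-API reducer: writes the mean and the length for each Id
// to the multi named output "Name" (registered in the driver).
class MeanLengthReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs multipleOutputs;

    @Override
    public void configure(JobConf conf) {
        multipleOutputs = new MultipleOutputs(conf);
    }

    @Override
    @SuppressWarnings("unchecked") // getCollector returns a raw OutputCollector
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        long sum = 0;
        int count = 0;
        while (values.hasNext()) {
            sum += values.next().get();
            count++;
        }
        // Integer (truncated) mean so it fits the IntWritable value class.
        IntWritable mean = new IntWritable((int) (sum / count));
        IntWritable length = new IntWritable(count);

        multipleOutputs.getCollector("Name", "mean", reporter).collect(key, mean);
        multipleOutputs.getCollector("Name", "length", reporter).collect(key, length);
    }

    @Override
    public void close() throws IOException {
        // Flushes and closes the writers opened for the named outputs.
        multipleOutputs.close();
    }
}
```

Each multiName passed to getCollector gets its own set of output files, and close() has to run so those extra writers are flushed.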

Other suggestions

Here are a few answers to your vague question.

You can call collect() as many times as you want for the same key, as long as you don't mind each value (the length and the mean) appearing as its own record in a mixed output. To tell the record types apart, write the key differently for each one, as follows:

oc.collect(new Text(k.toString() + " mean"), mean);
oc.collect(new Text(k.toString() + " length"), length);
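A minimal sketch of a whole reducer built on this idea, assuming the old org.apache.hadoop.mapred API that the question's OutputCollector<Text, IntWritable> comes from. The class name SuffixedKeyReducer is hypothetical, and the mean is computed with integer division so it fits in an IntWritable.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Emits two records per Id through the single OutputCollector:
//   "<id> mean"   -> integer mean of the values
//   "<id> length" -> number of values
class SuffixedKeyReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text k, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> oc,
                       Reporter reporter) throws IOException {
        long sum = 0;
        int length = 0;
        while (values.hasNext()) {
            sum += values.next().get();
            length++;
        }
        oc.collect(new Text(k.toString() + " mean"), new IntWritable((int) (sum / length)));
        oc.collect(new Text(k.toString() + " length"), new IntWritable(length));
    }
}
```

With the default TextOutputFormat, an Id of id1 with values 2, 4 and 6 would produce the lines `id1 mean` / 4 and `id1 length` / 3, each key and value separated by a tab.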

OR

Choose a different output value type (V3 in Reducer<K2, V2, K3, V3>) instead of IntWritable. Either write your own pair type (a PairOfIntWritable, say) or use an ArrayWritable to pack everything into a single call to collect(). The length and mean then become "fields" of the value in a single record per key.
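A minimal sketch of such a pair type. PairOfIntWritable is not a class that ships with Hadoop; the name comes from this answer, and the implementation below is an assumption. It serializes the two ints in write() and reads them back in the same order in readFields().

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical value type holding (mean, length) so one collect() per Id suffices.
class PairOfIntWritable implements Writable {
    private int mean;
    private int length;

    // Hadoop creates Writables reflectively, so a no-arg constructor is needed.
    public PairOfIntWritable() {}

    public PairOfIntWritable(int mean, int length) {
        this.mean = mean;
        this.length = length;
    }

    public int getMean() { return mean; }
    public int getLength() { return length; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(mean);
        out.writeInt(length);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Must read the fields in the same order write() wrote them.
        mean = in.readInt();
        length = in.readInt();
    }

    // TextOutputFormat uses toString() for the value part of each line.
    @Override
    public String toString() {
        return mean + "\t" + length;
    }
}
```

The reducer would then implement Reducer<Text, IntWritable, Text, PairOfIntWritable> and make a single call, oc.collect(k, new PairOfIntWritable(mean, length)). Because the map output value type (IntWritable) now differs from the job output value type, the driver would need both conf.setMapOutputValueClass(IntWritable.class) and conf.setOutputValueClass(PairOfIntWritable.class).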

OR

If you absolutely have to use IntWritable, use an invertible pairing function to combine the two integers into one. You'll need to ensure that no pair you could generate from your input data exceeds the maximum value of an IntWritable.
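One standard invertible pairing function is the Cantor pairing function, pair(a, b) = (a + b)(a + b + 1)/2 + b. The sketch below picks it as an example (the answer does not name a specific function) and assumes both inputs are non-negative, which holds for the length and for the mean whenever the input integers are non-negative. The class name CantorPairing is hypothetical. Math.multiplyExact and Math.toIntExact make an oversized pair throw instead of silently wrapping.

```java
// Hypothetical helper: Cantor pairing of two non-negative ints into one int.
final class CantorPairing {
    private CantorPairing() {}

    // pair(a, b) = (a + b)(a + b + 1) / 2 + b
    // Throws ArithmeticException if the result does not fit in an int.
    static int pair(int a, int b) {
        if (a < 0 || b < 0) {
            throw new IllegalArgumentException("Cantor pairing needs non-negative inputs");
        }
        long s = (long) a + b;
        return Math.toIntExact(Math.multiplyExact(s, s + 1) / 2 + b);
    }

    // Inverse: returns {a, b} such that pair(a, b) == z.
    static int[] unpair(int z) {
        if (z < 0) {
            throw new IllegalArgumentException("z must be non-negative");
        }
        long w = (long) Math.floor((Math.sqrt(8.0 * z + 1) - 1) / 2);
        // Correct any floating-point rounding so that T(w) <= z < T(w + 1).
        while (w * (w + 1) / 2 > z) w--;
        while ((w + 1) * (w + 2) / 2 <= z) w++;
        long t = w * (w + 1) / 2;
        int b = (int) (z - t);
        int a = (int) (w - b);
        return new int[] {a, b};
    }
}
```

In the reducer this would become oc.collect(k, new IntWritable(CantorPairing.pair(mean, length))), and whoever reads the output calls CantorPairing.unpair on the value to get {mean, length} back. Since the paired value grows with the square of a + b, the int range runs out once a + b reaches roughly 65,000.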

OR

Use MultipleOutputs to send each record to a different file distinguished by name; for example, the part-r-nnnnn files contain the means and the length-r-nnnnn files contain the lengths. The MultipleOutputs JavaDoc explains its use.
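A minimal sketch of this approach with the newer org.apache.hadoop.mapreduce API, whose MultipleOutputs exposes addNamedOutput() and write(). The means go through context.write() to the default part-r-nnnnn files, and the lengths go to a named output called "length", which produces the length-r-nnnnn files. The class name, the registerOutputs helper, and the integer mean are assumptions for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical new-API reducer: means -> part-r-nnnnn, lengths -> length-r-nnnnn.
class MeanAndLengthReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs<Text, IntWritable> mos;

    // Driver-side registration; call once while building the Job.
    static void registerOutputs(Job job) {
        MultipleOutputs.addNamedOutput(job, "length",
                TextOutputFormat.class, Text.class, IntWritable.class);
    }

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        int length = 0;
        for (IntWritable v : values) {
            sum += v.get();
            length++;
        }
        context.write(key, new IntWritable((int) (sum / length))); // default output
        mos.write("length", key, new IntWritable(length));          // named output
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Without this, the named-output files may be left unflushed.
        mos.close();
    }
}
```

The driver would call MeanAndLengthReducer.registerOutputs(job) before submitting the job. Named output names are restricted to letters and digits, which is why the name here is simply "length".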

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow