My map function produces:

Key\tValue

where Value = List(value1, value2, value3)

Then my reduce function produces:

Key\tCSV-Line

Example output:


2323232-2322 fdsfs,sdfs,dfsfs,0,0,0,2,fsda,3,23,3,s,

2323555-22222 dfasd,sdfas,adfs,0,0,2,0,fasafa,2,23,s


Example raw data: 232342|@3423@|34343|sfasdfasdF|433443|Sfasfdas|324343 x 1000

Anyway, I want to eliminate the keys at the beginning of each line so my client can do a straight import into MySQL. I have about 50 data files. My question is: after they are mapped once and the reducer starts, does the reducer need to print the key along with the value, or can I just print the value?


More information:

This code might shed more light on the situation:

http://pastebin.ca/2410217

This is roughly what I plan to do.


Solution

Your reducer can emit a line without a \t, or, in your case, just what you're calling the value. Unfortunately, Hadoop streaming will interpret such a line as a key with a null value and automatically append a delimiter (\t by default) to the end of each line. You can change what this delimiter is, but when I played around with this, I could not get it to stop appending a delimiter. I don't remember the exact details, but based on this question (Hadoop: key and value are tab separated in the output file. how to do it semicolon-separated?) I think the property is mapred.textoutputformat.separator. My solution was to strip the \t at the end of each line as I pulled the file back:

hadoop fs -cat hadoopfile | perl -pe 's/\t$//' > destfile
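
If perl is not available where the file is pulled back, the same cleanup can be sketched in plain Java. This is only a minimal sketch: the class name TrailingTabStripper and the method stripTrailingTab are made up for illustration and are not part of Hadoop. It assumes, as described above, that each line ends with at most one appended tab, and it reads stdin and writes stdout so it can take the place of the perl step in the pipe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: removes the single trailing tab
// that streaming appends when a reducer line has no key/value separator.
final class TrailingTabStripper {

    // Returns the line without one trailing tab; lines that do not end
    // with a tab are returned unchanged.
    static String stripTrailingTab(String line) {
        return line.endsWith("\t") ? line.substring(0, line.length() - 1) : line;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintStream out = new PrintStream(System.out, false, "UTF-8");
        String line;
        // Copy stdin to stdout line by line, dropping the appended tab.
        while ((line = in.readLine()) != null) {
            out.println(stripTrailingTab(line));
        }
        out.flush();
    }
}
```

Only one tab is removed per line, which mirrors the perl substitution; tabs inside the CSV line itself are left alone.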

Other tips

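If you do not want to emit the key, set the reducer's output key type to NullWritable in your code. With the default TextOutputFormat, a NullWritable key is skipped and no separator is written, so only the value appears on each line. For example: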

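import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Output key type is NullWritable, so only the value is written.
public static class TokenCounterReducer
        extends Reducer<Text, IntWritable, NullWritable, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all counts for this key.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Emit the value only; to emit the key too, write (key, sum) instead.
        context.write(NullWritable.get(), new IntWritable(sum));
    }
}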

Let me know if this is not what you need; I'll update the answer accordingly.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow