I don't know how it happened, but suddenly the output file from my Hadoop reducer contains a bunch of characters that I never put in it. The first few lines look like this:

SEQ^F!org.apache.hadoop.io.LongWritable^Yorg.apache.hadoop.io.Text^@^@^@^@^@^@
<99><F1>a^O)(|7me<EB><C0><FB><E0><B2><E2>^@^@^@f^@^@^@^H^@^@^@^@^@^@^@^@]0 1.4593640091648059E-6 30 303761 30 303747 33 341837 30 303746 30 303743 30 312703 30 303759^@^@^@<8B>^@^@^@^H^@^@^@^@^@^@^@^@<8F><81>1

Can anybody help?

Thanks in advance


Solution

This is a binary sequence file that you're interpreting as text. The first three characters (SEQ) are the 'magic number' that marks it as a sequence file, and the byte after them (^F, i.e. 6) is the format version. Next you can see the key and value types (LongWritable and Text), followed by the rest of the sequence file header (compression flags, metadata and a sync marker) and then the key/value records (all binary encoded).
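To make that layout concrete, here is a minimal sketch that decodes the header bytes shown in the question using only the Python standard library. It is an illustration under stated assumptions, not Hadoop's own reader: the function name parse_sequence_file_header is made up for this example, and it assumes an uncompressed file with no metadata entries and class names shorter than 128 bytes (so each length fits in one byte). The sample bytes are transcribed from the dump above.

```python
import io
import struct


def parse_sequence_file_header(data: bytes) -> dict:
    """Decode the leading header of a Hadoop sequence file.

    Hypothetical helper for illustration only. Assumes an uncompressed
    file with no metadata entries.
    """
    stream = io.BytesIO(data)

    # Magic number: the literal bytes "SEQ", then one version byte.
    if stream.read(3) != b"SEQ":
        raise ValueError("not a sequence file: missing SEQ magic number")
    version = stream.read(1)[0]

    def read_class_name() -> str:
        # Assumption: the name is shorter than 128 bytes, so its
        # length prefix occupies a single byte.
        length = stream.read(1)[0]
        return stream.read(length).decode("utf-8")

    key_class = read_class_name()
    value_class = read_class_name()

    # Two one-byte booleans: record compression and block compression.
    compressed = stream.read(1) != b"\x00"
    block_compressed = stream.read(1) != b"\x00"
    if compressed:
        raise NotImplementedError("compressed files are outside this sketch")

    # Metadata: a 4-byte big-endian count of key/value entries.
    (metadata_count,) = struct.unpack(">i", stream.read(4))
    if metadata_count != 0:
        raise NotImplementedError("metadata entries are outside this sketch")

    # A 16-byte sync marker closes the header.
    sync_marker = stream.read(16)

    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "sync_marker": sync_marker,
    }


# Header bytes transcribed from the dump in the question.
sample = (
    b"SEQ\x06"
    b"\x21org.apache.hadoop.io.LongWritable"
    b"\x19org.apache.hadoop.io.Text"
    b"\x00\x00"          # not compressed, not block compressed
    b"\x00\x00\x00\x00"  # zero metadata entries
    + bytes.fromhex("99f1610f29287c376d65ebc0fbe0b2e2")  # sync marker
)

header = parse_sequence_file_header(sample)
print(header["key_class"], header["value_class"])
```

If the goal is only to look at the contents, hadoop fs -text <path> decodes a sequence file to readable text. If the job was meant to write plain text in the first place, it is worth checking whether its output format is set to SequenceFileOutputFormat rather than TextOutputFormat.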

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow