Question

I want to build this pipeline: protobuf -> byte array -> |HTTP POST| -> byte array -> write to a local file -> MapReduce in Hadoop.

  1. My first idea is to use BufferedWriter/FileWriter to write each byte array as a String to a local file, separated by '\n' or another delimiter. The problem is that the byte array itself may contain '\n' or whatever delimiter I choose.

  2. byte array -> protobuf -> JSON, then write the JSON to a file. This runs into the same delimiter problem as above.

  3. Hadoop has an InputFormat named SequenceFileInputFormat, which seems to be intended for MapReduce jobs. My question is: how can I write byte arrays to a local file in that format?

Or is there a different way to solve my problem? Thank you.


Solution

OK, the problem is solved now.

org.apache.hadoop.io.SequenceFile.Writer works for me.

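import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;

Configuration conf = new Configuration();
LocalFileSystem fs = FileSystem.getLocal(conf);
// Java does not expand "~", so build the path from the user's home directory.
Path path = new Path(System.getProperty("user.home"), "test");
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, LongWritable.class, BytesWritable.class);
LongWritable key = new LongWritable(1L);
BytesWritable val = new BytesWritable(protobufObject.toByteArray());
writer.append(key, val); // call append once per record
writer.close();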
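
For intuition about why a binary record format avoids the delimiter problem from the question, here is a minimal sketch that uses only java.io, with no Hadoop dependency. It stores each record as a 4-byte length followed by the raw bytes, so a payload may contain '\n' or any other byte. The class name LengthPrefixedRecords and its methods are hypothetical, and this is not the actual on-disk layout of a SequenceFile; it only illustrates the general idea of framing records by length instead of by delimiter.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: each record is stored as [int length][payload bytes]. */
class LengthPrefixedRecords {

    /** Writes every record with a length prefix; no delimiter is needed. */
    public static void writeAll(Path file, List<byte[]> records) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (byte[] record : records) {
                out.writeInt(record.length);
                out.write(record);
            }
        }
    }

    /** Reads records back until the end of the file is reached. */
    public static List<byte[]> readAll(Path file) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no further length prefix: stop reading
                }
                byte[] record = new byte[length];
                in.readFully(record); // throws EOFException if the payload is truncated
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".bin");
        // Both payloads contain '\n' (0x0A), which would break line-based framing.
        List<byte[]> original = Arrays.asList(new byte[] {1, '\n', 2}, new byte[] {'\n', '\n'});
        writeAll(file, original);
        List<byte[]> restored = readAll(file);
        for (int i = 0; i < original.size(); i++) {
            System.out.println(Arrays.equals(original.get(i), restored.get(i)));
        }
        Files.delete(file);
    }
}
```

A hand-rolled file like this would still need a custom InputFormat before Hadoop could read it, which is one reason writing a SequenceFile directly, as above, is the more convenient choice for MapReduce.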
Licensed under: CC-BY-SA with attribution