Question

I want to build this pipeline: protobuf -> byte array -> |HTTP POST| -> byte array -> serialize to a local file -> MapReduce in Hadoop.

  1. My first idea is to use BufferedWriter/FileWriter to write each byte array as a String to a local file, separated by '\n' or another delimiter. The problem is that the byte array itself may contain '\n' or whatever delimiter I choose.

  2. byte array -> protobuf -> JSON, then serialize the JSON to a file. This runs into the same problem as above.

  3. Hadoop has an InputFormat named SequenceFileInputFormat, which seems to be intended for MapReduce input. My question is: how can I serialize byte arrays to a local file in that format?

Or is there a different way to solve my problem? Thank you.


Solution

OK, the problem is solved now.

org.apache.hadoop.io.SequenceFile.Writer works for me.

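import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;

Configuration conf = new Configuration();
LocalFileSystem fs = FileSystem.getLocal(conf);
// Java does not expand "~", so build the path from the user.home property.
Path path = new Path(System.getProperty("user.home"), "test");
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, LongWritable.class, BytesWritable.class);
// One record per message: the key is an arbitrary id, the value is the raw protobuf bytes.
writer.append(new LongWritable(1L), new BytesWritable(protobufObject.toByteArray()));
writer.close();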
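
For anyone wondering why this avoids the delimiter problem from options 1 and 2: a SequenceFile stores the length of each record instead of scanning for a separator, so the value bytes can contain anything. Below is a minimal pure-Java sketch of that idea only. It is not the actual SequenceFile layout, and the class and method names (LengthPrefixedRecords, encode, decode) are made up for illustration. The sample byte arrays are stand-ins for protobufObject.toByteArray().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: illustrates length-prefixed framing, not Hadoop's on-disk format.
public class LengthPrefixedRecords {

    // Writes each record as a 4-byte big-endian length followed by the raw bytes.
    static byte[] encode(List<byte[]> records) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (byte[] record : records) {
                out.writeInt(record.length);
                out.write(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads records back by trusting the length prefix, never by searching for a delimiter.
    static List<byte[]> decode(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                int length = in.readInt();
                byte[] record = new byte[length];
                in.readFully(record);
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Both sample payloads contain '\n' (0x0A), which would break newline-delimited text.
        byte[] first = {0x08, 0x0A, 0x12, 0x01};
        byte[] second = {0x0A, 0x0A};
        List<byte[]> decoded = decode(encode(Arrays.asList(first, second)));
        System.out.println("records: " + decoded.size());
        System.out.println("first intact: " + Arrays.equals(first, decoded.get(0)));
        System.out.println("second intact: " + Arrays.equals(second, decoded.get(1)));
    }
}
```

A hand-rolled file like this would presumably need its own InputFormat before MapReduce could read it, which is why writing a SequenceFile directly is the simpler route here.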
Licensed under: CC-BY-SA with attribution