Question

I first pack all my images into a Hadoop SequenceFile:

FSDataInputStream in = null;    
in = fs.open(new Path(uri)); //uri is the image location in HDFS
byte buffer[] = new byte[in.available()];
in.read(buffer);
context.write(imageID, new BytesWritable(buffer));

Then, in the reducer, I want to get my original images back from the SequenceFile:

BufferedImage imag;    
imag = ImageIO.read(new ByteArrayInputStream(value.getBytes())); 

But the image is not decoded correctly; I get this error:

Error: javax.imageio.IIOException: Error reading PNG image data
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream

My question is: how do I get the original images back from a SequenceFile in Hadoop?

Solution

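The problem was the way I read the stream. in.available() is not guaranteed to return the full file size, and a single in.read(buffer) call may return before the buffer is full, so the stored bytes can be a truncated PNG (hence the "Unexpected end of ZLIB input stream" error). Here is the right way, which reads the stream to the end: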

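import org.apache.commons.io.IOUtils;

Configuration confHadoop = new Configuration();
FileSystem fs = FileSystem.get(confHadoop);
Path file = new Path(fs.getUri().toString() + "/" + fileName);
byte[] buffer;
// try-with-resources closes the stream even if reading fails
try (FSDataInputStream in = fs.open(file)) {
    // toByteArray keeps reading until end of stream
    buffer = IOUtils.toByteArray(in);
}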

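The buffer can then be written to the SequenceFile as new BytesWritable(buffer). The same care applies when reading from the SequenceFile: BytesWritable.getBytes() returns the backing array, which can be longer than the stored data, so pass only the first value.getLength() bytes to ImageIO, for example new ByteArrayInputStream(value.getBytes(), 0, value.getLength()).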
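To illustrate why a single read() call is not enough, here is a minimal sketch that uses only the JDK, with no Hadoop or Commons IO involved. ChunkedStream is a hypothetical stand-in for a stream that hands back at most a fixed number of bytes per call, and readAll is an illustrative helper that loops until end of stream, which is roughly the behavior the answer relies on IOUtils.toByteArray for. The class and method names are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class StreamReadSketch {

    /** Hypothetical stream that returns at most 'chunk' bytes per read call. */
    static class ChunkedStream extends InputStream {
        private final InputStream delegate;
        private final int chunk;

        ChunkedStream(InputStream delegate, int chunk) {
            this.delegate = delegate;
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately return fewer bytes than requested.
            return delegate.read(b, off, Math.min(len, chunk));
        }
    }

    /** Reads until read() reports end of stream (-1) and returns every byte seen. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[10_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }

        // One read() call: the buffer is only partially filled.
        byte[] single = new byte[original.length];
        int got = new ChunkedStream(new ByteArrayInputStream(original), 1024).read(single);
        System.out.println("single read() filled " + got + " of " + single.length + " bytes");

        // Looping until end of stream recovers everything.
        byte[] full = readAll(new ChunkedStream(new ByteArrayInputStream(original), 1024));
        System.out.println("loop read " + full.length + " bytes, identical: "
                + Arrays.equals(full, original));
    }
}
```

With image data, the partially filled buffer from the single read() is what would end up in the SequenceFile, and a decoder would then hit the end of the compressed data early.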

Licensed under: CC-BY-SA with attribution