Question

I am trying to read data from an Avro file stored in HDFS. So far I am able to read the entire file using DataFileReader or DataFileStream. Now I want to implement pagination. Is there a specific way to do it?

I have gone through the basic documentation, and as I understand it this can be done using the synchronization markers. This is what I have tried:

SeekableInput seekableInput = new AvroFSInput(dataInputStream, 5);
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(seekableInput, datumReader);
fileReader.seek(startOffset);  // set to the start-offset
while (fileReader.hasNext() && !fileReader.pastSync(endOffset)) {
    GenericRecord gr = fileReader.next();
    System.out.println(gr);
}

But this code throws:

Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
    at com.globalids.test.AvroTest.deserializeWithPageing(AvroTest.java:112)
    at com.globalids.test.AvroTest.main(AvroTest.java:45)
Caused by: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
    ... 2 more 

I have also tried setting the sync interval while writing the data, and calling sync() on the DataFileWriter after each record is written to the file. Can anyone point out what I am doing wrong?

Thank you in advance.

Solution

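You need to call sync() instead of seek() if startOffset is not a valid position in the file. seek() expects a known synchronization point (for example, a position returned by DataFileWriter.sync() while writing), whereas sync() moves forward to the next synchronization point after the given position: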

SeekableInput seekableInput = new AvroFSInput(dataInputStream, 5);    
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(seekableInput, datumReader);

fileReader.sync(startOffset);  // move to the next sync point after startOffset

while (fileReader.hasNext() && !fileReader.pastSync(endOffset)) {
    GenericRecord gr = fileReader.next();
    System.out.println(gr);
}
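
For pagination, one possible approach is to treat each page as a byte range of the file and pass its bounds as startOffset and endOffset to the loop above. The sketch below only computes those ranges. PageRanges and split are hypothetical names, not part of Avro, and the file length and page size are made-up values. Because sync() and pastSync() work on block boundaries, pages built this way are sized in bytes rather than records, so the number of records per page can vary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Avro): splits a file into byte ranges, one per page.
class PageRanges {

    // Returns {start, end} pairs, end exclusive, covering fileLength in chunks of pageBytes.
    static List<long[]> split(long fileLength, long pageBytes) {
        if (fileLength < 0 || pageBytes <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and pageBytes > 0");
        }
        List<long[]> pages = new ArrayList<>();
        long start = 0;
        while (start < fileLength) {
            // The last page is clipped to the end of the file.
            long end = start + Math.min(pageBytes, fileLength - start);
            pages.add(new long[] {start, end});
            start = end;
        }
        return pages;
    }

    public static void main(String[] args) {
        // Example sizes are assumptions chosen for illustration only.
        for (long[] page : split(2500L, 1000L)) {
            // For each page: call fileReader.sync(page[0]), then read while
            // fileReader.hasNext() && !fileReader.pastSync(page[1]).
            System.out.println("startOffset=" + page[0] + " endOffset=" + page[1]);
        }
    }
}
```

With ranges like these, each block should be read by exactly one page, since a block belongs to the page in which its preceding sync marker falls. A page can come back empty if a single block is larger than the page size.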
Licensed under: CC-BY-SA with attribution