문제

I am trying read data from a Avro File file stored in HDFS. Now so far I am able to read the entire data by using DataFileReader or DataFileStream. Now I want to implement pagination. Is there any specific way to do it ?

I have already gone through their basic documentations and as per my understanding I think this can be done by using Synchronization Marker. I have tried by :

SeekableInput seekableInput = new AvroFSInput(dataInputStream, 5);    
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
    DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(seekableInput, datumReader);
    fileReader.seek(startOffset);  // set to the start-offset
    while (fileReader.hasNext() && !fileReader.pastSync(endOffset)) {
          GenericRecord gr = fileReader.next();
          System.out.println(gr);
    }

But this code giving me a :

Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
    at com.globalids.test.AvroTest.deserializeWithPageing(AvroTest.java:112)
    at com.globalids.test.AvroTest.main(AvroTest.java:45)
Caused by: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
    ... 2 more 

I have also tried setting sync interval during data writing process. Also tried to call sync() method after each record is inserted to the file using DataFileWriter. Can anyone point me out what I'm doing wrong ?

Thank you in advance.

도움이 되었습니까?

해결책

You need to call sync() instead of seek() if startOffset is not from valid position in file :

SeekableInput seekableInput = new AvroFSInput(dataInputStream, 5);    
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(seekableInput, datumReader);

**fileReader.sync(startOffset);**

while (fileReader.hasNext() && !fileReader.pastSync(endOffset)) {
    GenericRecord gr = fileReader.next();
    System.out.println(gr);
}
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top