Uploading a RAMDirectory to Azure Blob Storage raises EOF exceptions
-
08-07-2021
Question
I'm currently trying to use Azure Blob Storage with Lucene. I created a new Directory implementation, and to avoid too much latency I use a RAMDirectory as a cache (this might not be the best solution, but it seemed easy to do; I'm open to suggestions). Anyway, everything seems to work quite well, except that writing .nrm
files to the cloud always raises an EOFException when I upload them to the blob.
Let me explain quickly how the directory works, because it will help to understand: I've created a new IndexOutput called BlobOutputStream
that pretty much encapsulates a RAMOutputStream;
however, when it is closed, it uploads everything to the Azure blob storage. Here is how that is done:
String fname = name;
output.flush();
long length = output.length();
output.close();
System.out.println("Size of the upload: " + length);
InputStream bStream = directory.openCachedInputAsStream(fname);
System.out.println("Uploading cache version of: " + fname);
blob.upload(bStream, length);
System.out.println("PUT finished for: " + fname);
blob is a CloudBlockBlob and output is a RAMOutputStream. directory.openCachedInputAsStream opens a new InputStream on an IndexInput.
So everything works most of the time, except with .nrm files, which always raise an EOFException while they are being uploaded. I did check them, though: when only one document is in the index, they are 5 bytes long and contain "NRM-1 and the norm for that document".
I don't really understand why Azure tries to read more bytes than exist in the file when I've specified the size of the stream in the upload call.
I'm sorry if I'm not being clear; it's quite challenging to explain. Please tell me if you need more code, and I'll make everything accessible on GitHub or something.
Thanks for your answers
EDIT
So maybe the code of my InputStream will show the problem:
public class StreamInput extends InputStream {
    public IndexInput input;

    public StreamInput(IndexInput openInput) {
        input = openInput;
    }

    @Override
    public int read() throws IOException {
        System.out.println("Attempt to read byte: " + input.getFilePointer());
        int b = input.readByte();
        System.out.println(b);
        return b;
    }
}
And here are the traces I get:
Size of the upload: 5
Uploading cache version of: _0.nrm
Attempt to read byte: 0
78
Attempt to read byte: 1
82
Attempt to read byte: 2
77
Attempt to read byte: 3
-1
Attempt to read byte: 4
114
Attempt to read byte: 5
Attempt to read byte: 1029
java.io.EOFException: read past EOF: RAMInputStream(name=_0.nrm)
at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:100)
at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:73)
at org.lahab.clucene.core.StreamInput.read(StreamInput.java:18)
at java.io.InputStream.read(InputStream.java:151)
at com.microsoft.windowsazure.services.core.storage.utils.Utility.writeToOutputStream(Utility.java:1024)
at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.write(BlobOutputStream.java:560)
at com.microsoft.windowsazure.services.blob.client.CloudBlockBlob.upload(CloudBlockBlob.java:455)
at com.microsoft.windowsazure.services.blob.client.CloudBlockBlob.upload(CloudBlockBlob.java:374)
at org.lahab.clucene.core.BlobOutputStream.close(BlobOutputStream.java:92)
at org.apache.lucene.util.IOUtils.close(IOUtils.java:141)
at org.apache.lucene.index.NormsWriter.flush(NormsWriter.java:172)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:71)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3376)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3485)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3467)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3451)
at org.lahab.clucene.server.IndexerNode.addDocuments(IndexerNode.java:139)
It really seems like the upload just reads too far...
Solution
So the problem was my InputStream, and also the fact that I can't skip reading the docs and just cast a byte ;).
My read function should be:
@Override
public int read() throws IOException {
    System.out.println("file: " + input.getFilePointer() + "/" + input.length());
    if (input.getFilePointer() >= input.length()) {
        return -1; // signal end of stream instead of letting readByte() throw EOFException
    }
    System.out.println("Attempt to read byte: " + input.getFilePointer());
    int b = input.readByte() & 0xff; // mask to 0..255 so a data byte is never mistaken for -1
    System.out.println(b);
    return b;
}
The Javadoc says this about InputStream.read():
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
And the & 0xff is there to mask the sign bit, so a byte like 0xFF comes back as 255 instead of being sign-extended to -1, the end-of-stream marker.
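To make the sign-extension issue concrete, here is a minimal sketch, independent of Lucene and Azure, showing how a Java byte widens to int with its sign extended and how & 0xff restores the unsigned 0-255 range that InputStream.read() expects:

```java
public class SignExtensionDemo {
    public static void main(String[] args) {
        byte b = (byte) 0xFF; // a raw 0xFF byte, as it might appear in a file

        // Widening a byte to int sign-extends: 0xFF becomes -1,
        // which callers of InputStream.read() interpret as end-of-stream.
        int signExtended = b;
        System.out.println(signExtended); // prints -1

        // Masking with 0xff keeps only the low 8 bits: 255, a valid data byte.
        int masked = b & 0xff;
        System.out.println(masked); // prints 255
    }
}
```

This matches the trace above: the byte at offset 3 really was 0xFF, and without the mask it came back as -1.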