Question

I'm currently trying to use Azure Blob Storage with Lucene. So I created a new Directory, and to avoid too much latency I use a RAMDirectory as a cache (this might not be the best solution, but it seemed easy to do; I'm open to suggestions). Anyway, everything seems to work quite well, except that writing .nrm files to the cloud always raises an EOFException when I upload them to the blob.

I'll quickly explain how the directory works, because it will help to understand the problem: I've created a new IndexOutput called BlobOutputStream. It pretty much encapsulates a RAMOutputStream; however, when it is closed, it uploads everything to Azure Blob Storage. Here is how that is done:

String fname = name;
output.flush();
long length = output.length();
output.close();
System.out.println("Size of the upload: " + length);
InputStream bStream = directory.openCachedInputAsStream(fname);
System.out.println("Uploading cache version of: " + fname);
blob.upload(bStream, length);
System.out.println("PUT finished for: " + fname);

blob is a CloudBlockBlob and output is a RAMOutputStream. directory.openCachedInputAsStream opens a new InputStream on an IndexInput.
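For context, the helper roughly does something like this (a minimal sketch with assumed names: cacheDirectory stands for the RAMDirectory cache, and StreamInput is the InputStream wrapper shown in the EDIT below):

// Minimal sketch (assumed names, not the exact implementation):
// open the cached copy from the RAMDirectory and expose it as an InputStream
// so that CloudBlockBlob.upload(InputStream, long) can read from it.
public InputStream openCachedInputAsStream(String name) throws IOException {
    IndexInput cached = cacheDirectory.openInput(name);
    return new StreamInput(cached);
}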

So everything works most of the time, except with .nrm files, which always raise an EOFException while they are being uploaded. I did check them, though: they are 5 bytes long when only one document is in the index, and they contain the "NRM" header, a -1 byte, and the norm for that document.

I don't really understand why Azure tries to upload more than exists in the file when I've specified the size of the stream in the upload call.

I'm sorry if I'm not clear; it's quite challenging to explain. Please tell me if you need more code, and I'll make everything accessible on GitHub or something.

Thanks for your answers

EDIT

So maybe the code of my InputStream will show the problem:

public class StreamInput extends InputStream {
    public IndexInput input;

    public StreamInput(IndexInput openInput) {
        input = openInput;
    }

    @Override
    public int read() throws IOException {
        System.out.println("Attempt to read byte: " + input.getFilePointer());
        int b = input.readByte();
        System.out.println(b);
        return b;
    }
}

And here are the traces I get:


Size of the upload: 5
Uploading cache version of: _0.nrm
Attempt to read byte: 0
78
Attempt to read byte: 1
82
Attempt to read byte: 2
77
Attempt to read byte: 3
-1
Attempt to read byte: 4
114
Attempt to read byte: 5
Attempt to read byte: 1029
java.io.EOFException: read past EOF: RAMInputStream(name=_0.nrm)
    at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:100)
    at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:73)
    at org.lahab.clucene.core.StreamInput.read(StreamInput.java:18)
    at java.io.InputStream.read(InputStream.java:151)
    at com.microsoft.windowsazure.services.core.storage.utils.Utility.writeToOutputStream(Utility.java:1024)
    at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.write(BlobOutputStream.java:560)
    at com.microsoft.windowsazure.services.blob.client.CloudBlockBlob.upload(CloudBlockBlob.java:455)
    at com.microsoft.windowsazure.services.blob.client.CloudBlockBlob.upload(CloudBlockBlob.java:374)
    at org.lahab.clucene.core.BlobOutputStream.close(BlobOutputStream.java:92)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:141)
    at org.apache.lucene.index.NormsWriter.flush(NormsWriter.java:172)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:71)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3376)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3485)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3467)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3451)
    at org.lahab.clucene.server.IndexerNode.addDocuments(IndexerNode.java:139)

It really seems like the upload just goes too far...
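My guess is that the upload keeps reading until the stream reports end of data, rather than stopping at the length I pass in. Purely as an illustration (this is not the Azure SDK's actual code), a typical stream copy looks like the following and relies entirely on read() returning -1:

// Illustrative copy loop (not the SDK's code): it stops only when the source
// returns -1. If the source never signals end of stream, the loop keeps asking
// for bytes and the underlying IndexInput throws "read past EOF".
static void copy(InputStream source, OutputStream target) throws IOException {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = source.read(buffer)) != -1) {
        target.write(buffer, 0, n);
    }
}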

Solution

So the problem was my InputStream, and also the fact that I can't just read a byte and cast it to an int ;). My read function should be:

System.out.println("file:" + input.getFilePointer() + "/" + input.length());
if (input.getFilePointer() >= input.length()) {
    return -1;
}
System.out.println("Attempt to read byte: "+ input.getFilePointer());
int b = (int) input.readByte() & 0xff;
System.out.println(b);
return b;

The Javadoc for InputStream.read() says:

Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.

And then the & 0xff is to mask the sign bit

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow