Вопрос

I'm under the impression that a ByteArrayOutputStream is not memory efficient, since all it's contents are stored in memory.

Similarly, calling toByteArray on a large stream seems like it "scales poorly".

Why, then, in the example in The example in Tom White's book Hadoop: the Definitive Guide use them both:

    ByteArrayOutputStream out = new ByteArrayOutputStream;
    Decoder decoder = DecoderFactory().defaultFactory().createBinaryDecoder(out.toByteArray(), null);

Isn't "Big Data" the norm for Avro? What am I missing?

Edit 1: What I'm trying to do - Say I'm streaming avros over a websocket. What would the example look like if I wanted to deserialize multiple records, not just one that was put in it's own ByteArrayOutoputStream?

Is there a better way to supply BinaryDecoder with a byte[]? Or perhaps a different type of stream? Or I should be sending 1 record per stream instead of loading streams with multiple records?

Это было полезно?

Решение

ByteArrayOutputStream makes sense when dealing with small objects like small to medium images, or fixed-size request/response. It is in memory and doesn't touch the disk so this can be great for performance. It doesn't make any sense to use it for 1 TerraByte of data. Possibly this is a case of trying to keep an example in a book small and self-contained so as not to detract from the main point.


EDIT: Now that I see where your going I'd be looking to setup a pipeline. Pull a message off the stream (so I'm assuming you can get in InputStream from your HTTP object) and either process it with a memory-less method or throw it at a queue and have a thread-pool process the queue with a memory-less method. So the requirements for this are 1) being able to detect the boundary between Avro messages as you pull them off the stream and having the method for decoding.

The way to decode appears to be read the bytes for each message into a byte-array and hand that to your BinaryDecoder (after you find the message boundary).

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top