Question

I would like to stream some files in and out of Cassandra, since we already use it, rather than setting up a full Hadoop distributed filesystem. Are there any asynchronous puts in Astyanax or Hector that take a callback invoked on completion? I want to avoid paying the 1 ms network delay 1000 times as I write 1000 entries. The entries would be split across a few rows and columns so that they stream to a few servers in parallel, and all the responses/callbacks would come back once streaming is done. Does Hector or Astyanax support this?

It looks like Astyanax supports a query callback, so I think I can do a get with the primary keys to stream the file back with Astyanax?

thanks, Dean

Solution

Cassandra doesn't actually support streaming via the Thrift API. Furthermore, packing the file into a single mutation batch that spreads data across multiple rows and columns can be very dangerous. It could blow the heap on Cassandra, or you may run into the 1MB socket write buffer limit, which under certain error cases can cause your Thrift connection to hang indefinitely (although I think this may be fixed in the latest version of Cassandra).

The new chunked object store recipe in Astyanax (https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store) builds on our experience at Netflix with storing large objects in Cassandra and provides a simple API that handles all the chunking and parallelization for you. It could still make thousands of calls to Cassandra (depending on your file size and chunk size), but it also handles the retries for you. The same goes for reading files: the API will read the chunks and reassemble them in order into an OutputStream.
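
To make the idea concrete, here is a rough sketch of the chunk-and-parallelize pattern in plain Java. It is not the Astyanax recipe or its API: the class name ChunkedStoreSketch, the "name$index" key scheme, and the in-memory map standing in for a Cassandra column family are all assumptions made for illustration, and it leaves out the retries the real recipe performs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedStoreSketch {
    // Hypothetical stand-in for a column family: "name$index" -> chunk bytes.
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();
    private final int chunkSize;
    private final int concurrency;

    public ChunkedStoreSketch(int chunkSize, int concurrency) {
        this.chunkSize = chunkSize;
        this.concurrency = concurrency;
    }

    /** Splits the stream into fixed-size chunks, writes them in parallel, returns the chunk count. */
    public int write(String name, InputStream in) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<?>> pending = new ArrayList<>();
            int index = 0;
            while (true) {
                final byte[] chunk = readChunk(in, chunkSize);
                if (chunk.length == 0) break;
                final String key = name + "$" + index++;
                // Each chunk is one small, independent write rather than one huge batch.
                pending.add(pool.submit(() -> store.put(key, chunk)));
            }
            for (Future<?> f : pending) f.get(); // wait for every write to finish
            return index;
        } finally {
            pool.shutdown();
        }
    }

    /** Fetches chunks in parallel but appends them to the output strictly in index order. */
    public void read(String name, int chunkCount, OutputStream out) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (int i = 0; i < chunkCount; i++) {
                final String key = name + "$" + i;
                pending.add(pool.submit(() -> store.get(key)));
            }
            for (Future<byte[]> f : pending) {
                byte[] chunk = f.get();
                if (chunk == null) throw new IOException("missing chunk for " + name);
                out.write(chunk);
            }
        } finally {
            pool.shutdown();
        }
    }

    // Reads up to size bytes; returns a shorter (possibly empty) array at end of stream.
    private static byte[] readChunk(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        int filled = 0;
        while (filled < size) {
            int n = in.read(buf, filled, size - filled);
            if (n < 0) break;
            filled += n;
        }
        return filled == size ? buf : Arrays.copyOf(buf, filled);
    }

    public static void main(String[] args) throws Exception {
        ChunkedStoreSketch sketch = new ChunkedStoreSketch(4096, 8);
        byte[] data = new byte[100_000];
        int chunks = sketch.write("demo", new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        sketch.read("demo", chunks, out);
        System.out.println(chunks + " chunks, round trip ok: " + Arrays.equals(data, out.toByteArray()));
    }
}
```

Note that this sketch queues every chunk before waiting on any of them, so it holds the whole file in memory. A real implementation would cap the number of in-flight chunks and retry failed writes, which is the work the recipe above takes off your hands.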

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow