Question

Does anyone know how I can store large binary values in Riak?

Was it helpful?

Solution

For now, they don't recommend storing files larger than 50MB in size without splitting them. See: FAQ - Riak Wiki

If your files are smaller than 50MB, than proceed as you would with storing non binary data in Riak.

Another reason one might pick Riak is for flexibility in modeling your data. Riak will store any data you tell it to in a content-agnostic way — it does not enforce tables, columns, or referential integrity. This means you can store binary files right alongside more programmer-transparent formats like JSON or XML. Using Riak as a sort of “document database” (semi-structured, mostly de-normalized data) and “attachment storage” will have different needs than the key/value-style scheme — namely, the need for efficient online-queries, conflict resolution, increased internal semantics, and robust expressions of relationships.Schema Design in Riak - Introduction

OTHER TIPS

@Brian Mansell's answer is on the right track - you don't really want to store large binary values (over 50 MB) as a single object, in Riak (the cluster becomes unusably slow, after a while).

You have 2 options, instead:

1) If a binary object is small enough, store it directly. If it's over a certain threshold (50 MB is a decent arbitrary value to start with, but really, run some performance tests to see what the average object size is, for your cluster, after which it starts to crawl) -- break up the file into several chunks, and store the chunks separately. (In fact, most people that I've seen go this route, use chunks of 1 MB in size). This means, of course, that you have to keep track of the "manifest" -- which chunks got stored where, and in what order. And then, to retrieve the file, you would first have to fetch the object tracking the chunks, then fetch the individual file chunks and reassemble them back into the original file. Take a look at a project like https://github.com/podados/python-riakfs to see how they did it.

2) Alternatively, you can just use Riak CS (Riak Cloud Storage), to do all of the above, but the code is written for you. That's exactly how RiakCS works -- it breaks an incoming file into chunks, stores and tracks them individually in plain Riak, and reassembles them when it comes time to fetch it back. And provides an Amazon S3 API for file storage, for your convenience. I highly recommend this route (so as not to reinvent the wheel -- chunking and tracking files is hard enough). Yes, CS is a paid product, but check out the free Developer Trial, if you're curious.

Just like every other value. Why would it be different?

Use either the Erlang interface ( http://hg.basho.com/riak/src/461421125af9/doc/basic-client.txt ) or the "raw" HTTP interface ( http://hg.basho.com/riak/src/tip/doc/raw-http-howto.txt ). It should "just work."

Also, you'll generally find a better response on the riak-users mailing list than you will here. http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com (No offense to z8000, who seems to also have answers.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top