Question

I am using a 3-node Riak cluster with a number of buckets. One bucket, user_account, holds user information such as name and address.

user_account = {"id" => 1, "name" => "abc", "address" => "xyz"}

Another bucket, user_metadata, records when this data was created and last updated. A record in it looks like:

user_metadata = {"id" => 1, "created_at" => "20140304", "updated_at" => "20140304"}

Both buckets use the same key, the id. I want to run a MapReduce job on the user_account bucket for all keys whose updated_at value in user_metadata falls within a given date range. Is there a way to do this? Note that there is currently no index on the updated_at field in the user_metadata bucket.


Solution

Riak has no joins, so the usual way to address this is to denormalise your data model. Because these records have a 1-to-1 relationship, it is worth merging them into a single object. Once they are merged, you can define secondary indexes on the combined object (for example on updated_at) and use a range query on such an index to select the input for your MapReduce job, which both improves performance and avoids running the job over the whole bucket. Merging also makes retrieval from Riak more efficient, since it is usually preferable to fetch a few larger objects rather than many smaller ones.
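A minimal in-memory sketch of the merged model follows, written in Ruby to match the question's hash literals. It does not talk to Riak: the two Hashes stand in for the buckets, the `indexes` field mimics a secondary index entry attached to each object, and `index_range` plays the role of a 2i range query whose result would become the MapReduce input. The index name `updated_at_int`, the helpers `merge_user` and `index_range`, and the second sample record are all hypothetical, chosen for illustration only.

```ruby
# Hypothetical in-memory stand-ins for the two Riak buckets, keyed by id.
user_account = {
  1 => { "id" => 1, "name" => "abc", "address" => "xyz" },
  2 => { "id" => 2, "name" => "def", "address" => "uvw" } # invented sample
}
user_metadata = {
  1 => { "id" => 1, "created_at" => "20140304", "updated_at" => "20140304" },
  2 => { "id" => 2, "created_at" => "20140101", "updated_at" => "20140115" }
}

# A merged object: the denormalised value plus its (simulated) index entries.
MergedUser = Struct.new(:value, :indexes)

# Combine the 1-to-1 records into one value (hypothetical helper).
def merge_user(account, metadata)
  account.merge(metadata)
end

# Build the merged "bucket". The YYYYMMDD date is stored as an integer so
# that numeric order matches chronological order, as an integer index would.
users = user_account.each_with_object({}) do |(id, account), acc|
  merged = merge_user(account, user_metadata.fetch(id, {}))
  acc[id] = MergedUser.new(merged, { "updated_at_int" => merged["updated_at"].to_i })
end

# Simulated range query: sorted keys whose index value lies in the range.
def index_range(users, index, range)
  users.select { |_id, user| range.cover?(user.indexes[index]) }.keys.sort
end

# Select users updated in March 2014, then a toy "map" (extract the name)
# and "reduce" (sort) over just those objects.
keys = index_range(users, "updated_at_int", 20140301..20140331)
names = keys.map { |k| users[k].value["name"] }.sort
puts names.inspect
```

In a real cluster, the selection step would be a 2i range query executed by Riak itself, so only the matching objects are fed to the map phase instead of every key in the bucket.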

Licensed under: CC-BY-SA with attribution