Question

I have the following code in my Rails app:

module UserItem
  class Rating
    include MongoMapper::Document
    key :user_id, Integer, :required => true
    key :item_id, Integer, :required => true
    key :rating, Float, :required => true
  end
end

I have about 10K users and 10K items, and I need to store each user's rating for each item, which comes to about 10^8 records. I have already computed all 10^8 ratings and hold them in an array, as follows:

ratings = [
  {user_id: 1, item_id: 1, rating: 1.5}, 
  {user_id: 1, item_id: 2, rating: 3.5},
  # ... and so on, for all 10^8 records
]

Now I need to insert all 10^8 of these computed records into Mongo. I tried

UserItem::Rating.collection.insert(ratings)

and

UserItem::Rating.create(ratings)

But it takes many hours to insert the 10^8 records into Mongo. Is there a better/more efficient way to insert records into Mongo?

Context: I am using this more like a cache that stores all rating values. When I display a list of items, I just read from this cache and display the rating the user gave alongside each item.

Any help is much appreciated!

Solution

One approach is to store one document per user, with a ratings field that is a hash mapping item ids to ratings, for example:

class UserRating
  include MongoMapper::Document
  key :user_id, Integer, :required => true
  key :ratings, Hash    # item id (as a string) => rating
end

UserRating.create(:user_id => 1, :ratings => {"1" => 4, "2" => 3})
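
Reading a rating back is then a single document fetch plus a hash lookup; a quick sketch using standard MongoMapper query syntax (note the string key for the item id):

# Fetch user 1's document and look up their rating for item 2.
UserRating.first(:user_id => 1).ratings["2"]   # => 3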

You have to use string keys for the hash, since BSON document keys must be strings. This approach doesn't make it easy to retrieve all the ratings for a given item - if you do that a lot, it might be easier to store a document per item instead, as sketched below. It's also probably not very efficient if you only ever need a small proportion of a user's ratings at a time.
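
For comparison, the per-item variant simply flips the mapping. A minimal sketch (ItemRating is an illustrative name, not something from the original model):

class ItemRating
  include MongoMapper::Document
  key :item_id, Integer, :required => true
  key :ratings, Hash    # user id (as a string) => rating
end

# All ratings for item 1 now come back in a single document read:
ItemRating.first(:item_id => 1).ratings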

Obviously you can combine this with other approaches to increasing write throughput, such as batching your inserts or sharding your database.
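
For example, you could build the per-user documents from the precomputed ratings array and insert them through the raw driver in slices rather than one create call per document. A rough sketch, assuming the flat ratings array from the question (the batch size of 1,000 is an arbitrary choice to tune):

# Group the flat ratings into one document per user.
docs = ratings.group_by { |r| r[:user_id] }.map do |user_id, user_ratings|
  # Item ids become string keys, since BSON keys must be strings.
  { :user_id => user_id,
    :ratings => Hash[user_ratings.map { |r| [r[:item_id].to_s, r[:rating]] }] }
end

# Bulk-insert the resulting documents in batches via the raw driver.
docs.each_slice(1_000) { |batch| UserRating.collection.insert(batch) }

With 10K users this produces about 10K documents and a handful of driver calls, instead of one insert per rating.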

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow