Question

Recently I'm exploring NoSQL Databases. I need an advice about how to store data in the most optimal and efficient way for a given problem. I'm targeting MongoDB, now. However it should be the same with CouchDB.

Let's say we have these 3 Models:

Story:
 id
 title

User:
 id
 name

Vote:
  id
  story_id
  user_id

I want to be able to ask the database these questions:

  • Who has voted for this Story?
  • What this User has Voted for?

I'm doing simple joins while working with a relational DB. The question is, how should I store the data for those objects in order to be most efficient.

For example, if I store the Vote objects as a subcollection of Stories it wont be easy to get the info - "What a user has voted for".

Was it helpful?

Solution

I would suggest storing votes as a list of story _ids in each user. That way you can find out what stories a user has voted for just by looking at the list. To get the users who have voted for a story you can do something like:

db.users.find({stories: story_id})

where story_id is the _id of the story in question. If you create an index on the stories field both of those queries will be fast.

OTHER TIPS

  • don't worry if your queries are efficient until it starts to matter
  • according to below quote, you're doing it wrong

The way I have been going about the mind switch is to forget about the database alltogether. In the relational db world you always have to worry about data normalization and your table structure. Ditch it all. Just layout your web page. Lay them all out. Now look at them. Your already 2/3 there. If you forget the notion that database size matters and data shouldn't be duplicated than your 3/4 there and you didnt even have to write any code! Let your views dictate your Models. You don't have to take your objects and make them 2 dimensional anymore as in the relational world. You can store objects with shape now.

how-to-think-in-data-stores-instead-of-databases

Ok, you haven given a normalized data model as you would do in an SQL setup.

In my understanding you don't do this in MongoDB. You could store references, but you do not for performance reasons in the general case.

I'm not an expert in the NoSQL area in no way, but why don't you simply follow your needs and store the user (ids) that have voted for a story in the stories collection and the story (ids) a user has voted for in the users collection?

In CouchDB this is very simple. One view emits:

function(doc) {
 if(doc.type == "vote") {
   emit(doc.story_id, doc.user_id);
 }
}

Another view emits:

function(doc) {
 if(doc.type == "vote") {
   emit(doc.user_id, doc.story_id);
 }
}

Both are queries extremely fast since there is no join. If you do need user data or story data, CouchDB supports multi-document fetch. Also quite fast and is one way to do a "join".

I've been looking into MongoDB and CouchDB a lot lately, but my insight is limited. Still, when thinking about storing the votes inside the story document, you might have to worry about hitting the 4MB document size limit. Even if you don't, you might be constantly increasing the size of the document enough to cause it to get moved and thus slowing down your writes (see how documents are sized in MongoDB).

As for CouchDB, these kinds of things are quite simple, elegant, and quite fast once the view indexes are calculated. Personally, however, I have hesitated to do a similar project in CouchDB because of benchmarks showing it progressively slowing down to a considerable degree as the database grows (and the view indexes grow). I'd love to see some more recent benchmarks showing CouchDB performance as database size increases. I WANT to try MongoDB or CouchDB, but SQL still seems so efficient and logical, so I'll stay with it until the project fits the temptation just right.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top