Question

I'm building a simple accounting system where a user has many bills. Now I'm trying to decide whether bills should be their own collection or nested within the user. I'm leaning towards the former, but I've NEVER done any NoSQL stuff, so I'm just going by trial and error and what makes sense to me.

I understand that Mongo has a 4MB document size limit, which is what's making me think I should have a separate collection for bills, since these will accumulate daily and could eventually take up a large amount of space.

I'm just looking for opinions on the matter. Basically, I'll be querying for a user's bills within various date ranges (as you can imagine an accounting system would do).

Not that it really matters, but I'm using Mongoid in a Rails 3 project. I figured I'd do something like:

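```ruby
class User
  include Mongoid::Document   # required for every Mongoid model
  references_many :bills      # bills live in their own collection
end

class Bill
  include Mongoid::Document
  referenced_in :user         # each bill stores its owner's user_id
end
```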

Any comments or design suggestions are greatly appreciated.

Solution

1) Regarding the 4MB document limit, this is what "MongoDB: The Definitive Guide" says:

Documents larger than 4MB (when converted to BSON) cannot be saved to the database. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance. To see the BSON size (in bytes) of the document doc, run Object.bsonsize(doc) from the shell.

To give you an idea of how much 4MB is, the entire text of War and Peace is just 3.14MB.

In the end, it depends on how large you expect each user's bills to grow. I hope the excerpt above gives you an idea of the limits the document size imposes.
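To turn that into a number, here is a rough back-of-the-envelope sketch in Ruby. The 1,000-byte base user document, the 200-byte average BSON size per embedded bill, and the bills-per-day rate are all hypothetical assumptions; measure real documents with Object.bsonsize before relying on the result.

```ruby
# Rough estimate of how long a user document with embedded bills can grow
# before it hits the per-document size limit.
# ASSUMPTIONS (hypothetical): base document size, average bytes per bill,
# and bills added per day.

MB = 1024 * 1024

def days_until_limit(limit_bytes, base_doc_bytes, bytes_per_bill, bills_per_day)
  room      = limit_bytes - base_doc_bytes   # space left for embedded bills
  max_bills = room / bytes_per_bill          # integer division: whole bills only
  [max_bills, max_bills / bills_per_day]     # [bills that fit, days to fill]
end

bills, days = days_until_limit(4 * MB, 1_000, 200, 1)
puts "#{bills} bills, about #{days / 365} years at one bill per day"
# prints "20966 bills, about 57 years at one bill per day"
```

Raising bills_per_day to 10 cuts that to 2,096 days (under six years), and a 16MB limit would roughly quadruple either figure, so the answer really does hinge on your expected volume.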

2) A denormalized schema (bills embedded in the user document) is the way to go if you know you will never run global queries on bills (for example, retrieving the ten most recent bills entered into the system). With a denormalized schema, you will have to use map-reduce to answer such queries.
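To make the trade-off concrete, here is a plain-Ruby sketch in which an array of hashes stands in for a users collection with embedded bills. It is not Mongoid or driver code, and the field names (bills, amount, date) are hypothetical. A per-user date-range lookup only needs the one user document, while a global "most recent bills" query has to visit every user document and flatten its bills, which is roughly the work a map-reduce job would do.

```ruby
require 'date'

# In-memory stand-in for a "users" collection with embedded bills
# (denormalized schema). Field names are hypothetical.
users = [
  { "_id" => 1, "name" => "alice",
    "bills" => [
      { "amount" => 120, "date" => Date.new(2011, 1, 5) },
      { "amount" => 80,  "date" => Date.new(2011, 2, 9) }
    ] },
  { "_id" => 2, "name" => "bob",
    "bills" => [
      { "amount" => 45, "date" => Date.new(2011, 1, 20) }
    ] }
]

# Per-user date-range query: fetch one document, then filter its bills.
def bills_between(user, from, to)
  user["bills"].select { |b| b["date"] >= from && b["date"] <= to }
end

# Global query: every user document must be visited and its bills
# flattened before they can be sorted across the whole system.
def most_recent_bills(users, n)
  users.flat_map { |u| u["bills"].map { |b| b.merge("user_id" => u["_id"]) } }
       .sort_by { |b| b["date"] }
       .reverse
       .first(n)
end

january = bills_between(users[0], Date.new(2011, 1, 1), Date.new(2011, 1, 31))
p january.map { |b| b["amount"] }                        # [120]
p most_recent_bills(users, 2).map { |b| b["amount"] }    # [80, 45]
```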

A normalized schema (users and bills in separate documents) is a better choice if you want flexibility in how bills are queried. However, since MongoDB doesn't support joins, you will have to run multiple queries every time you want to retrieve the bills belonging to a user.
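The normalized version of the same data can be sketched the same way, again with arrays of hashes standing in for two separate collections (not Mongoid or driver code; the user_id, amount and date fields are hypothetical). The application runs one lookup for the user and a second for that user's bills, passing the user's _id along itself because there is no join, while a global query becomes a single pass over the bills collection.

```ruby
require 'date'

# In-memory stand-ins for two separate collections (normalized schema).
# Field names such as "user_id" and "date" are hypothetical.
USERS = [
  { "_id" => 1, "name" => "alice" },
  { "_id" => 2, "name" => "bob" }
]

BILLS = [
  { "_id" => 10, "user_id" => 1, "amount" => 120, "date" => Date.new(2011, 1, 5) },
  { "_id" => 11, "user_id" => 1, "amount" => 80,  "date" => Date.new(2011, 2, 9) },
  { "_id" => 12, "user_id" => 2, "amount" => 45,  "date" => Date.new(2011, 1, 20) }
]

# Query 1: fetch the user document.
def find_user(name)
  USERS.find { |u| u["name"] == name }
end

# Query 2: fetch that user's bills in a date range. With no joins, the
# application passes in the _id obtained from query 1.
def bills_for(user_id, from, to)
  BILLS.select { |b| b["user_id"] == user_id && b["date"] >= from && b["date"] <= to }
end

# A global query is a single pass over one collection.
def most_recent_bills(n)
  BILLS.sort_by { |b| b["date"] }.reverse.first(n)
end

alice   = find_user("alice")
january = bills_for(alice["_id"], Date.new(2011, 1, 1), Date.new(2011, 1, 31))
p january.map { |b| b["amount"] }               # [120]
p most_recent_bills(2).map { |b| b["_id"] }     # [11, 12]
```

In a real deployment you would likely want a compound index on the user reference and the date so that the second query does not scan the whole bills collection.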

Given the use case you mentioned, I'd go with the denormalized schema.

3) Updates to a single document in MongoDB are atomic and serialized. That should answer Steve's concern.

You may find these slides helpful. http://www.slideshare.net/kbanker/mongodb-meetup

You may also look at MongoDB's Production Deployments page; the SF.net slides there may be helpful.

OTHER TIPS

One question to consider: will you ever need to reference bills individually, apart from their membership in a user? If so, it will be simpler if they have an independent existence.

Apart from that, the size limit issue you've already identified is a good reason to split them off.

There might be a transactional issue as well. If you're writing a large user document with many embedded bills, what happens when changes to the same user arrive nearly simultaneously from different connections? I don't know enough about Mongo to say how it would resolve this. My guess is that if the writes added different bills you'd get both, but if they changed different existing bills you'd get overwrites. Hopefully someone else will comment on this, but at the very least I'd test it. If you write the bills to a separate collection, this isn't a concern.
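The overwrite scenario can be illustrated with a plain-Ruby simulation in which a Hash stands in for the server; nothing here is Mongo driver or Mongoid API, and the method names are hypothetical. It contrasts two clients that each read the whole user document, modify a local copy, and write the full document back, with clients that send an append which the server applies to the current document one at a time (the idea behind MongoDB's atomic $push update operator on a single document; the Mutex here only mimics that serialization).

```ruby
# Simulates two clients adding a bill to the same user document.
# The "server" is a plain Hash; this is an illustration, not driver code.

def fresh_user
  { "_id" => 1, "bills" => [] }
end

# Pattern A: read the whole document, modify it client-side, and write the
# whole document back. Interleaved writers overwrite each other.
def whole_document_writes
  db = { 1 => fresh_user }
  copy_a = Marshal.load(Marshal.dump(db[1]))   # client A reads (deep copy)
  copy_b = Marshal.load(Marshal.dump(db[1]))   # client B reads before A saves
  copy_a["bills"] << { "amount" => 120 }
  copy_b["bills"] << { "amount" => 45 }
  db[1] = copy_a                               # A saves the full document
  db[1] = copy_b                               # B's save silently drops A's bill
  db[1]["bills"]
end

# Pattern B: each client sends only "append this bill"; the server applies
# the operations one at a time to the current document, so both survive.
def atomic_appends
  db   = { 1 => fresh_user }
  lock = Mutex.new
  push = ->(id, bill) { lock.synchronize { db[id]["bills"] << bill } }
  threads = [{ "amount" => 120 }, { "amount" => 45 }].map do |bill|
    Thread.new { push.call(1, bill) }
  end
  threads.each(&:join)
  db[1]["bills"]
end

p whole_document_writes.size   # 1
p atomic_appends.size          # 2
```

The design point is that the risk comes from shipping whole documents built from stale reads, not from embedding itself; sending targeted update operations avoids it.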

It's been a long time since this question was addressed, but I was dealing with something similar and figured I would add my findings for anyone else researching this issue.

My understanding is that the 4MB document limit was raised to 16MB in versions 1.8+. This comes from a video presentation by Banker, who is one of the MongoDB team members. I have NOT verified this value, but am taking his word for it (since he presumably knows what he's talking about).

As for what happens when multiple updates hit the same user with embedded bills: again from the same presentation, the answer is that MongoDB applies updates so quickly that it's usually not an issue. The MongoDB instance is locked while an update takes place, so concurrent updates shouldn't be a problem.

A concern I had about embedded documents is that they cannot be treated independently of their parent document. In my opinion, this makes embedded documents rather worthless; they are only useful in specific niche cases.

I personally found that MongoDB (and NoSQL DBs in general) are useful for particular cases, but that traditional SQL RDBMSs are still better for the majority of problems. If you're someone like Craigslist and a schema alteration takes two months to run on your archived data, then yes, MongoDB and NoSQL make sense. But for the vast majority of apps, I don't think handling that amount of data will be a major concern.

Licensed under: CC-BY-SA with attribution