Question

We are working on a website which contains native literature. The whole website is designed to be writer-centric. Each writer has 8,000 to 10,000 articles/poems/books.

The client requires MongoDB to be used as the backend for this application. As a newbie, I am confused about data modelling in MongoDB.

My question is: which is the better approach for my use case, an embedded data model or a normalised data model?

Writer: {
    _id: ObjectID,
    WriterName: String,
    Email: String,
    Article: [
        {
            _id: ObjectID,
            ArticleName: String,
            CreatedDate: Date,
            comments: [
                { body: String }
            ]
        }
    ]
}

Or

Writer: {
    _id: ObjectID,
    WriterName: String,
    Email: String
}

Articles: {
    _id: ObjectID,
    Writer_id: ObjectID,
    ArticleName: String,
    CreatedDate: Date,
    comments: [
        { body: String }
    ]
}

We have another use case where we need to retrieve the top 20 articles across all writers. Keeping this in mind, what is the best solution? Also, please let me know what the impact will be if a document's size exceeds 16 MB.


Solution

That depends on how much of your data is fixed and how often it is updated.

If you constantly update the Article array (as in blogging systems), documents will keep growing, will eventually no longer fit in their originally allocated disk space, and will be moved on disk by MongoDB. This causes storage size to increase massively, leads to fragmentation, and hurts performance (extra I/O, plus indexes that have to be updated with new pointers to the documents on the file system). Documents like this also tend to grow beyond 16 MB.

If it's a book catalog, for example, where the data seldom changes, embedding can be considered, as it gives a more convenient and simpler data model.

You also have a third option: duplicating the writer data (Name, Email) inside the Articles collection, leaving it to your application code to update all of that writer's article documents whenever the writer's email changes, if you care about that. A sketch of what that update looks like follows below.
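For illustration, a rough sketch of that third option in the shell. The collection and field names follow the schemas above, WriterEmail is a hypothetical duplicated field, and writerId / newEmail stand in for values your application would supply:

    // Hypothetical sketch: writer data duplicated into each article document.
    // When the writer's email changes, the application must touch every one
    // of that writer's articles:
    db.Articles.updateMany(
        { Writer_id: writerId },                  // all articles by this writer
        { $set: { WriterEmail: newEmail } }
    )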

So, if a writer has 8,000 to 10,000 articles/poems/books (I expect those numbers to vary, and you shouldn't count on that assumption), the embedding option means an unpredictable average document size and an ever-increasing padding factor. I would advise against embedding in that case.

As for your second concern, normalization in this case gives you slightly more concise query patterns: for example, you don't have to slice an array in order to fetch the 20 topmost articles.
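With the normalised Articles collection, that query could look like the sketch below. I'm assuming "top" means most recent, i.e. ranked by CreatedDate; substitute your own ranking field if you have one:

    // Index to support the sort (assuming ranking by CreatedDate):
    db.Articles.createIndex({ CreatedDate: -1 })

    // Top 20 articles across all writers:
    db.Articles.find().sort({ CreatedDate: -1 }).limit(20)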

OTHER TIPS

I think you should look more closely at the usage scenario. Usually (as it seems to me), if I'm looking at author info, I expect to see a list of books, an author bio, etc. However, I don't think it's necessary to store comments in the same document (and it would be a good idea to keep them separate if there will be a lot of them), because they are not needed immediately. So the first version of the data model looks fine to me, except for the comments; I would prefer to keep them in a separate collection.
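A possible shape for that separate collection, sketched in the same notation as above; the Article_id and CreatedDate fields are my assumption of what you would need to tie comments back to articles and order them:

    Comments: {
        _id: ObjectID,
        Article_id: ObjectID,
        body: String,
        CreatedDate: Date
    }

    // Fetch comments for one article only when they are actually needed
    // (someArticleId is a placeholder):
    db.Comments.find({ Article_id: someArticleId }).sort({ CreatedDate: 1 })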

About the maximum document size: 16 MB is a lot of data. This limitation exists to ensure that a single document doesn't take up too much RAM or network bandwidth (if your MongoDB runs on a separate server). Also, I think that if your document size exceeds 16 MB, there's something wrong with your data model.
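If you want to keep an eye on how close a document is getting to that limit, the shell's Object.bsonsize() helper reports the BSON size of a fetched document; a quick sketch (the query filter is just an example):

    // Size in bytes of one writer document (the 16 MB limit is 16777216 bytes):
    Object.bsonsize(db.Writer.findOne({ WriterName: "Some Writer" }))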

I don't know exactly what will happen in the current version of MongoDB if your document exceeds 16 MB, because I have never faced such a situation, but as far as I know the write that would grow the document past the limit is rejected with an error rather than the data being silently trimmed.

Licensed under: CC-BY-SA with attribution