Question

We are working on a website which contains native literature. The whole website is designed to be writer-centric. Each writer has 8,000 to 10,000 articles/poems/books.

The client requires MongoDB to be used as the backend for this application. As a newbie, I am confused about data modelling in MongoDB.

My question is: which is the better approach for my use case, an embedded data model or a normalised data model?

Writer: {
    _id: ObjectID,
    WriterName: String,
    Email: String,
    Article: [
        {
            _id: ObjectID,
            ArticleName: String,
            CreatedDate: Date,
            comments: [
                { body: String }
            ]
        }
    ]
}

Or

Writer: {
    _id: ObjectID,
    WriterName: String,
    Email: String
}

Articles: {
    _id: ObjectID,
    Writer_id: ObjectID,
    ArticleName: String,
    CreatedDate: Date,
    comments: [
        { body: String }
    ]
}

We have another use case where we need to retrieve the top 20 articles across all writers. Keeping this in mind, what is the best solution? Also, please let me know what the impact will be if a document's size exceeds 16 MB.


Solution

That depends on how much of your data is fixed and how often it is updated.

If you constantly update the Article array (as in blogging systems), documents will keep growing, will eventually no longer fit in their originally allocated disk space, and will be moved on disk by MongoDB. This causes storage size to increase massively, leads to fragmentation, and hurts performance (extra I/O, plus indexes that have to be updated with new pointers to the documents on the file system). Documents like this also tend to grow beyond 16 MB.

If it's a book catalog, for example, where the data seldom changes, embedding can be considered, as it gives a more convenient and simpler data model.

You also have a third option: duplicating the writer data (Name, Email) inside the Articles collection, leaving it to your application code to update all of that writer's article documents whenever the writer's email changes, if you care about that. A sketch of what that update looks like follows below.
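For illustration, a rough sketch of that third option in the shell. The collection and field names follow the schemas above, WriterEmail is a hypothetical duplicated field, and writerId / newEmail stand in for values your application would supply:

    // Hypothetical sketch: writer data duplicated into each article document.
    // When the writer's email changes, the application must touch every one
    // of that writer's articles:
    db.Articles.updateMany(
        { Writer_id: writerId },                  // all articles by this writer
        { $set: { WriterEmail: newEmail } }
    )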

So, if a writer has 8,000 to 10,000 articles/poems/books (I expect those numbers to vary, and you shouldn't count on that assumption), the embedding option means an unpredictable average document size and an ever-increasing padding factor. I would advise against embedding in that case.

As for your second concern, normalization in this case gives you slightly more concise query patterns: for example, you don't have to slice an array in order to fetch the 20 topmost articles.
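With the normalised Articles collection, that query could look like the sketch below. I'm assuming "top" means most recent, i.e. ranked by CreatedDate; substitute your own ranking field if you have one:

    // Index to support the sort (assuming ranking by CreatedDate):
    db.Articles.createIndex({ CreatedDate: -1 })

    // Top 20 articles across all writers:
    db.Articles.find().sort({ CreatedDate: -1 }).limit(20)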

OTHER TIPS

I think you should look more closely at the usage scenario. Usually (as it seems to me), if I'm looking at author info, I expect to see a list of books, an author bio, etc. However, I don't think it's necessary to store comments in the same document (and it would be a good idea to keep them separate if there will be a lot of them), because they are not needed immediately. So the first version of the data model looks fine to me, except for the comments; I would prefer to keep them in a separate collection.
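A possible shape for that separate collection, sketched in the same notation as above; the Article_id and CreatedDate fields are my assumption of what you would need to tie comments back to articles and order them:

    Comments: {
        _id: ObjectID,
        Article_id: ObjectID,
        body: String,
        CreatedDate: Date
    }

    // Fetch comments for one article only when they are actually needed
    // (someArticleId is a placeholder):
    db.Comments.find({ Article_id: someArticleId }).sort({ CreatedDate: 1 })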

About the maximum document size: 16 MB is a lot of data. This limitation exists to ensure that a single document doesn't take up too much RAM or network bandwidth (if your MongoDB runs on a separate server). Also, I think that if your document size exceeds 16 MB, there's something wrong with your data model.
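If you want to keep an eye on how close a document is getting to that limit, the shell's Object.bsonsize() helper reports the BSON size of a fetched document; a quick sketch (the query filter is just an example):

    // Size in bytes of one writer document (the 16 MB limit is 16777216 bytes):
    Object.bsonsize(db.Writer.findOne({ WriterName: "Some Writer" }))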

I don't know exactly what will happen in the current version of MongoDB if your document exceeds 16 MB, because I have never faced such a situation, but as far as I know the write that would grow the document past the limit is rejected with an error rather than the data being silently trimmed.

Licensed under: CC-BY-SA with attribution