Understanding MongoDB BSON Document size limit

https://stackoverflow.com/questions/4667597

mongodb
bson

10-10-2019
|

Question

From MongoDB The Definitive Guide:

Documents larger than 4MB (when converted to BSON) cannot be saved to the database. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance.

I don't understand this limit, does this mean that A Document containing a Blog post with a lot of comments which just so happens to be larger than 4MB cannot be stored as a single document?

Also does this count the nested documents too?

What if I wanted a document which audits the changes to a value. (It will eventually may grow, exceeding 4MB limit.)

Hope someone explains this correctly.

I have just started reading about MongoDB (first nosql database I'm learning about).

Thank you.

Solution

First off, this actually is being raised in the next version to 8MB or 16MB ... but I think to put this into perspective, Eliot from 10gen (who developed MongoDB) puts it best:

EDIT: The size has been officially 'raised' to 16MB

So, on your blog example, 4MB is actually a whole lot.. For example, the full uncompresses text of "War of the Worlds" is only 364k (html): http://www.gutenberg.org/etext/36

If your blog post is that long with that many comments, I for one am not going to read it :)

For trackbacks, if you dedicated 1MB to them, you could easily have more than 10k (probably closer to 20k)

So except for truly bizarre situations, it'll work great. And in the exception case or spam, I really don't think you'd want a 20mb object anyway. I think capping trackbacks as 15k or so makes a lot of sense no matter what for performance. Or at least special casing if it ever happens.

-Eliot

I think you'd be pretty hard pressed to reach the limit ... and over time, if you upgrade ... you'll have to worry less and less.

The main point of the limit is so you don't use up all the RAM on your server (as you need to load all MBs of the document into RAM when you query it.)

So the limit is some % of normal usable RAM on a common system ... which will keep growing year on year.

Note on Storing Files in MongoDB

If you need to store documents (or files) larger than 16MB you can use the GridFS API which will automatically break up the data into segments and stream them back to you (thus avoiding the issue with size limits/RAM.)

Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document.

GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.

You can use this method to store images, files, videos, etc in the database much as you might in a SQL database. I have used this to even store multi gigabyte video files.

OTHER TIPS

Many in the community would prefer no limit with warnings about performance, see this comment for a well reasoned argument: https://jira.mongodb.org/browse/SERVER-431?focusedCommentId=22283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22283

My take, the lead developers are stubborn about this issue because they decided it was an important "feature" early on. They're not going to change it anytime soon because their feelings are hurt that anyone questioned it. Another example of personality and politics detracting from a product in open source communities but this is not really a crippling issue.

To post a clarification answer here for those who get directed here by Google.

The document size includes everything in the document including the subdocuments, nested objects etc.

So a document of:

{
    _id:{},
    na: [1,2,3],
    naa: [
        {w:1,v:2,b:[1,2,3]},
        {w:5,b:2,h:[{d:5,g:7},{}]}
    ]
}

Has a maximum size of 16meg.

Sbudocuments and nested objects are all counted towards the size of the document.

I have not yet seen a problem with the limit that did not involve large files stored within the document itself. There are already a variety of databases which are very efficient at storing/retrieving large files; they are called operating systems. The database exists as a layer over the operating system. If you are using a NoSQL solution for performance reasons, why would you want to add additional processing overhead to the access of your data by putting the DB layer between your application and your data?

JSON is a text format. So, if you are accessing your data through JSON, this is especially true if you have binary files because they have to be encoded in uuencode, hexadecimal, or Base 64. The conversion path might look like

binary file <> JSON (encoded) <> BSON (encoded)

It would be more efficient to put the path (URL) to the data file in your document and keep the data itself in binary.

If you really want to keep these files of unknown length in your DB, then you would probably be better off putting these in GridFS and not risking killing your concurrency when the large files are accessed.

Nested Depth for BSON Documents: MongoDB supports no more than 100 levels of nesting for BSON documents.

More more info vist

Perhaps storing a blog post -> comments relation in a non-relational database is not really the best design.

You should probably store comments in a separate collection to blog posts anyway.

[edit]

See comments below for further discussion.

According to https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

If you expect that a blog post may exceed the 16Mb document limit, you should extract the comments into a separate collection and reference the blog post from the comment and do an application-level join.

// posts
[
  {
    _id: ObjectID('AAAA'),
    text: 'a post',
    ...
  }
]

// comments
[
  {
    text: 'a comment'
    post: ObjectID('AAAA')
  },
  {
    text: 'another comment'
    post: ObjectID('AAAA')
  }
]

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow