Question

What're the best practices to store nested sets (like trees of comments) in MongoDB?

I mean, every comment can have a parent comment and children-comments (answers).

Storing them like this:

{
   title: "Hello",
   body: "Please comment me!",
   comments: [
        {
            author: "Peter",
            text: "Hi there",
            answers: [
                  {
                      author: "Peter",
                      text: "Hi there",
                      answers: [
                                 { author: "Ivan", text: "Hi there" },
                                 { author: "Nicholas", text: "Hi there" }
                      ]
                  },
                  { author: "Ivan", text: "Hi there" },
                  { author: "Nicholas", text: "Hi there" },
            ]
        },
        { author: "Ivan", text: "Hi there" },
        { author: "Nicholas", text: "Hi there" },
   ]
}

is not cool, because we can't, for example, ask for "all post which are commented by Peter" without map/reduce.

Was it helpful?

Solution

I think there is no perfect solution - depends on what operations are more important for your app. I believe Silicon Alley Insider stores comments nested with MongoDB for example. That does make the query you mention harder.

One option is store at top-level in the post a list of all commenters in an array. Think of that as denormalized data. Then one can easily find all posts which involve a certain commenter. Then to drill down, you use map/reduce or db.eval() to get the nested post info within.

One other note - if you are dealing with a single document, db.eval() is probably lighter-weight than map/reduce. $where is also an option but can be slow so I like the additional 'list of commenters' mentioned above - not it is also easy to index that array too (see 'Multikey' in the docs).

See also: http://groups.google.com/group/mongodb-user/browse_thread/thread/df8250573c91f75a/e880d9c57e343b52?lnk=gst&q=trees#e880d9c57e343b52

OTHER TIPS

In the link from dm's post Dwight Merriman mentions using a path key and doing regex matches

{
  path : "a.b.c.d.e.f"
}

Another way to do this would be with arrays

{
  path : ["a", "b", "c", "d", "e", "f"]
}

db.test.ensureIndex({path: 1})

that should make it pretty fast.

if each node can only be in a single path then you wouldn't need to do worry about where it is located in the list

db.test.find({path: "a"})

would find all children of "a"

Instead of path names I would probably use the _id of the nodes.

Update

  • one thing to be careful of is that an index can only have one array in it.
  • Be careful to use explain on your queries

    db.test.find({path: {$in: ["a", "b"]})

gives you

 db.test.find({path: {$in: ["a", "b"]}}).explain()
{
        "cursor" : "BtreeCursor path_1 multi",
        "nscanned" : 2,
        "nscannedObjects" : 2,
        "n" : 1,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : true,
        "indexOnly" : false,
        "indexBounds" : {
                "path" : [
                        [
                                "a",
                                "a"
                        ],
                        [
                                "b",
                                "b"
                        ]
                ]
        }
}

but

 db.test.find({path: {$all: ["a", "b"]}}).explain()
{
        "cursor" : "BtreeCursor path_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : true,
        "indexOnly" : false,
        "indexBounds" : {
                "path" : [
                        [
                                "a",
                                "a"
                        ]
                ]
        }
}

only uses the first element and then scans all matching results for b.
If a is your root element or is in most of your records then your doing a nearly full scan of the records instead of an efficient index query.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top