Question

Background

I am storing table rows as MongoDb documents, with each column having a name. Let's say table has these columns of interest: Identifier, Person, Date, Count. The MongoDb document also has some extra fields separate from the table data, represented by timestamp. Columns are not fixed (which is why I use schema-free database to store them in the first place).

There will be need to do various complex, but so far unspecified queries. I am not very concerned about performance, though query performance may conceivably become a bottleneck. Once inserted, documents will not be modifed (a new document with same Identifier will be created instead), and insertions are not very frequent (let's say, 1000 new MongoDb documents per day). So amount of data will steadily grow over time.

Example

The straight-forward approach is having a collection of MongoDb documents like:

{
   _id: XXXX,
   insertDate: ISODate("2012-10-15T21:26:17Z"),
   flag: true,
   data: {
       Identifier: "AB002",
       Person: "John002",
       Date: ISODate("2013-11-16T21:26:17Z"),
       Count: 1
   }
}

Now I have seen an alternative approach (for example in accepted answer of this question), using array with two fields per object:

{
   _id: XXXX,
   insertDate: ISODate("2012-10-15T21:26:17Z"),
   flag: true,
   data: [
       { field: "Identifier", value: "AB002" },
       { field: "Person", value: "John001" },
       { field: "Date", value: ISODate("2013-11-16T21:26:17Z") },
       { field: "Count", value: 1 }
   ]
}

Questions

Does the 2nd approach make any sense at all?

If yes, then how to choose which to use? Especially, are there some specific kinds of queries which are easy/cheap with one approach, hard/costly with another? Any "rules of thumb" on which way to go, or pro-con lists for both? Example real-life cases of one aproach being inconvenient would be especially valuable.

Was it helpful?

Solution

In your specific example the First version is a lot more appropriate and simple. You have to think in terms of how you would query your document.

It is a lot simpler to query your database like this: db.collection.find({"data.Identifier": "AB002"})

Although I'm not 100% sure why you even need the inner document. Why can't structure your document like:

{
   _id:  "AB002",
   insertDate: ISODate("2012-10-15T21:26:17Z"),
   flag: true,
   Person: "John002",
   Date: ISODate("2013-11-16T21:26:17Z"),
   Count: 1
}

Pros of first example:

  • Simple to query
  • Enforces unique keys, but your data won't have two columns with the same name anyway
  • I would assume mongoDB would generate better query plans because the structure is a lot more simple (haven't tested)

Pros of second example:

  • Allows multiple entries with the same key/field, but I don't feel that is useful in your case
  • A single index on the array can be used for all of its entries regardless of their field name

OTHER TIPS

I don't think that the situation in the other example here and yours are the same. In the other example, they're creating a list of items with one of two answers, which would be more appropriately in an array, and the goal is to return a list of subdocuments that match the criteria. In your example, you're really just describing an object since they all hold different types of information, and you won't need to retrieve searchable bits of the subdocuments.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top