Question

We have a MongoDB collection that can be searched and sorted on many fields. For example (sorry, I can't share the real collection for confidentiality reasons), let's take:

MathProblem
{
    Creator : String,
    Difficulty : integer (from 0 to 4),
    Categorie : integer (from 0 to 40),
    NbOfQuestion : integer (less than 20),
    Likes : integer,
    Dislikes : integer,
    Succeeded : integer,
    Failures : integer
}

We can search on Creator, Difficulty, Categorie and NbOfQuestion, and sort by Likes, Dislikes, Succeeded and Failures.

Ex:

  • Give me the problems with difficulty 3 and categorie 20, sorted by number of likes.
  • Give me the problems with 5 questions, sorted by failures.
  • Give me the problems with difficulty 1, categorie 10 and 2 questions, created by Einstein.
  • Give me all problems sorted by Succeeded.

Etc. You get the picture: every combination of search fields is possible, and we optionally sort on one field.
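
As a minimal sketch (Python, assuming the field names from the example schema above and that "sorted by" means largest first), each of these requests reduces to a filter document over any subset of the search fields plus an optional single-field sort. `build_query` is a hypothetical helper; it only builds plain dicts and lists in the shape that pymongo's `find()` and `Cursor.sort()` accept, and does not talk to a database.

```python
from typing import Optional

# Field names from the example MathProblem schema in the question.
FILTER_FIELDS = ("Creator", "Difficulty", "Categorie", "NbOfQuestion")
SORT_FIELDS = ("Likes", "Dislikes", "Succeeded", "Failures")


def build_query(criteria: dict, sort_field: Optional[str] = None,
                descending: bool = True) -> tuple:
    """Hypothetical helper: turn optional criteria into (filter, sort_spec)."""
    unknown = set(criteria) - set(FILTER_FIELDS)
    if unknown:
        raise ValueError(f"not searchable: {sorted(unknown)}")
    # Keep only the criteria the caller supplied; any subset is allowed.
    mongo_filter = {f: criteria[f] for f in FILTER_FIELDS if f in criteria}

    sort_spec = []
    if sort_field is not None:
        if sort_field not in SORT_FIELDS:
            raise ValueError(f"not sortable: {sort_field}")
        # -1 = descending (e.g. most liked first), 1 = ascending.
        sort_spec.append((sort_field, -1 if descending else 1))
    return mongo_filter, sort_spec


# The four example requests from the question:
print(build_query({"Difficulty": 3, "Categorie": 20}, "Likes"))
print(build_query({"NbOfQuestion": 5}, "Failures"))
print(build_query({"Difficulty": 1, "Categorie": 10, "NbOfQuestion": 2,
                   "Creator": "Einstein"}))
print(build_query({}, "Succeeded"))
```

With pymongo this would be used as `cursor = collection.find(mongo_filter)`, calling `cursor.sort(sort_spec)` only when `sort_spec` is non-empty.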

The problem is that we have millions of records, and the indexes cost us at least 30 GB. Also, because we have so many indexes, write speed on this collection is crushed, and while those slow writes hold the lock, reads are blocked. We have a lot of reads and possibly slightly fewer writes, but still a lot.
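
To see why the number of indexes explodes, here is a small sketch (Python, standard library only) that enumerates every query shape the question allows: each subset of the four search fields, combined with either no sort or one of the four sort fields.

```python
from itertools import combinations

FILTER_FIELDS = ("Creator", "Difficulty", "Categorie", "NbOfQuestion")
SORT_FIELDS = ("Likes", "Dislikes", "Succeeded", "Failures")


def query_shapes() -> list:
    """Every (set of filtered fields, optional sort field) the application allows."""
    shapes = []
    for size in range(len(FILTER_FIELDS) + 1):
        for subset in combinations(FILTER_FIELDS, size):
            for sort_field in (None,) + SORT_FIELDS:
                shapes.append((frozenset(subset), sort_field))
    return shapes


shapes = query_shapes()
print(len(shapes))  # 80
# Shapes that would benefit from an index (everything except "no filter, no sort"):
print(sum(1 for filt, sort in shapes if filt or sort))  # 79
```

That is 16 filter subsets × 5 sort choices = 80 shapes. A naive one-index-per-shape design would need 79 indexes, which is more than MongoDB's limit of 64 indexes per collection. Because a compound index can also serve queries on its prefix fields, the real number needed is lower, but it still grows quickly with every searchable or sortable field, and every extra index adds work to each write.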

I searched for a "search engine" solution, but I could only find material on full-text search, which is not my case.

We also tried to merge Difficulty, Categorie and NbOfQuestion into one array (multiplying the values by a factor of 10 to keep them apart), so that we only index this array and save some space.
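
The array trick can be sketched as follows (Python, standard library only). The encoding is an assumption: the question only says the values are scaled by a factor of 10, so this sketch instead adds a hypothetical power-of-ten offset per field so the three value ranges cannot collide, and stores the result in a hypothetical `Tags` array field. A query on any subset of the three fields then becomes an `$all` match against that one array, which a single multikey index can back.

```python
# Hypothetical power-of-ten offsets that keep the three value ranges apart:
# Difficulty 0-4 -> 0..4, Categorie 0-40 -> 100..140, NbOfQuestion 0-19 -> 1000..1019.
OFFSETS = {"Difficulty": 0, "Categorie": 100, "NbOfQuestion": 1000}
VALID = {"Difficulty": range(5), "Categorie": range(41), "NbOfQuestion": range(20)}


def encode(field: str, value: int) -> int:
    """Map one (field, value) pair to a number that no other field can produce."""
    if value not in VALID[field]:
        raise ValueError(f"{field}={value} is out of range")
    return OFFSETS[field] + value


def tags_for(problem: dict) -> list:
    """Combined array to store on each document (in a hypothetical 'Tags' field)."""
    return [encode(field, problem[field]) for field in OFFSETS]


def tags_filter(criteria: dict) -> dict:
    """Match any subset of the three fields with a single $all on the array."""
    wanted = [encode(field, value) for field, value in criteria.items()]
    return {"Tags": {"$all": wanted}} if wanted else {}


print(tags_for({"Difficulty": 3, "Categorie": 20, "NbOfQuestion": 5}))  # [3, 120, 1005]
print(tags_filter({"Difficulty": 3, "Categorie": 20}))  # {'Tags': {'$all': [3, 120]}}
```

The trade-off is selectivity: in some MongoDB versions the index is only used to match one element of the `$all` list, and the remaining elements are checked by scanning the candidate documents.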

Any insight will be greatly appreciated!

Thanks,

Charles


Solution

Charles,

There is no simple solution to your issue. Probably the best option for now would be to scale out using MongoDB's sharding functionality: http://www.mongodb.org/display/DOCS/Sharding+Introduction

The aim here would be to split your working set over a number of machines to reduce the amount of data that is "crushed" by any one write. Additionally, I suggest upgrading to at least MongoDB v2.0.7, as the 2.x branch has a number of new features (such as yield-on-long-operation and yield-on-page-fault) designed to reduce the impact of write locks on your system. More info on MongoDB concurrency can be found at http://www.mongodb.org/display/DOCS/How+does+concurrency+work

Cheers,

David

Licensed under: CC-BY-SA with attribution