Question

So, I've been trying out the PHP framework Lithium, and it seems like a really good framework, but I have a slight problem: a query I run on a collection with only 6k+ documents is amazingly slow from PHP but blazingly fast when I run it from the terminal.

One document in the collection may look like this:

{
    "_id" : ObjectId("504c9a3b6070d8b7ea61938e"),
    "startDate" : "Jan 2011",
    "episodes" : [
        {
            "title" : "Series 1, Episode 1",
            "airdate" : ISODate("2011-01-20T00:00:00Z"),
            "epnum" : "1",
            "prodnum" : null,
            "seasonnum" : "01",
            "link" : "http://www.tvrage.com/10_OClock_Live/episodes/1065007783"
        },
        {and maybe 20 more},
    ],
    "runTime" : "60 min",
    "endDate" : "Apr 2012",
    "network" : "Channel 4",
    "numberOfEpisodes" : "25 eps",
    "title" : "10 O'Clock Live",
    "directory" : "10OClockLive",
    "country" : "UK",
    "tvrage" : "27363"
}

I want to get all episodes that exist for the current month. So in the terminal (using fake values and a range wider than a month) I use the following query:

db.series.find({'episodes.airdate': {$gt: ISODate('2012-09-07 00:00:00'), $lt: ISODate('2012-11-01')}})

And wham, it just goes very fast. Even an explain() on the query tells me that it's fast:

{
    "cursor" : "BtreeCursor episodes.airdate_1",
    "isMultiKey" : true,
    "n" : 382,
    "nscannedObjects" : 1620,
    "nscanned" : 1620,
    "nscannedObjectsAllPlans" : 1620,
    "nscannedAllPlans" : 1620,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    **"millis" : 181**,
    "indexBounds" : {
        "episodes.airdate" : [
            [
                ISODate("2012-09-07T00:00:00Z"),
                ISODate("292278995-01--2147483647T07:12:56.808Z")
            ]
        ]
    },
    "server" : "example:27017"
}

But when I use the query inside PHP and Lithium, man, it takes ages:

$series = Series::find('all', array(
    'fields' => array('title', 'episodes.title', 'episodes.airdate'),
    'conditions' => array(
        'episodes.airdate' => array(
            '$gt' => new MongoDate(strtotime(date('Y-m-01'))),
            '$lt' => new MongoDate(strtotime(date('Y-m-t'))),
        ),
    ),
));

And if I even try to loop through it, it gets even worse, blowing well past the 30-second execution time. I also think I have a memory leak, since I had to add ini_set('memory_limit', '-1'); to avoid getting a "maximum memory usage" error or whatever.

Could anyone provide me with an answer as to why this is happening? Is there any way to improve the speed of the query? I have no idea why it is so slow, and I would be super glad if anyone could point me in the right direction.

Solution

The issue is that Lithium boxes all the data in objects, which for large queries can be very memory-intensive, and hence slow. If you don't need any ActiveRecord features for that particular query, there's an option you can pass to find() which gets passed on to MongoDb::read() (so check the docs for MongoDb::read()), and which lets you get back either a raw array or the actual database cursor that you can iterate over manually.
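
For example, something along these lines should skip the object boxing entirely. Note that the 'return' option name is my assumption here; verify it against the MongoDb::read() documentation for your Lithium version:

// Sketch only: 'return' => 'array' is assumed to be the read() option
// for getting plain arrays instead of Document objects; 'resource'
// would hand back the raw cursor. Check your version's docs.
$series = Series::find('all', array(
    'fields' => array('title', 'episodes.title', 'episodes.airdate'),
    'conditions' => array(
        'episodes.airdate' => array(
            '$gt' => new MongoDate(strtotime(date('Y-m-01'))),
            '$lt' => new MongoDate(strtotime(date('Y-m-t'))),
        ),
    ),
    'return' => 'array',
));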

The other option is to wait till I implement streaming iteration, which will solve the memory problem. :-)

OTHER TIPS

I'm not sure why this is slow for you. I have a gist here with a class that will log the insert, read, and update commands Lithium issues to Mongo. You could probably add some kind of timer to that to get the duration of each query. Then you would at least know whether the problem is waiting on Mongo or in other parts of the code.
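
As a rough sketch of what such a timer could look like, using Lithium's filter system (I'm assuming here that the default connection is the MongoDb adapter and that its read() method is filterable; confirm both for your version):

use lithium\data\Connections;

// Assumption: Connections::get('default') returns the MongoDb source
// and read() is a filterable method. Times each read and logs it so
// you can see whether the wait is in Mongo or in PHP.
Connections::get('default')->applyFilter('read', function($self, $params, $chain) {
    $start = microtime(true);
    $result = $chain->next($self, $params, $chain);
    error_log(sprintf('MongoDb::read() took %.1f ms', (microtime(true) - $start) * 1000));
    return $result;
});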

Here is some code for iterating over a DocumentSet while discarding each document retrieved from the MongoCursor as you loop.

$docs = SomeModel::all();
while ($docs->valid()) {
    $key = $docs->key();
    $doc = $docs->current();

    // Drop the current document from the set so it can be garbage
    // collected, then rewind to the start of what remains.
    unset($docs[$key]);
    $docs->rewind();

    // If the in-memory set is now empty, pull the next document
    // from the underlying MongoCursor.
    if (!$docs->valid()) {
        $docs->next();
    }

    // ... do stuff with $doc here ...
}

I just solved an issue where a page was taking more than 65 seconds to load. It turned out that the record for this particular user contained an array with 152 entries, each of them very large, so the document was probably pushing up against MongoDB's document size limit (16 MB in recent versions). When I deleted the large array from the user account, the page suddenly loaded in 4.5 seconds.

The thing is, the content on the page being loaded was unrelated to this user record, so we had been tuning the queries for that content to try to speed it up. Then we found out that the bug was completely unrelated to all of that, and was down to this other issue.

So make sure your documents don't get too big.
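
If you want to check how close a document is to the limit, you can measure its BSON size. Here's a quick sketch using the legacy ext/mongo driver this question is built on; bson_encode() is part of that extension, and the database, collection, and query below are placeholders for illustration:

// Placeholders: 'mydb', 'users' and the query are made up for this
// example. Use "new Mongo()" instead on pre-1.3 drivers.
$mongo = new MongoClient();
$user  = $mongo->mydb->users->findOne(array('username' => 'someuser'));

// bson_encode() from the legacy mongo extension returns the raw BSON
// string, so strlen() gives the document size in bytes.
$bytes = strlen(bson_encode($user));
printf("document is %.2f MB (limit is 16 MB)\n", $bytes / 1048576);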

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow