Question

I have a very large dataset stored in MongoDB, and I have to perform some analysis on it.

The data in my database looks like the following:

    {
      type: 'go_to_page',
      params: { page: 'shouts' },
      _id: 52f7add2efaf195c0300ab0f,
      created: Sun Feb 09 2014 22:03:22 GMT+0530 (IST),
      user: ObjectId('34eesdfe3456efr345eee3')
    }

I have about one million rows in my database for the above dataset. Now I have to process the dataset using mongoose. The information that I have to extract is described below.

The page stored in the "params" field of the above schema can take four values: 'profile', 'people', 'shout' and 'event'. Now, if a user goes from the profile page to the people page, the time that the user spent on the profile page will be:

Time when the user arrived on the people page - time when the user arrived on the profile page.

Thus, extracting rows one by one using mongoose can't give the required information, because each calculation needs at least two rows.

Now the problem is that I have about one million rows in my database and about 600 distinct users. For every user I have to find out how much time he spent on each of the four pages per day (date by date). The code I have written so far takes about 20 minutes just to fetch the users and their logs, which is not acceptable.

My exact current code looks like this:

    var mongoose = require('mongoose');
    var sessionSchema = require('./model/sessions');
    var ContactSchema3 = require('./model/sessions');
    var ContactModel3 = mongoose.model('Contact3', ContactSchema3, 'logs');
    var SessionModel = mongoose.model('Session', sessionSchema, 'sessions');

    exports.session = function(req, res) {
      var query1 = SessionModel.find({}, {
        created: true
      }).sort({
        created: -1
      }).limit(1);

      /* query for fetching the latest date */
      query1.exec(function(err, val) {
        if (err) {
          console.log('there is error', err);
          return;
        }

        /* fetch the list of all users */
        var userObjId = ContactModel3.distinct('user');
        userObjId.exec(function(err, rslt1) {
          if (err) {
            console.log('there is error', err);
            return;
          }

          /* iterate over all users to fetch their logs in batches of 500 */
          rslt1.forEach(function(value, id) {
            var fun = function(currentIndex) {
              ContactModel3.find({
                user: value
              }, {
                type: 1,
                params: 1,
                created: 1
              }).sort({
                created: -1
              }).skip(currentIndex).limit(500).exec(function(err, rslt) {
                if (err) {
                  console.log('there is error', err);
                  return;
                }
                /* an empty batch means this user's logs are exhausted */
                if (!rslt || rslt.length === 0) {
                  console.log('rslt while returning is ', rslt);
                  return;
                }

                /* place for manipulation function */
                /* place for manipulation function ends here */
                fun(currentIndex + 500);
              });
            };
            fun(0);
          });
        });
      });
    };

Can anyone help me in getting the result?

Solution

Yes. First, keep in mind that you need to add an index on the created field so that you can fetch the data quickly, since you mentioned that you have about a million rows in your database.
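As a rough sketch of the index advice: with mongoose, an index can be declared on the schema before the model is compiled. The schema below is a hypothetical stand-in that mirrors the sample document in the question (the real definition in ./model/sessions is not shown), and the compound index on user and created is an assumption chosen to match the question's "find by user, sort by created" query rather than something stated in this answer.

```javascript
var mongoose = require('mongoose');

// Hypothetical schema mirroring the sample document from the question.
var logSchema = new mongoose.Schema({
  // A field literally named "type" is declared with the nested { type: ... } form.
  type: { type: String },
  params: { page: String },
  created: Date,
  user: mongoose.Schema.Types.ObjectId
});

// Single-field index on `created`, as suggested above.
logSchema.index({ created: -1 });

// Assumption: a compound index that also covers the per-user filter
// used by find({ user: ... }).sort({ created: -1 }).
logSchema.index({ user: 1, created: -1 });

module.exports = logSchema;
```

Building an index on a collection of this size takes some time on the first run, so it is probably best created once up front rather than on every request.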

Second, you can get the time in milliseconds easily using date.getTime(). Thus you can take the time when the user arrived on a page and the time of the previous arrival, both in milliseconds, and subtract them.
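The subtraction could be sketched roughly as follows for a single user's logs. The function name timePerPagePerDay and the sample data are hypothetical, and the code assumes the events are already sorted by created in ascending order and that a visit is credited to the UTC day on which it started.

```javascript
// Hypothetical helper: sums the time spent per page per day for ONE user.
// `events` must be sorted by `created` ascending and shaped like the
// documents in the question: { params: { page: ... }, created: Date }.
function timePerPagePerDay(events) {
  const totals = {}; // { 'YYYY-MM-DD': { pageName: milliseconds } }
  for (let i = 0; i + 1 < events.length; i++) {
    const current = events[i];
    const next = events[i + 1];
    // Time on the current page = arrival on the next page - arrival on this page.
    const spentMs = next.created.getTime() - current.created.getTime();
    // Assumption: the visit is credited to the UTC day on which it started.
    const day = current.created.toISOString().slice(0, 10);
    const page = current.params.page;
    totals[day] = totals[day] || {};
    totals[day][page] = (totals[day][page] || 0) + spentMs;
  }
  return totals;
}

// Made-up sample data for illustration only.
const sample = [
  { params: { page: 'profile' }, created: new Date('2014-02-09T16:00:00Z') },
  { params: { page: 'people' }, created: new Date('2014-02-09T16:05:00Z') },
  { params: { page: 'profile' }, created: new Date('2014-02-09T16:07:30Z') },
  { params: { page: 'event' }, created: new Date('2014-02-09T16:08:00Z') }
];

console.log(timePerPagePerDay(sample));
// → { '2014-02-09': { profile: 330000, people: 150000 } }
```

The last event of a user has no following event, so this sketch does not count any time for it. The query in the question sorts by created in descending order, so the results would need to be reversed (or fetched with sort({ created: 1 })) before being passed in.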

Licensed under: CC-BY-SA with attribution