Question

I have records with a time value and need to be able to query them for a span of time and return only records at a given interval.

For example I may need all the records from 12:00 to 1:00 in 10 minute intervals giving me 12:00, 12:10, 12:20, 12:30, ... 12:50, 01:00. The interval needs to be a parameter and it may be any time value. 15 minutes, 47 seconds, 1.4 hours.

I attempted to do this doing some kind of reduce but that is apparently the wrong place to do it.

Here is what I have come up with. Comments are welcome.

Created a view for the time field so I can query a range of times. The view outputs the id and the time.

function(doc) { 
  emit([doc.rec_id, doc.time], [doc._id, doc.time]) 
}

Then I created a list function that accepts a param called interval. In the list function I work thru the rows and compare the current rows time to the last accepted time. If the span is greater or equal to the interval I add the row to the output and JSON-ify it.

function(head, req) { 

  // default to 30000ms or 30 seconds.
  var interval = 30000; 

  // get the interval from the request.
  if (req.query.interval) {
    interval = req.query.interval; 
  }

  // setup
  var row; 
  var rows = []; 
  var lastTime = 0; 

  // go thru the results...
  while (row = getRow()) { 
      // if the time from view is more than the interval 
      // from our last time then add it.
      if (row.value[1] - lastTime > interval) { 
          lastTime = row.value[1]; 
          rows.push(row); 
      } 
  } 
  // JSON-ify!
  send(JSON.stringify({'rows' : rows}));
}

So far this is working well. I will test against some large data to see how the performance is. Any comments on how this could be done better or would this be the correct way with couch?

Was it helpful?

Solution

CouchDB is relaxed. If this is working for you, then I'd say stick with it and focus on your next top priority.

One quick optimization is to try not to build up a final answer in the _list function, but rather send() little pieces of the answer as you know them. That way, your function can run on an unlimited result size.

However, as you suspected, you are using a _list function basically to do an ad-hoc query which could be problematic as your database size grows.

I'm not 100% sure what you need, but if you are looking for documents within a time frame, there's a good chance that emit() keys should primarily sort by time. (In your example, the primary (leftmost) sort value is doc.rec_id.)

For a map function:

function(doc) {
  var key = doc.time; // Just sort everything by timestamp.
  emit(key, [doc._id, doc.time]);
}

That will build a map of all documents, ordered by the time timestamp. (I will assume the time value is like JSON.stringify(new Date), i.e. "2011-05-20T00:34:20.847Z".

To find all documents within, a 1-hour interval, just query the map view with ?startkey="2011-05-20T00:00:00.000Z"&endkey="2011-05-20T01:00:00.000Z".

If I understand your "interval" criteria correctly, then if you need 10-minute intervals, then if you had 00:00, 00:15, 00:30, 00:45, 00:50, then only 00:00, 00:30, 00:50 should be in the final result. Therefore, you are filtering the normal couch output to cut out unwanted results. That is a perfect job for a _list function. Simply use req.query.interval and only send() the rows that match the interval.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top