You didn't say which time units you want the age in, so I'm just going to show you how to get it back in minutes and trust you can work out how to convert that to any other time grain. I'm going to assume the original documents have a schema like this:
{ _id: xxx,
post_id: uniqueId,
comments: [ { ..., date: ISODate() }, ..., { ... , date: ISODate() } ],
...
}
Now the aggregation:
// first you want to define some fixed point in time that you are calculating age from.
// I'm going to use a moment just before "now"; it must be a Date so that
// $subtract returns the difference in milliseconds
var now = new Date(Date.now() - 1);
// unwind the comments array so you can work with individual comments
var unwind = {$unwind:"$comments"};
// calculate a new comment_age value
var project = {$project: {
post_id:1,
comment_age: {
$divide:[
{$subtract:[now, "$comments.date"]},
60000
]
}
} };
// group back by post_id calculating average age of comments
var group = {$group: {
_id: "$post_id",
age: {$avg: "$comment_age"}
} };
// now do the aggregation:
db.coll.aggregate( [ unwind, project, group ] )
You can use $max, $min, and other grouping functions to find the oldest and newest comment date or the lowest/highest comment age. You can group by post_id, or you can group by a constant to compute these values for the entire collection, etc.
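If you want to sanity-check the numbers on a few documents, here is a rough sketch of the same calculation in plain JavaScript (no MongoDB involved). The helper name averageCommentAge, the MS_PER_UNIT table and the sample data are my own inventions for illustration; the point is only to show which divisor gives which time grain.

```javascript
// Milliseconds per unit: the divisor you would pass to $divide for each time grain.
const MS_PER_UNIT = {
  minutes: 60 * 1000,
  hours: 60 * 60 * 1000,
  days: 24 * 60 * 60 * 1000
};

// Hypothetical helper that mirrors the unwind / project / group steps in memory.
function averageCommentAge(docs, now, unit) {
  const divisor = MS_PER_UNIT[unit];
  const totals = new Map(); // post_id -> { sum, count }
  for (const doc of docs) {
    for (const comment of doc.comments || []) {               // like $unwind
      const age = (now.getTime() - comment.date.getTime()) / divisor; // like $subtract + $divide
      const t = totals.get(doc.post_id) || { sum: 0, count: 0 };
      t.sum += age;
      t.count += 1;
      totals.set(doc.post_id, t);
    }
  }
  // like $group with $avg
  return [...totals].map(([postId, t]) => ({ _id: postId, age: t.sum / t.count }));
}

// Made-up sample data: two comments, 60 and 120 minutes old.
const sampleNow = new Date("2013-07-15T00:00:00Z");
const samplePosts = [
  { post_id: "p1", comments: [
    { date: new Date("2013-07-14T23:00:00Z") },
    { date: new Date("2013-07-14T22:00:00Z") }
  ] }
];

console.log(averageCommentAge(samplePosts, sampleNow, "minutes")); // average age is 90 minutes
```

Swapping "minutes" for "hours" or "days" only changes the divisor, which is all you would change in the $divide step of the pipeline as well.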
* edit * Using the document you included for "library book" as an example, this might be the pipeline to calculate, for each book that's currently "Out", how long it's been out, assuming that "comments.cal_date" is when it was checked out and that the latest cal_date of all the comments represents the current "check-out" (the older ones having been returned):
db.coll.aggregate( [
{ $match : { status : "Out" } },
{ $unwind : "$comments" },
{ $group : { _id : "$_id",
cal_date : { $max : "$comments.cal_date" }
}
},
{ $project : { outDuration : { $divide : [
{ $subtract : [
ISODate("2013-07-15"),
"$cal_date"
]
},
24*60*60*1000
]
}
}
},
{ $group : { _id : 1,
avgOut : { $avg : "$outDuration" }
}
}
] )
What the steps are doing:
- filtering out documents based on status to make the calculation about books that are currently "Out" only.
- $unwind to flatten out the "comments" array so that we can
- find which entry is the latest cal_date with $group and $max.
- use this max cal_date (which represents when the book was checked out) to subtract it from today's date and divide the result by the number of milliseconds in a day to get the number of days this book has been out.
- $group all the results together to find the average number of days all the checked-out books have been out.
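To make those steps concrete, here is a small in-memory sketch of the same logic in plain JavaScript. It is not how MongoDB runs the pipeline; averageDaysOut and the sample books are hypothetical, and the field names (status, comments, cal_date) are the ones assumed in the pipeline.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper mirroring $match, $unwind, $group/$max, $project and $group/$avg.
function averageDaysOut(books, today) {
  const durations = books
    // $match on status; $unwind would also drop books with no comments
    .filter(b => b.status === "Out" && (b.comments || []).length > 0)
    .map(b => {
      // $group with $max: the latest cal_date is the current check-out
      const latest = Math.max(...b.comments.map(c => c.cal_date.getTime()));
      // $project: days between "today" and the check-out date
      return (today.getTime() - latest) / MS_PER_DAY;
    });
  if (durations.length === 0) return null; // nothing is checked out
  // final $group with $avg over all checked-out books
  return durations.reduce((sum, d) => sum + d, 0) / durations.length;
}

// Made-up sample data for illustration only.
const sampleToday = new Date("2013-07-15");
const sampleBooks = [
  { status: "Out", comments: [
    { cal_date: new Date("2013-07-01") },
    { cal_date: new Date("2013-07-05") }   // latest: out for 10 days
  ] },
  { status: "Out", comments: [ { cal_date: new Date("2013-07-11") } ] }, // out for 4 days
  { status: "In",  comments: [ { cal_date: new Date("2013-06-01") } ] }  // ignored
];

console.log(averageDaysOut(sampleBooks, sampleToday)); // (10 + 4) / 2 = 7
```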
* edit * I was assuming you knew Ruby and just needed to know how to do an aggregation framework command to calculate date differences/averages/etc. Here is the same code in Ruby, using "now" to compare cal_date to (you can also do it using a constant date value):
# get db collection from MongoClient into variable 'coll'
# see basic MongoDB Ruby driver tutorial for details
coll.aggregate([
{ "$match" => {"status"=>"Out"} },
{ "$unwind" => "$comments"},
{ "$group" => { "_id" => "$_id", "cal_date" => { "$max" => "$comments.cal_date" } } },
{ "$project"=> {
"outDuration" => {
"$divide" => [
{"$subtract" => [ Time.now, "$cal_date" ] },
24*60*60*1000
]
}
}
},
{ "$group" => {
"_id" => 1,
"avgOut" => {"$avg"=>"$outDuration"}
}
}
])
See https://github.com/mongodb/mongo-ruby-driver/wiki/Aggregation-Framework-Examples for more examples and explanations.
If there are additional fields that you want to preserve in your $group phase, you can add more fields by changing the pipeline step like this:
{ $group : { _id : "$_id",
barcode : { $first : "$barcode" },
cal_date : { $max : "$comments.cal_date" }
}
}
If you don't need the original _id you can just use "$barcode" instead of "$_id" in the first line (that is, _id: "$barcode"), but since there may be multiple fields you want to preserve, the $first trick works with as many of them as you want to keep.
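If the list of fields to keep gets long, you could build that $group step programmatically. This is only a sketch: groupKeeping is a hypothetical helper of mine, and "title" is a made-up field name standing in for whatever else your documents contain.

```javascript
// Hypothetical helper: builds the $group stage, keeping each listed field via $first.
function groupKeeping(fields) {
  const spec = { _id: "$_id", cal_date: { $max: "$comments.cal_date" } };
  for (const field of fields) {
    spec[field] = { $first: "$" + field }; // same value on every unwound copy of a document
  }
  return { $group: spec };
}

// "title" is an assumed field name, used here only for illustration.
const groupStage = groupKeeping(["barcode", "title"]);
console.log(JSON.stringify(groupStage, null, 2));
```

The resulting object can be dropped into the pipeline array in place of the hand-written $group step.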