To optimize for this, create the following indexes, or specify the equivalent with a @CompoundIndexes annotation on your class:
db.collection.createIndex({
"timestamp": 1, "userId": 1
})
db.collection.createIndex({
"timestamp": 1, "applicationName": 1, "country": 1
})
These follow from the intended usage you described in your comments, so two indexes are required in total.
Also make sure that your "timestamp" values are stored as BSON Dates, since that gives you the date aggregation operators that matter for your actual queries. The shell JavaScript form is used here for general reference:
db.collection.aggregate([
// Using the index that was created
{ "$match": {
"timestamp": {
"$gte": new Date("2014-04-01"), "$lt": new Date("2014-05-01")
},
"userId": { "$gte": "lowervalue", "$lte": "uppervalue" }
}},
// Grouping Data
{ "$group": {
"_id": {
"y": { "$year": "$timestamp" },
"m": { "$month": "$timestamp" },
"d": { "$dayOfMonth": "$timestamp" }
},
"someField": { "$sum": "$someField" },
"otherField": { "$avg": "$otherField" }
}}
])
So it is the "date aggregation operators" that let you split the BSON date into the components you want (in this case down to the day). All documents whose timestamp falls on the same day end up in the same group, and the other aggregation operators are then applied to the other fields within that group.
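To make the grouping concrete, here is a minimal plain-JavaScript sketch of what that $group stage computes. It is only an emulation for illustration, not how the server implements it; the helper name groupByDay and the sample documents are made up, and the field names come from the pipeline above. The date operators work in UTC by default, so the UTC getters are used.

```javascript
// Illustrative emulation of the $group stage; groupByDay is a hypothetical helper.
function groupByDay(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const ts = doc.timestamp;
    const id = {
      y: ts.getUTCFullYear(),   // what $year returns
      m: ts.getUTCMonth() + 1,  // what $month returns (1-12; JavaScript months are 0-11)
      d: ts.getUTCDate()        // what $dayOfMonth returns
    };
    const key = id.y + "-" + id.m + "-" + id.d;
    let g = groups.get(key);
    if (!g) {
      g = { _id: id, someField: 0, otherSum: 0, count: 0 };
      groups.set(key, g);
    }
    g.someField += doc.someField;  // accumulates like $sum
    g.otherSum += doc.otherField;  // running total, divided later like $avg
    g.count += 1;
  }
  return Array.from(groups.values()).map(function (g) {
    return { _id: g._id, someField: g.someField, otherField: g.otherSum / g.count };
  });
}

// Made-up sample data: two documents on 1 April, one on 2 April (UTC).
const sample = [
  { timestamp: new Date("2014-04-01T08:00:00Z"), someField: 2, otherField: 10 },
  { timestamp: new Date("2014-04-01T21:30:00Z"), someField: 3, otherField: 20 },
  { timestamp: new Date("2014-04-02T00:15:00Z"), someField: 5, otherField: 30 }
];
const daily = groupByDay(sample);
```

The two documents from 1 April collapse into one result, with "someField" summed and "otherField" averaged, while the document from 2 April forms its own group.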
Please note that indexes can only be used at the start of the aggregation pipeline, which here is the initial $match stage, so this is where you select your data and reduce your working set. Done this way, you get the best performance available from the raw data.
For further gains, consider "pre-aggregating" information into other collections by periodically running the base forms of the aggregation over the raw "log" data that you have.
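As a rough sketch of that pre-aggregation idea, a periodic job could build a one-day pipeline like the one below and store the results in a summary collection. The helper name buildDailyRollup, the "dailySummary" collection and the choice to group by "applicationName" and "country" are all assumptions for illustration; adjust them to the reports you actually need.

```javascript
// Hypothetical helper: builds a pipeline that summarises one UTC day of raw log data.
function buildDailyRollup(day) {
  // Truncate the given date to midnight UTC, then add 24 hours for the upper bound.
  const start = new Date(Date.UTC(day.getUTCFullYear(), day.getUTCMonth(), day.getUTCDate()));
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  return [
    // The range on "timestamp" lets the initial $match use the index
    { "$match": { "timestamp": { "$gte": start, "$lt": end } } },
    // Assumed grouping keys; "someField" is taken from the example pipeline
    { "$group": {
      "_id": { "applicationName": "$applicationName", "country": "$country" },
      "someField": { "$sum": "$someField" },
      "count": { "$sum": 1 }
    }}
  ];
}

const pipeline = buildDailyRollup(new Date("2014-04-01T13:45:00Z"));

// In the shell, assuming aggregate() returns a cursor (MongoDB 2.6 or later),
// the job might then store each result in the assumed summary collection:
//   db.collection.aggregate(pipeline).forEach(function (doc) {
//     doc._id.day = pipeline[0]["$match"]["timestamp"]["$gte"];
//     db.dailySummary.update({ "_id": doc._id }, doc, { "upsert": true });
//   });
```

Reports over long date ranges can then read the much smaller summary collection instead of scanning the raw log documents again.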