For performance, you really need to move the $sort so that it comes before the $unwind. Placed directly after $match, it can be combined with it
and use an appropriate index, which here would be {symbol:1, ts_minute:1}, so I hope you have that index available. The $project should go after the $unwind,
to create the price and volume fields you need for the aggregation. It also seems that you should just group by ts_minute directly. The changes to make would be:
query = {'$match': {'symbol': 'AAPL'}}
sort = {'$sort': {'ts_minute': 1}}
unwind = {'$unwind': '$ticks'}
projection = {
    '$project': {
        'symbol': 1,
        'ts_minute': 1,
        # Python has no `null`; None is sent to MongoDB as BSON null.
        'volume': {'$cond': [
            {'$eq': ['$ticks.t', None]},
            '$ticks.v',
            0
        ]},
        'price': {'$cond': [
            {'$eq': ['$ticks.t', None]},
            None,
            '$ticks.v'
        ]}
    }
}
group = {
    '$group': {
        '_id': {
            'symbol': '$symbol',
            'minute': '$ts_minute'
        },
        'open': {'$first': '$price'},
        'high': {'$max': '$price'},
        'low': {'$min': '$price'},
        'close': {'$last': '$price'},
        'volume': {'$sum': '$volume'}
    }
}
bars = tick_collection.aggregate([query, sort, unwind, projection, group])
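
If the {symbol:1, ts_minute:1} index does not exist yet, it could be created from pymongo along these lines. This is only a minimal sketch: `ensure_tick_index` and `TICK_INDEX_KEYS` are hypothetical names introduced here, and it assumes `tick_collection` is a pymongo `Collection`, whose `create_index` accepts a list of (field, direction) pairs.

```python
# Key specification for the compound index: symbol first (used by $match),
# then ts_minute (used by $sort). 1 means ascending, same as pymongo.ASCENDING.
TICK_INDEX_KEYS = [('symbol', 1), ('ts_minute', 1)]


def ensure_tick_index(collection):
    """Create the {symbol: 1, ts_minute: 1} index on `collection`.

    `collection` is assumed to be a pymongo Collection (hypothetical helper).
    Returns whatever create_index returns, which is the index name.
    """
    return collection.create_index(TICK_INDEX_KEYS)
```

You would call `ensure_tick_index(tick_collection)` once before running the aggregation; if an identical index is already there, `create_index` leaves it as it is.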