You most likely have a typo, or a misunderstanding of how field references work in the pipeline:
db.tweets.aggregate([
{ "$project": { "hashtags":1 }},
{ "$unwind": "$hashtags" },
{ "$group": { "_id": "$hashtags", "count": { "$sum": 1 } }}
])
So the value for _id in the $group stage needs to be "$hashtags" rather than the "hashtags" you have used. The $ prefix tells the pipeline to use the actual value of the field, so the result is the count of each "hashtag".
Without the $ to declare that you want the value of the field, "hashtags" is just a literal string. Every document then gets the same constant _id, so grouping on it puts everything into a single group.
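To make the difference concrete, here is a minimal sketch in plain JavaScript that only emulates the $unwind and $group stages in memory. It does not talk to MongoDB; the sample data and the names `tweets`, `resolve` and `groupCount` are assumptions made up for illustration, and the real server evaluates expressions in a far more general way.

```javascript
// Plain-JavaScript emulation of $unwind + $group, for illustration only.
// `tweets`, `resolve` and `groupCount` are hypothetical names, not MongoDB APIs.
const tweets = [
  { hashtags: ["mongodb", "nosql"] },
  { hashtags: ["mongodb"] },
  { hashtags: ["aggregation", "nosql"] }
];

// Emulates { "$unwind": "$hashtags" }: one output document per array element.
const unwound = tweets.flatMap(doc =>
  doc.hashtags.map(tag => ({ hashtags: tag }))
);

// Emulates how the _id expression is evaluated here: a "$"-prefixed string
// is read as a field path, any other string is treated as a constant.
function resolve(expr, doc) {
  return typeof expr === "string" && expr.startsWith("$")
    ? doc[expr.slice(1)]
    : expr;
}

// Emulates { "$group": { "_id": idExpr, "count": { "$sum": 1 } } }.
function groupCount(docs, idExpr) {
  const counts = new Map();
  for (const doc of docs) {
    const key = resolve(idExpr, doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

console.log(groupCount(unwound, "$hashtags")); // one group per distinct tag
console.log(groupCount(unwound, "hashtags"));  // a single group holding every element
```

With "$hashtags" each distinct tag gets its own group and count, while the bare "hashtags" string produces one group whose count is simply the total number of unwound elements.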
The corrected pipeline gives you the count for each tag. If you are in fact looking for the total number of "unique" tags without listing each tag, you can modify it like this:
db.tweets.aggregate([
{ "$project": { "hashtags":1 }},
{ "$unwind": "$hashtags" },
{ "$group": { "_id": "$hashtags" }},
{ "$group": { "_id": null, "count": { "$sum": 1 } }}
])
The second $group simply counts the distinct tags produced by the first. There is another way to do this using the $addToSet operator, but it really just creates additional work in the pipeline and is not the best use case for that operator. But just for reference:
db.tweets.aggregate([
{ "$project": { "hashtags":1 }},
{ "$unwind": "$hashtags" },
{ "$group": {
    "_id": null,
    "hashtags": { "$addToSet": "$hashtags" }
}},
{ "$unwind": "$hashtags" },
{ "$group": { "_id": null, "count": { "$sum": 1 } }}
])
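As a sanity check on what either "unique tags" pipeline should return, the same count can be sketched in plain JavaScript on a small in-memory sample. This is only an emulation under assumed data; `sampleTweets`, `allTags`, `distinctTags` and `uniqueCount` are hypothetical names, and a `Set` stands in for what $addToSet (or grouping on the tag value) does on the server.

```javascript
// Hypothetical sample data; plain JavaScript, not a MongoDB API.
const sampleTweets = [
  { hashtags: ["mongodb", "nosql"] },
  { hashtags: ["mongodb"] },
  { hashtags: ["aggregation", "nosql"] },
  { text: "no tags here" } // no hashtags field, contributes nothing
];

// Emulates $unwind: flatten every hashtags array into one list of tags.
const allTags = sampleTweets.flatMap(doc => doc.hashtags || []);

// Emulates collecting distinct values, as $addToSet or the first $group does.
const distinctTags = new Set(allTags);

// Emulates the final { "$group": { "_id": null, "count": { "$sum": 1 } } }.
const uniqueCount = distinctTags.size;

console.log(uniqueCount);
```

Running the real pipeline against the same documents should give a single result document whose count matches `uniqueCount`.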