Question

I have a mongo collection of tweets and each document looks like this:

{ _id:'1234567', 
date:'9/27/08 3:21', 
tweet:'Some text here and some #hashtag and one more #hashtag2', 
a_name:'name', 
a_nick:'nick', 
hashtags:['#hashtag' , '#hashtag2' ]} 

I need to count all the occurrences of #hashtag and #hashtag2 in all of the entries using aggregation. So far I have something like this:

 db.tweets.aggregate(
 { $project: { hashtags:1}},
 { $unwind: "$hashtags" },
 { $group: { _id: "hashtags", count: { $sum: 1 }}}
 );

But that gives me a single count covering all of the hashtags together. If I remove the $group stage I get a list of all the separate hashtags, which is good, but I would like to count the occurrences of each one using the aggregation framework and mongo only. Any ideas?

Solution

I think you probably just have a typo, or otherwise a small misunderstanding:

db.tweets.aggregate([
   { "$project": { "hashtags":1 }},  
   { "$unwind": "$hashtags" },  
   { "$group": { "_id": "$hashtags", "count": { "$sum": 1 } }}  
])

So the value for _id in the group needs to be "$hashtags" rather than the "hashtags" you have used. That way the stage uses the actual value of the field, and the result is the count for each hashtag.

Without the $ to declare that you want the value of the field, "hashtags" is just a literal string. Every document then gets the same grouping key, so everything ends up in a single group.
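
To make the difference concrete, here is a rough plain-JavaScript imitation of the $unwind and $group stages. It is only a sketch for illustration, not MongoDB code: sampleTweets, resolveKey and groupCount are hypothetical names, and the "$" handling is a simplified stand-in for how the pipeline treats field paths.

```javascript
// Plain-JavaScript imitation of $unwind + $group, for illustration only.
// sampleTweets, resolveKey and groupCount are hypothetical, not MongoDB APIs.
const sampleTweets = [
  { _id: "1", hashtags: ["#hashtag", "#hashtag2"] },
  { _id: "2", hashtags: ["#hashtag"] },
];

// A leading "$" means "read this field from the document";
// anything else is treated as a constant value.
function resolveKey(doc, keyExpr) {
  return keyExpr.startsWith("$") ? doc[keyExpr.slice(1)] : keyExpr;
}

function groupCount(docs, keyExpr) {
  // $unwind: one output document per array element
  const unwound = docs.flatMap((d) =>
    d.hashtags.map((tag) => ({ ...d, hashtags: tag }))
  );
  // $group with { $sum: 1 }: add 1 to the bucket for each key
  const counts = new Map();
  for (const doc of unwound) {
    const key = resolveKey(doc, keyExpr);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const literalKey = groupCount(sampleTweets, "hashtags");  // constant key
const fieldKey = groupCount(sampleTweets, "$hashtags");   // field value as key

console.log(literalKey); // a single group that holds every tag occurrence
console.log(fieldKey);   // one group per distinct tag
```

With the constant key all three tag occurrences fall into one bucket, which matches what the query in the question returns. With the "$" prefix each distinct tag gets its own bucket.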

So that would give you the count for each tag. If you are in fact looking for the total number of "unique" tags without listing each tag, you can modify it like this:

db.tweets.aggregate([
   { "$project": { "hashtags":1 }},  
   { "$unwind": "$hashtags" },  
   { "$group": { "_id": "$hashtags" }},
   { "$group": { "_id": null, "count": { "$sum": 1 } }}
])
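
As a rough way to see what the two $group stages do, the same steps can be imitated in plain JavaScript. This is a sketch under assumptions, not MongoDB code: sampleDocs and the variable names are made up for the example.

```javascript
// Plain-JavaScript imitation of the four stages; not MongoDB code.
// sampleDocs and the variable names below are hypothetical.
const sampleDocs = [
  { _id: "1", hashtags: ["#hashtag", "#hashtag2"] },
  { _id: "2", hashtags: ["#hashtag"] },
  { _id: "3", hashtags: ["#other", "#hashtag2"] },
];

// $project + $unwind: one entry per tag occurrence
const occurrences = sampleDocs.flatMap((doc) => doc.hashtags);

// first $group on "$hashtags": one entry per distinct tag
const distinctTags = [...new Set(occurrences)];

// second $group on null with { $sum: 1 }: count those entries
const summary = [{ _id: null, count: distinctTags.length }];

console.log(summary); // [ { _id: null, count: 3 } ]
```

The first grouping collapses repeated tags, so the second one counts each distinct tag exactly once.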

So that just summarizes the distinct tags into a single count. There is another way to do this using the $addToSet operator, but it really just creates additional work in the pipeline and is not the best use case for that operator. Just for reference:

db.tweets.aggregate([
   { "$project": { "hashtags":1 }},  
   { "$unwind": "$hashtags" },  
   { "$group": { 
       "_id": null, 
       "hashtags": { "$addToSet": "$hashtags" }
   }},
   { "$unwind": "$hashtags" },
   { "$group": { "_id": null, "count": { "$sum": 1 } }}
])
Licensed under: CC-BY-SA with attribution