Question

I am trying to mine social media data, such as tweets. However, social media data contain a lot of noise, for example people discussing celebrities or quoting a movie, TV show, or song. In general, this is content that is not about the posters themselves or about somebody they actually know personally.

So my question is: are there any dynamic (i.e., automatically updated) databases of the most popular current celebrities? Quotes from the movies they appear in, or lyrics of the songs they sing, would also be relevant.


Solution

I don't think such a curated list exists. Smaller ones do exist, for example the list of the 100 top movie quotes on Wikipedia. However, these are static and are not automatically updated.
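Even a static list can catch some of the noise. The sketch below is a minimal, hypothetical example: `STATIC_QUOTES` is a hand-typed stand-in for a list such as the Wikipedia one, and `normalize` and `quotes_a_known_line` are illustrative names, not part of any library. It lowercases posts and strips punctuation so small variations still match, and it pads with spaces so a quote only matches as a whole phrase.

```python
import re
import string

# Hypothetical stand-in for a static list such as a "top movie quotes" page;
# in practice you would load the full list from a file you maintain.
STATIC_QUOTES = [
    "May the Force be with you.",
    "Here's looking at you, kid.",
    "I'll be back.",
]

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, delete ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower().translate(_STRIP_PUNCT)
    return re.sub(r"\s+", " ", text).strip()


NORMALIZED_QUOTES = [normalize(q) for q in STATIC_QUOTES]


def quotes_a_known_line(post: str) -> bool:
    """True if the post contains a known quote as a whole-word phrase."""
    # Padding with spaces prevents "ill be back" from matching inside "still be back".
    padded = f" {normalize(post)} "
    return any(f" {q} " in padded for q in NORMALIZED_QUOTES)


posts = [
    "Ugh, Monday again. I'll be back after coffee!!",
    "We'll still be back by noon",
    "Had a great lunch with my sister today",
]
personal = [p for p in posts if not quotes_a_known_line(p)]
print(personal)
```

`string.punctuation` only covers ASCII punctuation, so curly apostrophes common in tweets would need extra handling, and exact phrase matching will miss paraphrased or partial quotes.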

One possibility is to filter out the parts of your input that also appear on another social media site that tracks trends, such as Delicious. Unless you are looking for trends, something that rises to the top of two trending sites is likely... just a trend.

Delicious has a nice Python wrapper for its API.

In Pythonic pseudocode,

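 # social_media_content and delicious_content_list stand in for data fetched from each API
 trending = set(delicious_content_list)
 data = [datum for datum in social_media_content if datum not in trending]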
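Comparing whole posts against trending items, as in the pseudocode, will rarely find exact matches, since a tweet almost never equals a bookmarked item verbatim. One way to apply the same idea is to drop any post that mentions a currently trending term. This is a minimal sketch under stated assumptions: the list of trending terms has already been fetched (no Delicious wrapper calls are shown), each term is a single word or hashtag, and `tokenize`, `drop_trending`, and the sample data are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Return the set of lowercase word tokens in text.

    '#' and '@' are not word characters, so hashtags and mentions become plain words.
    """
    return set(re.findall(r"\w+", text.lower()))


def drop_trending(posts: list[str], trending_terms: list[str]) -> list[str]:
    """Keep only the posts that mention none of the trending terms.

    Assumption: each trending term is a single word or hashtag. Multi-word
    phrases would need phrase matching instead of this token lookup.
    """
    # Normalize terms the same way posts are tokenized: lowercase, no leading #/@.
    trending = {t.lower().lstrip("#@") for t in trending_terms}
    return [p for p in posts if tokenize(p).isdisjoint(trending)]


# Hypothetical sample data.
trending_terms = ["#Oscars", "Eurovision"]
posts = [
    "Can't believe who won at the #Oscars last night",
    "The Eurovision results are in",
    "My brother finally passed his driving test",
]

print(drop_trending(posts, trending_terms))
```

Word-level matching keeps the check cheap (one set comparison per post), at the cost of also dropping personal posts that merely mention a trending word.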
Licensed under: CC-BY-SA with attribution