Question

Could anyone tell me what algorithm Twitter.com uses to classify trending topics with multiple words? The problem is easy when trends are single words, for example "#SoulTrainAwards" or "#DontYouWish". But it is a totally different problem when trends contain multiple words, for example "Chrisette Michelle", "Happy Halloween" or "Merry Christmas", since a word in a multiple-word trend can itself be a different trend, for example the word "Happy" or the word "Christmas" alone.

Solution

As user judotens pointed out on this question, you would divide the message into n-grams. I believe Twitter uses at most 3 words in a trending topic, so the message

The cat ate the food.

would result in the following items:

  • The cat ate
  • cat ate the
  • ate the food
  • The cat
  • cat ate
  • ate the
  • the food
  • The
  • cat
  • ate
  • the
  • food
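
The n-gram split above can be sketched in a few lines of Python. This is only an illustration: the function name `ngrams`, the `max_n` parameter, and the naive whitespace tokenization with punctuation stripping are my own assumptions, not anything Twitter is known to use.

```python
def ngrams(message, max_n=3):
    """Return every n-gram of 1..max_n words, longest first."""
    # Naive tokenization (an assumption): split on whitespace and
    # strip common punctuation from the ends of each word.
    words = [w.strip(".,!?;:\"'") for w in message.split()]
    words = [w for w in words if w]

    result = []
    for n in range(max_n, 0, -1):
        # Slide a window of n words across the message.
        for i in range(len(words) - n + 1):
            result.append(" ".join(words[i:i + n]))
    return result


for item in ngrams("The cat ate the food."):
    print(item)
```

Running it on the example message prints the same twelve items as the list above, in the same order.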

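Then, I believe it uses those n-grams as input to some kind of streaming algorithm that returns the most frequent items. Because every n-gram is counted as its own item, "Happy Halloween" and "Happy" are simply two separate candidates.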
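
I don't know which streaming algorithm Twitter actually uses. As one possible illustration, here is a minimal sketch of the Misra-Gries frequent-items algorithm, a well-known way to find heavy hitters in a stream with bounded memory. The function name `misra_gries` and the sample stream are hypothetical.

```python
def misra_gries(stream, k):
    """Keep at most k - 1 counters; any item occurring more than
    len(stream) / k times is guaranteed to survive in the result."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of n-grams taken from several messages.
stream = ["Happy Halloween", "Happy", "Happy Halloween", "cat",
          "Happy Halloween", "food", "Happy Halloween"]
print(misra_gries(stream, k=3))
```

The counts it returns are lower bounds rather than exact frequencies, so a second pass (or a different algorithm such as Space-Saving or a Count-Min Sketch) would be needed if exact numbers matter.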

Licensed under: CC-BY-SA with attribution