LDA vs Word2Vec vs Others for predicting recipients of a message

https://datascience.stackexchange.com/questions/23932

31-10-2019
|

Question

I'm investigating various NLP algorithms and tools to solve the following problem; NLP newbie here, so pardon my question if it's too basic.

Let's say, I have a messaging app where users can send text messages to one or more people. When the user types a message, I want the app to suggest to the user who the potential recipients of the message are?

If user "A" sends a lot of text messages regarding "cats" to user "B" and some messages to user "C" and sends a lot of messages regarding "politics" to user "D", then next time user types the message about "cats" then the app should suggest "B" and "C" instead of "D".

So I'm doing some research on topic modeling and word embeddings and see that LDA and Word2Vec are the 2 probable algorithms I can use.

Wanted to pick your brain on which one you think is more suitable for this scenario.

One idea I have is, extract topics using LDA from the previous messages and rank the recipients of the messages based on the # of times a topic has been discussed (ie, the message sent) in the past. If I have this mapping of the topic and a sorted list of users who you talk about it (ranked based on frequency), then when the user types a message, I can again run topic extraction on the message, predict what the message is about and then lookup the mapping to see who can be the possible recipients and show to user.

Is this a good approach? Or else, Word2Vec (or doc2vec or lda2vec) is better suited for this problem where we can predict similar messages using vector representation of words aka word embeddings? Do we really need to extract topics from the messages to predict the recipients or is that not necessary here? Any other algorithms or techniques you think will work the best? Should I just use supervised learning instead?

What are your thoughts and suggestions?

Thanks for the help.

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange