Semantic similarity between two or more sentences

https://datascience.stackexchange.com/questions/81418

13-12-2020

Question

I need to determine how similar sentences are to one another in meaning. I am considering cosine similarity as the measure of similarity between sentences, with Word2vec or WordNet to build the features.

If you have used this (or a similar) approach, could you please provide an example of using Word2vec or WordNet for similarity analysis?

Solution

Word2vec, as the name suggests, creates an embedding for each word in your sentence. To get a sentence-level embedding, you need to average the individual word embeddings (or combine them in some other way).
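
As a rough illustration of the averaging idea, here is a minimal sketch in plain Python. The three-dimensional vectors in toy_vectors are made up for the example and stand in for what a trained Word2vec model would provide; the helper names sentence_embedding and cosine_similarity are also just illustrative, not part of any library.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# In practice these would be looked up in a trained Word2vec model.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sleeps": [0.1, 0.9, 0.2],
    "naps": [0.2, 0.8, 0.3],
    "stocks": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.0, 0.8],
}


def sentence_embedding(sentence, vectors):
    """Average the vectors of the words that are in the vocabulary."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        raise ValueError("no known words in sentence")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two sentences with related meaning, and one unrelated sentence.
emb_a = sentence_embedding("cat sleeps", toy_vectors)
emb_b = sentence_embedding("kitten naps", toy_vectors)
emb_c = sentence_embedding("stocks fell", toy_vectors)

sim_close = cosine_similarity(emb_a, emb_b)
sim_far = cosine_similarity(emb_a, emb_c)
print(f"related:   {sim_close:.3f}")
print(f"unrelated: {sim_far:.3f}")
```

With these toy vectors the related pair scores higher than the unrelated pair. Note that plain averaging ignores word order and skips out-of-vocabulary words, which is one reason a sentence-level model may do better.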

An example of a model that generates sentence-level embeddings directly is the Universal Sentence Encoder (USE). You may want to try it and see whether it outperforms a word-level model in your use case.

The original paper can be found here: https://arxiv.org/abs/1803.11175

An example blog post leveraging USE: https://medium.com/@gaurav5430/universal-sentence-encoding-7d440fd3c7c7

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange