Question

I'm currently working on a text mining project, and once I reach the vectorisation step I'd like to know which method is better.

  • Is it Word2Vec or TF-IDF?

  • I've seen that the two can be combined. Why is that? Does it improve the quality of the data?

  • What about GloVe?

Thanks


Solution

  • Word2Vec algorithms (Skip-gram and CBOW) treat each word equally, because their goal is to compute word embeddings. The distinction becomes important when you need sentence or document embeddings: not all words contribute equally to the meaning of a particular sentence. This is where different weighting strategies come in, and TF-IDF is one of the most successful of them.
  • At times it does improve the quality of inference, so the combination is worth a shot (see the sketch after this list).
  • GloVe was developed at Stanford and has often been shown to perform better. You can read more about GloVe versus Word2Vec here, among many other resources available online.
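
A minimal sketch of one way to combine the two, assuming a toy corpus and gensim/scikit-learn: each word's Word2Vec vector is weighted by its IDF score before averaging into a sentence embedding. The corpus, model parameters, and the helper name `sentence_vector` are illustrative, not from the original answer, and the weighting uses only the IDF component of TF-IDF for simplicity.

    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = [
        "the cat sat on the mat",
        "dogs and cats are common pets",
        "the stock market fell sharply today",
    ]
    tokenized = [doc.split() for doc in corpus]

    # Train a small Word2Vec model (CBOW by default); pretrained GloVe vectors
    # could be swapped in here instead, e.g. via gensim.downloader.
    w2v = Word2Vec(sentences=tokenized, vector_size=50, window=3, min_count=1, seed=42)

    # Fit TF-IDF on the same corpus to obtain per-word IDF weights.
    tfidf = TfidfVectorizer()
    tfidf.fit(corpus)
    idf_weight = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

    def sentence_vector(tokens):
        """Weight each word vector by its IDF, then take the weighted average."""
        vecs, weights = [], []
        for tok in tokens:
            if tok in w2v.wv and tok in idf_weight:
                vecs.append(w2v.wv[tok] * idf_weight[tok])
                weights.append(idf_weight[tok])
        if not vecs:
            return np.zeros(w2v.vector_size)
        return np.sum(vecs, axis=0) / np.sum(weights)

    embedding = sentence_vector("the cat sat".split())
    print(embedding.shape)  # (50,)

Compared with a plain (unweighted) average of word vectors, the IDF weighting down-weights very frequent words such as "the", which rarely carry the meaning of a sentence.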