Question

I'm currently working on a text mining project and have reached the vectorisation step. I'd like to know which method is better.

  • Is it Word2Vec or TF-IDF?

  • Here I see that we can combine them. Why would we do that? Does it improve the quality of the data?

  • What about GloVe?

Thanks


Solution

  • Word2Vec algorithms (Skip-gram and CBOW) treat each word equally, because their goal is to compute word embeddings. The distinction becomes important when you need sentence or document embeddings: not all words contribute equally to the meaning of a particular sentence. This is where weighting strategies come in, and TF-IDF is one of the successful ones. A common way to combine the two is to weight each word's vector by its TF-IDF score before averaging the vectors into a sentence or document vector.
  • At times it does improve the quality of inference, so the combination is worth a shot.
  • GloVe is a Stanford project that has often proved to perform better. You can read more about GloVe versus Word2Vec here, among many other resources available online.
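
To make the first point concrete, here is a minimal sketch of TF-IDF-weighted averaging of word vectors, compared with a plain average. It is only an illustration under stated assumptions: the two-dimensional vectors in `WORD_VECTORS` and the tiny `CORPUS` are made up and stand in for embeddings you would normally train with Word2Vec or load from GloVe, and the helper names `smoothed_idf` and `document_vector` are hypothetical, not part of any library.

```python
import math
from collections import Counter

# Hypothetical 2-d vectors standing in for trained Word2Vec or GloVe embeddings.
WORD_VECTORS = {
    "the": [0.1, 0.1],
    "cat": [0.9, 0.2],
    "dog": [0.8, 0.3],
    "sat": [0.2, 0.7],
    "barked": [0.3, 0.9],
}

# Hypothetical tokenized corpus, used only to estimate document frequencies.
CORPUS = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "dog", "sat"],
]


def smoothed_idf(corpus):
    """Return a smoothed IDF per word: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {
        word: math.log((1 + n_docs) / (1 + df)) + 1.0
        for word, df in doc_freq.items()
    }


def document_vector(tokens, word_vectors, idf=None):
    """Average the vectors of known tokens, weighting by TF-IDF when idf is given."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, count in Counter(tokens).items():
        if word not in word_vectors:
            continue  # out-of-vocabulary words are skipped
        # Term frequency times IDF, or plain term frequency when idf is None.
        weight = count * (idf.get(word, 0.0) if idf is not None else 1.0)
        for i, value in enumerate(word_vectors[word]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no usable words: return the zero vector
    return [component / weight_sum for component in total]


idf = smoothed_idf(CORPUS)
plain = document_vector(CORPUS[0], WORD_VECTORS)
weighted = document_vector(CORPUS[0], WORD_VECTORS, idf)
print("plain average:  ", plain)
print("TF-IDF weighted:", weighted)
```

In this toy corpus "the" appears in every document, so it gets the lowest IDF and the weighted vector for the first document is pulled toward "cat" relative to the plain average. Whether that helps on real data depends on the task, so it is worth comparing both variants.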
License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange