How do I compute a language model with the word2vec tool?
21-12-2019
Question
I'm trying to build a neural network language model, and the word2vec tool by Mikolov et al. seems like a good fit for this purpose. I tried it, but it only produces word representations. Does anybody know how I can produce a language model with that tool, or with any other reasonable deep learning framework?
Solution 2
Doc2Vec, as implemented in Gensim, does the job. The trick is that the document ID is used as an extra context word, one that is present in every context window of every word in the document.
Code is here in Python/Gensim
OTHER TIPS
Microsoft Research has released a toolkit for language modelling with word2vec-style vectors. You can find it here.
word2vec is a tool that represents a single word (or a group of words) as a numerical vector, so it is not directly a language model.
To build a language model you can use MITLM. For example, you can estimate an N-gram model from the corpus Lectures.txt with this command:
estimate-ngram -text Lectures.txt -write-lm Lectures.lm
A great tutorial can be found here.