Question

I'm trying to build a neural network language model, and the word2vec tool by Mikolov et al. seems like a good fit for this purpose. I tried it, but it only produces word representations. Does anybody know how I can produce a language model with that tool, or with any other reasonable deep learning framework?


Solution 2

Doc2Vec, as implemented in Gensim, does the job. The trick is that it treats the document ID as an extra context word, one that appears in every context window of every word in the document.

Code is here in Python/Gensim

OTHER TIPS

Microsoft Research has released a toolkit for language modelling with word2vec-style vectors. You can find it here.

word2vec is a tool for representing a single word (or a group of words) as a numerical vector, so by itself it is not a language model.

To build a language model you can use the MIT Language Modeling Toolkit (MITLM). For example, you can estimate an n-gram model from the corpus Lectures.txt with this command:

estimate-ngram -text Lectures.txt -write-lm Lectures.lm

A great tutorial can be found here.
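To make the n-gram idea concrete, here is a toy bigram model in pure Python showing what a toolkit like MITLM estimates at scale: n-gram counts turned into smoothed conditional probabilities. The corpus and add-k smoothing below are illustrative assumptions:

```python
# Sketch: toy bigram language model with add-k smoothing (pure Python).
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
vocab = set(corpus)

def prob(word, prev, k=1.0):
    """P(word | prev) estimated from counts, with add-k smoothing."""
    return ((bigram_counts[(prev, word)] + k)
            / (unigram_counts[prev] + k * len(vocab)))

# "the cat" occurs 2 times, "the" 3 times, vocab size 6:
# (2 + 1) / (3 + 6) = 0.333...
print(round(prob("cat", "the"), 3))
```

MITLM does the same kind of estimation with far better smoothing (e.g. modified Kneser-Ney) and efficient storage for large corpora.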

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow