Question

I'm trying to build a neural network language model, and the word2vec tool by Mikolov et al. seems like a good fit for this purpose. I tried it, but it only produces word representations. Does anybody know how I can produce a language model with that tool or with any other reasonable deep learning framework?

Solution

Doc2Vec, as implemented in Gensim, does the job. The trick is that the document ID is treated as an extra context word that is present in every context window of every word in the document.
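To make the document-ID trick concrete, here is a minimal pure-Python sketch of how the training examples could be formed. This is not Gensim's code: the function name `pv_dm_training_pairs`, the tag `DOC_0`, and the toy sentence are made up for illustration, and a real implementation would go on to learn vectors from these pairs.

```python
from typing import List, Tuple


def pv_dm_training_pairs(
    doc_tag: str, words: List[str], window: int = 2
) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs in which the document tag is
    part of every context window (hypothetical helper, for illustration)."""
    pairs = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]       # up to `window` words before the target
        right = words[i + 1:i + 1 + window]      # up to `window` words after the target
        # The document tag is added to every window, whatever the position.
        pairs.append(([doc_tag] + left + right, target))
    return pairs


pairs = pv_dm_training_pairs("DOC_0", ["the", "cat", "sat", "down"], window=1)
for context, target in pairs:
    print(context, "->", target)
```

Because the tag appears in every window, whatever vector is learned for it has to help predict all the words of the document, which is why it ends up acting as a representation of the document as a whole.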

Code is here in Python/Gensim

Other tips

Microsoft Research has released a toolkit for language modelling with word2vec-style vectors. You can find it here.

word2vec is a tool for representing a single word (or a group of words) as a numerical vector, so it is not directly related to a language model.

To build a language model, you can use MITLM. For example, you can create an N-gram model from the corpus Lectures.txt with this command:

estimate-ngram -text Lectures.txt -write-lm Lectures.lm

A great tutorial can be found here.
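For a feel of what an N-gram language model estimates, here is a minimal bigram sketch in plain Python. It is not MITLM and does not reproduce its output: the class name `BigramLM`, the toy corpus, and the use of add-one smoothing are assumptions chosen to keep the example short, and real toolkits use more refined smoothing.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]):
        self.history_counts = Counter()  # how often each token occurs as a history
        self.bigram_counts = Counter()   # how often each (previous, next) pair occurs
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.history_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        # Tokens the model can predict: every seen word plus the end marker.
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) with add-one smoothing over the known vocabulary."""
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.history_counts[prev] + len(self.vocab)
        return numerator / denominator

    def log_prob(self, sentence: List[str]) -> float:
        """Natural-log probability of a whole sentence, end marker included."""
        tokens = ["<s>"] + sentence + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = BigramLM(corpus)
print(lm.log_prob(["the", "cat", "sat"]))   # word order seen in training
print(lm.log_prob(["sat", "cat", "the"]))   # scrambled order, scores lower
```

This is the key difference from word2vec: the model assigns a probability to a whole word sequence, not a vector to each word.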

License: CC-BY-SA with attribution