How to get word vector representation when using Deep Learning in NLP ? The words are represented by a fixed length vector, see http://machinelearning.wustl.edu/mlpapers/paper_files/BengioDVJ03.pdf for more details.

有帮助吗?

解决方案

Deep Learning and NLP are quite complex subjects, so if you really want to understand them you'll need to follow a couple of courses in the field and read many papers. There are lots of different techniques for converting words into vector representations and it's a very active area of research. Socher's DL for NLP tutorial is a good next step if you are already well acquainted with NLP and Machine Learning (including deep learning).

With that said (and considering it's a programming forum), if you are just interested for now in using someone's else tools to quickly obtain vector representations which can be useful in some tasks, one library which you must look at is word2vec. Take a look in its website: https://code.google.com/p/word2vec/. It's a very powerful tool and for some basic stuff it could be used without much knowledge.

其他提示

For getting word vector for a word you can use Google News 300 dimensional word vector model.

Download the model from here - https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing OR from here
https://s3.amazonaws.com/mordecai-geo/GoogleNews-vectors-negative300.bin.gz .

After downloading load the model using gensim python library as below -

import gensim

# Load Google's pre-trained Word2Vec model.
model = gensim.models.Word2Vec.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True)  

Then just query the model for word vector corresponding to a word like

model['usa']

And it returns you a 300 dimensional word vector for usa.

Note that you may not found word vectors for all the words in this model.
Also instead of this Google News model, other models can also be used.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top