Question

How to get word vector representation when using Deep Learning in NLP ? The words are represented by a fixed length vector, see http://machinelearning.wustl.edu/mlpapers/paper_files/BengioDVJ03.pdf for more details.

Was it helpful?

Solution

Deep Learning and NLP are quite complex subjects, so if you really want to understand them you'll need to follow a couple of courses in the field and read many papers. There are lots of different techniques for converting words into vector representations and it's a very active area of research. Socher's DL for NLP tutorial is a good next step if you are already well acquainted with NLP and Machine Learning (including deep learning).

With that said (and considering it's a programming forum), if you are just interested for now in using someone's else tools to quickly obtain vector representations which can be useful in some tasks, one library which you must look at is word2vec. Take a look in its website: https://code.google.com/p/word2vec/. It's a very powerful tool and for some basic stuff it could be used without much knowledge.

OTHER TIPS

For getting word vector for a word you can use Google News 300 dimensional word vector model.

Download the model from here - https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing OR from here
https://s3.amazonaws.com/mordecai-geo/GoogleNews-vectors-negative300.bin.gz .

After downloading load the model using gensim python library as below -

import gensim

# Load Google's pre-trained Word2Vec model.
model = gensim.models.Word2Vec.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True)  

Then just query the model for word vector corresponding to a word like

model['usa']

And it returns you a 300 dimensional word vector for usa.

Note that you may not found word vectors for all the words in this model.
Also instead of this Google News model, other models can also be used.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top