Question

I have a corpus on which I want to perform sentiment analysis using an LSTM and word embeddings. I have converted the words in the documents to word vectors using word2vec. My question is: how do I feed these word vectors into Keras as input? I don't want to use the Embedding layer provided by Keras. Thanks in advance.

Solution

You can just skip the Embedding layer and use a normal input layer with n input features per timestep, where n is the dimensionality of your word2vec embeddings. The rest works the same as with an Embedding layer: pass a sequence of n-dimensional vectors as the input, padded or truncated to a fixed length depending on your model.
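
For concreteness, here is a minimal sketch of that setup, assuming binary sentiment labels and placeholder values for the sequence length, vector dimensionality, and layer sizes (none of these come from the question):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

MAX_SEQUENCE_LENGTH = 100  # pad/truncate every document to this many tokens
EMBEDDING_DIM = 300        # dimensionality of your word2vec vectors

# X is assumed to already hold one word2vec vector per token, shaped
# (num_documents, MAX_SEQUENCE_LENGTH, EMBEDDING_DIM); shorter documents
# are padded with zero vectors and longer ones truncated. y holds 0/1 labels.
model = Sequential()
model.add(LSTM(128, input_shape=(MAX_SEQUENCE_LENGTH, EMBEDDING_DIM)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X, y, epochs=5, batch_size=32)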

OTHER TIPS

Keras does not provide pre-trained word embeddings out of the box, but do you need to avoid the Embedding layer entirely? If not, you can still use that layer to serve up a matrix of pre-trained word vectors: pass your pre-trained embeddings as the layer's initial weights via its weights argument. If you also set trainable=False, the word embeddings will not shift while the model trains.

Here's a snippet of code from an example in the Keras GitHub repo that uses pre-trained word vectors:

from keras.layers import Embedding

embedding_layer = Embedding(nb_words + 1,                # vocabulary size (+1 for index 0, reserved for padding)
                            EMBEDDING_DIM,               # dimensionality of the word vectors
                            weights=[embedding_matrix],  # pre-trained vectors, one row per word index
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)             # freeze the embeddings during training
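
For completeness, here is one way to build such an embedding_matrix from a trained word2vec model. This is a sketch using the gensim API (the vector_size argument is gensim 4 syntax; older versions call it size), with a toy corpus standing in for your real documents:

import numpy as np
from gensim.models import Word2Vec
from keras.preprocessing.text import Tokenizer

texts = ["the movie was great", "the movie was terrible"]  # toy corpus

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index      # maps each word to an integer index >= 1

w2v = Word2Vec([t.split() for t in texts], vector_size=50, min_count=1)
EMBEDDING_DIM = w2v.vector_size
nb_words = len(word_index)

# Row i holds the word2vec vector for the token with index i; index 0 and
# any word missing from the word2vec vocabulary stay all-zero.
embedding_matrix = np.zeros((nb_words + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    if word in w2v.wv:
        embedding_matrix[i] = w2v.wv[word]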

If you do need to avoid the Embedding layer entirely, then @Jan's answer probably gives you what you need.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange