Best way to fix the size of a sentence [Sentiment Analysis]
22-10-2019
Question
I am working on a project about Natural Language Processing, but I am stuck: I have an ANN with a fixed number of input neurons.
I am trying to do sentiment analysis using the IMDb movie review dataset. To do that, I first computed word embeddings by building a word-context matrix and applying SVD, so I now have a word embedding matrix. However, I do not know the best way to compress a sentence's vector (which contains the embeddings of each word in the sentence) into a fixed size so it can be fed to the neural net. I tried PCA, but the result was not satisfying.
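For context, the word-context-matrix-plus-SVD step described above can be sketched roughly as follows. The corpus, window size of 1, and dimensionality `k = 2` here are toy assumptions for illustration, not the actual IMDb setup:

```python
import numpy as np

# Hypothetical toy corpus; in practice these would be IMDb reviews.
corpus = [
    "great movie great acting",
    "terrible movie bad acting",
    "great film good acting",
]

# Build the vocabulary and an index for each word.
vocab = sorted({w for doc in corpus for w in doc.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Word-context co-occurrence matrix with a context window of 1.
M = np.zeros((len(vocab), len(vocab)))
for doc in corpus:
    words = doc.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 1), min(len(words), i + 2)):
            if i != j:
                M[idx[w], idx[words[j]]] += 1

# SVD; keep the top-k dimensions as the word embedding matrix.
k = 2
U, S, Vt = np.linalg.svd(M)
embeddings = U[:, :k] * S[:k]   # one k-dimensional vector per vocabulary word
print(embeddings.shape)
```

Each row of `embeddings` is then the fixed-size vector for one vocabulary word; the open problem in the question is how to combine those per-word rows into one fixed-size vector per sentence.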
Any help?
Solution
The easiest way is to average the word embeddings. This works quite well.
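A minimal sketch of the averaging approach, assuming a small hypothetical embedding lookup (in practice these would be the rows of your SVD-derived embedding matrix):

```python
import numpy as np

# Hypothetical 2-dimensional embeddings; real ones come from your SVD matrix.
embedding = {
    "the": np.array([0.1, 0.2]),
    "movie": np.array([0.5, -0.1]),
    "was": np.array([0.0, 0.3]),
    "great": np.array([0.9, 0.4]),
}

def sentence_vector(sentence, embedding, dim=2):
    """Average the embeddings of the known words in the sentence."""
    vectors = [embedding[w] for w in sentence.lower().split() if w in embedding]
    if not vectors:
        return np.zeros(dim)  # fall back for fully out-of-vocabulary input
    return np.mean(vectors, axis=0)

vec = sentence_vector("The movie was great", embedding)
print(vec)  # same dimensionality no matter how long the sentence is
```

Because the mean is taken over however many words the sentence has, the result always has the embedding dimensionality, which is exactly the fixed input size the network needs.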
Another thing you can try is to represent each document as a bag of words, i.e. a vector the size of your vocabulary, where each element counts how many times a certain word appears in the document (for example, the first element will represent how many times the word a was mentioned, and so on).
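The bag-of-words idea can be sketched like this; the two-document corpus is a made-up example:

```python
import numpy as np

docs = [
    "a great movie a true classic",
    "a dull movie",
]

# Vocabulary over the whole corpus, with a fixed index per word.
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

def bag_of_words(doc):
    """Element i of the result counts occurrences of vocab[i] in doc."""
    vec = np.zeros(len(vocab), dtype=int)
    for w in doc.split():
        vec[idx[w]] += 1
    return vec

print(bag_of_words(docs[0]))
```

Every document maps to a vector of length `len(vocab)`, so the input size is fixed by the vocabulary rather than by sentence length.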
Afterwards, to reduce the size of that vector you can use techniques like LDA, SVD, or autoencoders.
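As one example of that reduction step, here is a truncated-SVD sketch on a hypothetical document-term count matrix (the 100x500 shape and the random counts are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bag-of-words matrix: 100 documents x 500 vocabulary terms.
X = rng.poisson(0.1, size=(100, 500)).astype(float)

# Truncated SVD: project each document onto the top-k singular directions.
k = 50
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_reduced = U[:, :k] * S[:k]  # 100 x 50 reduced document representations
print(X_reduced.shape)
```

Each document goes from a 500-dimensional sparse count vector to a 50-dimensional dense one, which is then small enough to feed to the fixed-size input layer.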