Would averaging two vectors in word embeddings make sense?
09-12-2020
Question
I'm currently using the GloVe embedding matrix which is pre-trained on a large corpus. For my purpose it works fine, however, there are a few words which it does not know (for example, the word 'eSignature'). This spoils my results a bit. I do not have the time or data to retrain on a different (more domain-specific) corpus, so I wondered if I could add vectors based on existing vectors. By E(word) I denote the embedding of a word. Would the following work?
E(eSignature) = 1/2 * ( E(electronic) + E(signature) )
If not, what are other ideas that I could use to add just a few words in a word embedding?
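The proposed formula is easy to try out. Below is a minimal sketch with made-up toy vectors (real GloVe vectors are typically 50 to 300 dimensional and would be loaded from the pre-trained file):

```python
import numpy as np

# Toy embedding table; all numbers here are made up for illustration.
E = {
    "electronic": np.array([0.2, 0.8, -0.1]),
    "signature":  np.array([0.5, -0.3, 0.4]),
}

# Proposed out-of-vocabulary vector: the mean of the component words.
E["eSignature"] = 0.5 * (E["electronic"] + E["signature"])
print(E["eSignature"])  # [0.35 0.25 0.15]
```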
Solution
Averaging embedding vectors can make sense if your aim is to represent a sentence or document with a single vector. For out-of-vocabulary words it makes more sense to use a random initialisation and allow the embedding parameters to be trained along with the rest of the model. This way the model learns the representation of the out-of-vocabulary words by itself.
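A minimal sketch of that idea, with a made-up pre-trained matrix (in a framework like Keras or PyTorch you would load the GloVe rows into an embedding layer and leave it trainable, or freeze the known rows and train only the new ones):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend pre-trained matrix: 3 known words, dimension 4 (made-up numbers).
vocab = ["electronic", "signature", "document"]
emb = rng.normal(scale=0.1, size=(len(vocab), 4))

# Append an out-of-vocabulary word with a small random vector.
# During model training, gradients would update this row so the
# model learns a representation for it; the pre-trained rows can
# optionally stay frozen.
vocab.append("eSignature")
emb = np.vstack([emb, rng.normal(scale=0.1, size=(1, 4))])

print(emb.shape)  # (4, 4)
```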
Alternatively, you could use external resources like WordNet [1] to extract a set of synonyms and other words closely related to a specific term, and then leverage the vectors of those close words (averaging them might make sense, but it's always a matter of testing and seeing what happens; as far as I know there are no established rules yet).
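A rough sketch of the second approach, again with made-up toy vectors. The list of related words is hard-coded here for illustration; in practice you could obtain it from WordNet, e.g. via NLTK's `wordnet.synsets(...)` and their lemmas:

```python
import numpy as np

# Toy vectors (made up); in practice these come from the GloVe matrix.
E = {
    "autograph": np.array([0.4, 0.1]),
    "sign":      np.array([0.2, 0.5]),
    "signature": np.array([0.6, 0.3]),
}

# Hypothetical set of in-vocabulary words related to "eSignature",
# e.g. extracted from WordNet synsets of its component terms.
related = ["autograph", "sign", "signature"]

# Average the vectors of the related words to get the new vector.
oov_vec = np.mean([E[w] for w in related], axis=0)
print(oov_vec)  # [0.4 0.3]
```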