Question

I have a set of categories, and I want to compare a document vector with the word vectors of the categories to find the best-matching category.

Is it possible to compare a word vector with a document vector? If so, is there any literature that demonstrates this?

Solution

In the paragraph vector model, the paragraph vector is included in every context window drawn from its paragraph during training, so it is pushed to capture the semantic meaning of all the words in those contexts. After training, the paragraph vector therefore summarizes the semantics of all the words it was trained with.

In word2vec, by contrast, each word vector encodes the meaning of that single word. Summing or averaging the word vectors of a text therefore yields a vector that preserves much of the combined semantics. This is sensible because adding word vectors composes their meanings: the sum of the vectors for "transport" and "water" lies close to the vectors for "ship" or "boat".
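To make the "transport + water" intuition concrete, here is a minimal sketch in plain Python. The two-dimensional vectors in `TOY` are invented for illustration (they are not taken from any trained model), and the helper names `add`, `cosine`, and `nearest` are hypothetical.

```python
import math

# Toy 2-dimensional vectors, invented for illustration only:
# axis 0 ~ "transport-ness", axis 1 ~ "water-ness".
TOY = {
    "transport": [1.0, 0.0],
    "water": [0.0, 1.0],
    "boat": [0.9, 0.9],
    "car": [0.9, 0.1],
    "lake": [0.1, 0.9],
}

def add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors, exclude):
    """Return the word whose vector is most similar to `query`,
    skipping the words listed in `exclude`."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

summed = add(TOY["transport"], TOY["water"])
closest = nearest(summed, TOY, exclude={"transport", "water"})
print(closest)
```

With real embeddings the nearest neighbour will depend on the training corpus, so treat this only as an illustration of the mechanism.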

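Before the paragraph vector paper was published, people used averaged word vectors as sentence vectors. To be honest, in my own work these averaged vectors perform better than document vectors. So yes, the comparison is possible: average the word vectors of the document, and the result lies in the same vector space as the category word vectors, so the two can be compared directly.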
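The averaging approach can be sketched as follows. This is a minimal illustration, not a definitive implementation: the three-dimensional vectors in `WORD_VECTORS` are made up, the function names are hypothetical, and cosine similarity is assumed as the comparison measure (the answer above does not prescribe one). In practice the vectors would come from a trained word2vec model.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
WORD_VECTORS = {
    "ship": [0.9, 0.8, 0.0],
    "harbor": [0.7, 0.9, 0.1],
    "sails": [0.8, 0.7, 0.0],
    "goal": [0.0, 0.1, 0.9],
    "match": [0.1, 0.0, 0.8],
    "shipping": [0.9, 0.9, 0.0],
    "football": [0.0, 0.1, 1.0],
}

def average_vector(tokens, vectors):
    """Average the vectors of all tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a known vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_category(tokens, categories, vectors):
    """Pick the category whose word vector is closest to the
    averaged document vector."""
    doc_vec = average_vector(tokens, vectors)
    return max(categories, key=lambda c: cosine(doc_vec, vectors[c]))

# Words without an embedding ("the", "from") are simply skipped.
document = ["the", "ship", "sails", "from", "the", "harbor"]
categories = ["shipping", "football"]
best = best_category(document, categories, WORD_VECTORS)
print(best)
```

A multi-word category name could be handled the same way, by averaging the vectors of its words before comparing.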

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange