Question

I have training data in the form of document pairs, each with an associated label: {doc1, doc2, label}. The label is defined as a function of the pair of documents.

Now I want to build a model that can predict the label for a new pair of documents.

I want to try a different document representation instead of the common ones (say, TF-IDF). Can I use the topic-distribution vectors from LDA as features for a classifier?

Solution

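Yes, that is a reasonable approach. The LDA topic distribution of a document is a fixed-length vector, so it can be passed to any standard classifier. Also try neural-network-based representations such as doc2vec. I suppose you know how to do the classification part?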
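For illustration, here is a minimal sketch of the idea using scikit-learn. The toy documents, the labels, the choice of two topics, the `pair_features` helper, and the way the two topic vectors are combined (concatenation plus absolute difference) are all assumptions made for this example, not something specified in the question.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data invented for illustration: (doc1, doc2, label).
# Here label 1 means "same subject"; the real label function is unknown.
pairs = [
    ("the cat sat on the mat", "a cat chased the mouse", 1),
    ("stocks fell as markets closed", "investors sold stocks and bonds", 1),
    ("the cat sat on the mat", "stocks fell as markets closed", 0),
    ("a cat chased the mouse", "investors sold stocks and bonds", 0),
]
docs1 = [p[0] for p in pairs]
docs2 = [p[1] for p in pairs]
y = np.array([p[2] for p in pairs])

# LDA works on raw term counts, so use CountVectorizer rather than TF-IDF.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs1 + docs2)

# n_components=2 is an arbitrary choice for this tiny corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)


def pair_features(a_docs, b_docs):
    """Hypothetical helper: build one feature row per document pair."""
    ta = lda.transform(vectorizer.transform(a_docs))  # topic distribution of doc1
    tb = lda.transform(vectorizer.transform(b_docs))  # topic distribution of doc2
    # Concatenate both distributions and their absolute difference.
    return np.hstack([ta, tb, np.abs(ta - tb)])


X = pair_features(docs1, docs2)
clf = LogisticRegression().fit(X, y)

# Predict the label for a new, unseen pair.
new_X = pair_features(["the mouse ran from the cat"], ["markets and bonds rallied"])
pred = clf.predict(new_X)
```

The concatenated features depend on the order of the two documents. If the label is symmetric in doc1 and doc2, a symmetric combination (for example only the absolute difference or the element-wise product) may be a better fit. On real data the number of topics should be tuned, and the model should be evaluated on held-out pairs.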

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange