Labelled LDA usage

https://stackoverflow.com/questions/16740154

30-05-2022

Question

I am working on a project that requires applying the topic model LDA. Because each document in my case is short, I have to use Labelled LDA. I do not have much knowledge in this area, and all I need to do is apply LLDA to my data.

After searching the web, I found an LLDA implementation in Stanford TMT. What I understand from the section "Training a Labeled LDA model" is that I should label each input document before training. Am I misunderstanding something?

If my understanding is correct, this will involve too much work labeling documents. Instead, can I provide a separate list of topics and train on the documents without labels?

Solution

Your understanding is correct: you need to label each input document before training.

Labelled LDA is a supervised method, meaning that you need a labelled dataset.

If you "have to use Labelled LDA", you cannot get away from the need to obtain a labelled dataset. The LabeledLDA model in TMT needs a LabeledLDADocumentParams object, and to create it you need an array of labels. So no, it is not possible to train a Labeled LDA model without labels.
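To make the requirement concrete, here is a minimal sketch in plain Python of the kind of input a Labeled LDA trainer expects: every document paired with at least one label. It does not use the TMT API; the function name `build_labeled_corpus` and the sample documents and labels are hypothetical and only illustrate the shape of the data.

```python
from typing import Dict, List, Set, Tuple

LabeledDoc = Tuple[List[str], Set[str]]


def build_labeled_corpus(
    documents: List[str],
    labels: List[Set[str]],
) -> Tuple[List[LabeledDoc], Dict[str, int]]:
    """Pair each tokenized document with its label set and index the labels.

    Hypothetical helper: it only checks the precondition that Labeled LDA
    imposes (one non-empty label set per document). It does not train a model.
    """
    if len(documents) != len(labels):
        raise ValueError("every document needs its own label set")

    corpus: List[LabeledDoc] = []
    for i, (text, doc_labels) in enumerate(zip(documents, labels)):
        if not doc_labels:
            # An unlabelled document gives the model no topics to draw from.
            raise ValueError(f"document {i} has no labels")
        # Naive whitespace tokenization, just for illustration.
        corpus.append((text.lower().split(), set(doc_labels)))

    # Each distinct label gets an integer id; labels play the role of topics.
    all_labels = sorted(set().union(*labels))
    label_index = {label: k for k, label in enumerate(all_labels)}
    return corpus, label_index


# Made-up sample data.
docs = ["cheap flights to Paris", "new phone battery review"]
tags = [{"travel"}, {"tech", "reviews"}]
corpus, label_index = build_labeled_corpus(docs, tags)
```

Passing a document with an empty label set raises a `ValueError` here, which mirrors the point above: a separate list of topics is not enough, because the model needs to know which labels apply to which document.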

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow