Question

I'm a beginner at topic modeling and was thinking of using MALLET as a tool to help me understand the area. My problem is this: I'd like to train a model on, say, 1000 documents, and then apply that model to a single new document to generate its likely topics.

But as far as I can tell from the MALLET tutorial, the tool or API is always described as working on a corpus of texts, that is, finding topics across several documents.

Is there a way to find the topics of a single document based on the model (or the inference parameters it learned from the 1000 documents)?

Is there any other tool that can do this?

Thanks a lot!

Solution

You can refer to the example code src/cc/mallet/examples/TopicModel.java, which shows how to train a topic model and then infer topics for a new instance.
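To make the idea of "train once, then infer on a new document" concrete, here is a minimal self-contained sketch. It does not use MALLET: the class `FoldInSketch`, the method `inferTopics`, and the toy numbers are all hypothetical. It also swaps in a simpler technique: as far as I know, MALLET's `TopicInferencer` uses Gibbs sampling, whereas this sketch does a deterministic maximum-likelihood "fold-in" with EM, keeping the trained topic-word probabilities fixed and estimating only the new document's topic proportions.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of MALLET.
final class FoldInSketch {

    /**
     * Estimates topic proportions for one new document while the trained
     * topic-word probabilities stay fixed (EM "fold-in", no Dirichlet prior).
     *
     * @param topicWord  topicWord[k][w] = P(word w | topic k), learned earlier
     * @param doc        word ids of the new document
     * @param iterations number of EM iterations to run
     */
    static double[] inferTopics(double[][] topicWord, int[] doc, int iterations) {
        int numTopics = topicWord.length;
        double[] theta = new double[numTopics];
        Arrays.fill(theta, 1.0 / numTopics); // start from uniform proportions

        for (int it = 0; it < iterations; it++) {
            double[] expected = new double[numTopics];
            for (int w : doc) {
                // E-step: how responsible is each topic for this token?
                double[] resp = new double[numTopics];
                double norm = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    resp[k] = theta[k] * topicWord[k][w];
                    norm += resp[k];
                }
                if (norm == 0.0) continue; // word impossible under every topic; skip it
                for (int k = 0; k < numTopics; k++) {
                    expected[k] += resp[k] / norm;
                }
            }
            // M-step: renormalise expected token counts into proportions
            double total = 0.0;
            for (double e : expected) total += e;
            if (total == 0.0) break; // nothing to update from
            for (int k = 0; k < numTopics; k++) {
                theta[k] = expected[k] / total;
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        // Toy "trained model": 2 topics over a 4-word vocabulary (made-up values)
        double[][] topicWord = {
            {0.45, 0.45, 0.05, 0.05},
            {0.05, 0.05, 0.45, 0.45}
        };
        int[] newDoc = {0, 1, 0, 1, 2}; // mostly words favoured by topic 0
        System.out.println(Arrays.toString(inferTopics(topicWord, newDoc, 50)));
    }
}
```

In MALLET itself, the referenced example does the equivalent with a trained `ParallelTopicModel`: it obtains a `TopicInferencer` via `getInferencer()` and calls `getSampledDistribution(...)` on the new instance, which must be built with the same pipes as the training data so the word ids match.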

Other tips

Actually, when you run plain LDA on a directory, the model assigns topic proportions to each document in that directory based on an "already trained" model built from part of your corpus. Each document therefore gets a set of topics, each with a probability, already ranked by how likely that topic is to appear in that specific document.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow