Question

I want to run mallet using the --use-ngrams true option but can't seem to get it working. I've imported my data using:

./bin/mallet import-dir --input path --output topic-input.mallet --keep-sequence --remove-stopwords

Now I want to train a topical ngram model:

bin/mallet train-topics --input topic-input.mallet --use-ngrams true --num-topics 30 --xml-topic-report topic-report.xml

But I'm getting this error:

Exception in thread "main" java.lang.ClassCastException: cc.mallet.types.FeatureSequence cannot be cast to cc.mallet.types.FeatureSequenceWithBigrams
at cc.mallet.topics.TopicalNGrams.estimate(TopicalNGrams.java:78)
at cc.mallet.topics.tui.Vectors2Topics.main(Vectors2Topics.java:249)

As you can see, I'm running MALLET as a command-line tool and would rather not pry into its API to get it working. Any suggestions?


Solution

Found the answer:

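You must import the directory you'd like to run topical n-gram modeling over using the '--keep-sequence-bigrams' argument. Without it, the import stores FeatureSequence instances, while the topical n-gram trainer expects FeatureSequenceWithBigrams, which is what the ClassCastException is reporting. For example: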

./bin/mallet import-dir --input path --output topic-input.mallet --keep-sequence-bigrams --remove-stopwords

And then, you run the topic model as:

bin/mallet train-topics --input topic-input.mallet --use-ngrams true --num-topics 30 --xml-topic-report topic-report.xml
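
If you run these two steps often, it may help to wrap them in a small script so the import flag and the training flag stay in sync. The following is only a sketch: the variable names (MALLET_BIN, INPUT_DIR, MALLET_FILE, DRY_RUN) and the helper functions are my own, not part of MALLET, and the paths are the placeholders from the commands above. DRY_RUN defaults to 1, so the script only prints the commands until you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the two-step topical n-gram workflow.
# All variable and function names here are illustrative assumptions.
MALLET_BIN="${MALLET_BIN:-./bin/mallet}"
INPUT_DIR="${INPUT_DIR:-path}"
MALLET_FILE="${MALLET_FILE:-topic-input.mallet}"

run() {
    # Print the command; execute it only when DRY_RUN is not 1.
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

import_corpus() {
    # --keep-sequence-bigrams is the flag the n-gram trainer depends on.
    run "$MALLET_BIN" import-dir --input "$INPUT_DIR" \
        --output "$MALLET_FILE" --keep-sequence-bigrams --remove-stopwords
}

train_ngrams() {
    run "$MALLET_BIN" train-topics --input "$MALLET_FILE" \
        --use-ngrams true --num-topics 30 --xml-topic-report topic-report.xml
}

import_corpus && train_ngrams
```

Run it once as-is to check the printed commands, then again with DRY_RUN=0 to actually import and train.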
Licensed under: CC-BY-SA with attribution