Question

I'd like to run one of the built-in classifiers on a file, then run my own classifier, merging the results.

How do I do so with Stanford NER, in particular, via the command line?

I am aware of "How do I include more than one classifiers when using Stanford named entity recogniser?", but this is slightly different, as that question asks about multiple classifiers with NERServer.

Looks like I need to use CoreNLP to run multiple NER models in sequence...can I do it without CoreNLP?

Say I had a file with the contents "the quick brown fox jumped over the lazy dog in America". I run one of the built-in classifiers, and it finds "America" as a location; then I run my own, and it finds "fox" and "dog". The result should be:

the quick brown <animal>fox</animal> jumped over the lazy <animal>dog</animal> in <location>America</location>

Solution

So, a place to get started if you're dead set on doing this in a single command from the command line:

cat corpus.txt | tee >(stanfordNER -options here > out1.xml) | myNERTagger -options here > out2.xml && diff out1.xml out2.xml | awk to do whatever merging you want here...

But you'll likely find that this is not a workable solution. You're going to want to go sentence by sentence in a little script, calling pyner or similar to hook into the Stanford tagger and then whatever custom tagger you've built, merging the two sets of tags as you go along. The output format of your taggers will change how this looks pretty dramatically.
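To make the merging step concrete, here is a minimal sketch in Python. It assumes (and this is only an assumption about your setup) that you have already turned each tagger's output into a list of (token, label) pairs over the same tokenization, with "O" meaning "no entity". The names merge_tags and to_inline_xml are made up for illustration; they are not part of Stanford NER or pyner, and the two input lists are hard-coded where the real tagger calls would go.

```python
from itertools import groupby
from typing import List, Tuple
from xml.sax.saxutils import escape

TaggedToken = Tuple[str, str]  # (token, label); "O" is assumed to mean "no entity"


def merge_tags(primary: List[TaggedToken],
               secondary: List[TaggedToken],
               outside: str = "O") -> List[TaggedToken]:
    """Merge two taggings of the same tokens; the primary tagger wins on conflicts."""
    if [tok for tok, _ in primary] != [tok for tok, _ in secondary]:
        raise ValueError("both taggers must produce the same token sequence")
    return [
        (tok, a if a != outside else b)
        for (tok, a), (_, b) in zip(primary, secondary)
    ]


def to_inline_xml(tagged: List[TaggedToken], outside: str = "O") -> str:
    """Render tagged tokens as text with inline tags, grouping adjacent equal labels."""
    parts = []
    for label, group in groupby(tagged, key=lambda pair: pair[1]):
        text = escape(" ".join(tok for tok, _ in group))
        if label == outside:
            parts.append(text)
        else:
            tag = label.lower()
            parts.append(f"<{tag}>{text}</{tag}>")
    return " ".join(parts)


# Hard-coded stand-ins for the two taggers' outputs (hypothetical data).
tokens = "the quick brown fox jumped over the lazy dog in America".split()
builtin = [(t, "LOCATION" if t == "America" else "O") for t in tokens]
custom = [(t, "ANIMAL" if t in {"fox", "dog"} else "O") for t in tokens]

print(to_inline_xml(merge_tags(builtin, custom)))
```

Letting the first tagger win on conflicts is just one possible policy; if your own model should override the built-in one, swap the argument order. The sketch also breaks down if the two taggers tokenize differently, so in practice you would tokenize once and feed the same tokens to both.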

Licensed under: CC-BY-SA with attribution