Question

... or is gender information enough? More specifically, I'd like to know whether I can reduce the number of models Stanford CoreNLP loads when extracting coreferences. I am not interested in named entity recognition itself.

Thank you


Solution

According to the EMNLP paper describing the coref system packaged with Stanford CoreNLP, named entity tags are only used in the following coreference annotation passes: precise constructs, relaxed head matching, and pronouns (Raghunathan et al. 2010).

You can specify which passes to use with the dcoref.sievePasses configuration property. If you want coreference but don't want to do NER, you should be able to run the pipeline without the NER annotator and restrict the coref system to the annotation passes that don't require NER labels.
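
As a minimal sketch, a pipeline configuration along these lines might work. The sieve class names below are assumed to be the dcoref defaults minus the three NER-dependent passes, and the enforceRequirements property is an assumed escape hatch for versions where dcoref declares NER as a hard dependency; verify both against your CoreNLP version:

    import java.util.Properties;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class CorefWithoutNer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Build the pipeline without the "ner" annotator.
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse, dcoref");
            // Assumed sieve list: the default dcoref passes minus the three
            // that consult NER labels (PreciseConstructs, RelaxedHeadMatch,
            // PronounMatch). Check the class names against your version.
            props.setProperty("dcoref.sievePasses",
                "MarkRole, DiscourseMatch, ExactStringMatch, RelaxedExactStringMatch, "
                + "StrictHeadMatch1, StrictHeadMatch2, StrictHeadMatch3, StrictHeadMatch4");
            // Some versions declare NER as a hard requirement of dcoref;
            // if so, disabling requirement checking lets the pipeline build.
            props.setProperty("enforceRequirements", "false");

            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            Annotation doc = new Annotation("John met the CEO. He greeted him.");
            pipeline.annotate(doc);
        }
    }

Note that dropping the pronoun pass means pronominal coreference will largely disappear, which is the recall hit discussed next.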

However, the resulting coref annotations will take a hit on recall, since the pronoun pass in particular depends on NER labels. So, you might want to run some experiments to determine whether the degraded annotation quality is a problem for whatever you are using them for downstream.

OTHER TIPS

In general, yes. First, you need named entities because they serve as the candidate antecedents, i.e. the targets to which the pronouns refer. Many (most?) systems perform entity recognition and type classification in one step. Second, the semantic category (e.g. person, organization, location) of an entity is important for constructing accurate coreference chains: in "Stanford released a parser. It is fast.", knowing that Stanford is an organization lets the resolver accept "it" as a coreferent mention while ruling out "he".
