OpenNLP supports n-best POS tagging:
Some applications need to retrieve the n-best POS tag sequences and not only the best sequence. The topKSequences method returns the top sequences. It can be called in the same way as tag.
Sequence topSequences[] = tagger.topKSequences(sent);
Each Sequence object contains one sequence. The tags can be retrieved via Sequence.getOutcomes(), and Sequence.getProbs() returns the probability array for this sequence.
spaCy can be made to do something similar, by subclassing its tagger so that it keeps the per-token tag probabilities:
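# Uses the spaCy 2.x pipeline API (Language.factories, model.tok2vec, model.softmax).
import numpy
from spacy.language import Language
from spacy.pipeline import Tagger
from spacy.tokens import Doc, Token

# Per-document score matrix, plus a per-token view into it.
Doc.set_extension('tag_scores', default=None)
Token.set_extension('tag_scores', getter=lambda token: token.doc._.tag_scores[token.i])

class ProbabilityTagger(Tagger):
    def predict(self, docs):
        tokvecs = self.model.tok2vec(docs)
        scores = self.model.softmax(tokvecs)
        guesses = []
        for i, doc_scores in enumerate(scores):
            # Keep the full probability matrix instead of discarding it.
            docs[i]._.tag_scores = doc_scores
            doc_guesses = doc_scores.argmax(axis=1)
            # Non-numpy results (e.g. a cupy array on GPU) are copied back to the host.
            if not isinstance(doc_guesses, numpy.ndarray):
                doc_guesses = doc_guesses.get()
            guesses.append(doc_guesses)
        return guesses, tokvecs

# Replace the stock tagger factory so that pipelines are built with the subclass.
Language.factories['tagger'] = lambda nlp, **cfg: ProbabilityTagger(nlp.vocab, **cfg)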
Then each token will have token._.tag_scores, holding the probabilities for each part of speech from spaCy's tag map. Note that these are per-token scores, not ranked whole sequences as in OpenNLP.
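To get from per-token probabilities to something closer to an n-best list, the scores still have to be ranked. Here is a minimal sketch in plain Python, not part of spaCy or OpenNLP. The helper names top_n_tags and n_best_sequences are made up for illustration, and the label list is assumed to be in the same order as the columns of the score array. The sequence ranking also assumes that each token is scored independently, so a sequence's probability is just the product of its per-token probabilities.

```python
import heapq
import math


def top_n_tags(scores, labels, n=3):
    """Return the n highest-scoring (label, probability) pairs for one token.

    `scores` is one row of probabilities; `labels` is assumed to list the
    tags in the same order as the columns of that row.
    """
    pairs = zip(labels, (float(s) for s in scores))
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])


def n_best_sequences(score_rows, labels, n=3):
    """Return the n most probable tag sequences as (tags, probability) pairs.

    Assumption: tokens are scored independently, so the probability of a
    sequence is the product of its per-token probabilities.
    """
    beam = [((), 0.0)]  # (tags chosen so far, log-probability)
    for row in score_rows:
        # Extend every kept prefix with the best n tags for this token.
        candidates = [
            (tags + (label,), logp + math.log(p))
            for tags, logp in beam
            for label, p in top_n_tags(row, labels, n)
            if p > 0.0
        ]
        # Keep only the n best prefixes before moving to the next token.
        beam = heapq.nlargest(n, candidates, key=lambda item: item[1])
    return [(list(tags), math.exp(logp)) for tags, logp in beam]


# Toy data standing in for a two-token document with three possible tags.
labels = ["NOUN", "VERB", "ADJ"]
rows = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]]

best_tags = top_n_tags(rows[0], labels, n=2)
best_sequences = n_best_sequences(rows, labels, n=2)
```

With the tagger above, each row would come from token._.tag_scores (or the whole matrix from doc._.tag_scores). OpenNLP's topKSequences ranks sequences with its own model and context features, so the two rankings are not expected to agree exactly.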