What is the tag mapping for entity recognition in nltk?
31-10-2019
Question
When doing entity recognition using NLTK, one gets as a result a Tree with a bunch of words mapped to tags (e.g. Mark -> NNP, first -> JJ, ...). It is not at all clear at first glance what the tags stand for, and I was unable to find any documentation about them in the NLTK docs.
>>> from nltk import word_tokenize, pos_tag, ne_chunk
>>> sentence = "Mark and John are the first to work at Google from one years old in 39 years."
>>> print(ne_chunk(pos_tag(word_tokenize(sentence))))
(S
  (PERSON Mark/NNP)
  and/CC
  (PERSON John/NNP)
  are/VBP
  the/DT
  first/JJ
  to/TO
  work/VB
  at/IN
  (ORGANIZATION Google/NNP)
  from/IN
  one/CD
  years/NNS
  old/JJ
  in/IN
  39/CD
  years/NNS
  ./.)
I ended up looking into the source code to get the mapping. Posting in case anyone else runs into the same problem.
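For reference, here is a minimal sketch of such a mapping, covering only the word-level tags that appear in the output above. It assumes these are Penn Treebank part-of-speech tags, which is the tagset NLTK's default English tagger emits. The dictionary is hand-written for illustration and is not NLTK's own data; the names `PENN_TAGS` and `describe` are hypothetical. NLTK can also print the official descriptions itself via `nltk.help.upenn_tagset('NNP')`, provided the tagset help data has been downloaded.

```python
# Hand-written subset of Penn Treebank POS tags (illustrative, not NLTK data).
# Only the tags seen in the example output are listed.
PENN_TAGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
    "DT": "determiner",
    "IN": "preposition or subordinating conjunction",
    "JJ": "adjective",
    "NNP": "proper noun, singular",
    "NNS": "noun, plural",
    "TO": "to",
    "VB": "verb, base form",
    "VBP": "verb, non-3rd person singular present",
}


def describe(tagged_tokens):
    """Return (word, tag, meaning) triples for a list of (word, tag) pairs."""
    return [
        (word, tag, PENN_TAGS.get(tag, "unknown tag"))
        for word, tag in tagged_tokens
    ]


# A few (word, tag) pairs copied from the output above.
sample = [("Mark", "NNP"), ("and", "CC"), ("first", "JJ"), ("years", "NNS")]
for word, tag, meaning in describe(sample):
    print(f"{word}/{tag}: {meaning}")
```

The labels on the subtrees (PERSON, ORGANIZATION) are named-entity types added by `ne_chunk`, not part-of-speech tags, so they are deliberately left out of the dictionary.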
Licensed under: CC-BY-SA with attribution