Question

When doing named-entity recognition with NLTK, the result is a Tree in which words are mapped to tags (e.g. Mark -> NNP, first -> JJ, ...). At first glance it's not at all clear what all the tags stand for, and I was unable to find any documentation about them in the NLTK docs.

>>> from nltk import word_tokenize, pos_tag, ne_chunk
>>> sentence = "Mark and John are the first to work at Google from one years old in 39 years."
>>> print(ne_chunk(pos_tag(word_tokenize(sentence))))
(S
  (PERSON Mark/NNP)
  and/CC
  (PERSON John/NNP)
  are/VBP
  the/DT
  first/JJ
  to/TO
  work/VB
  at/IN
  (ORGANIZATION Google/NNP)
  from/IN
  one/CD
  years/NNS
  old/JJ
  in/IN
  39/CD
  years/NNS
  ./.)
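The printed tree mixes two kinds of labels. The parenthesised chunk labels (PERSON, ORGANIZATION) are named-entity types added by ne_chunk, while the suffix after each slash (NNP, CC, ...) is the part-of-speech tag assigned by pos_tag. The sketch below separates the two by parsing the printed text. It is a minimal illustration only: it assumes the one-token-per-line layout shown above and tokens that contain no parentheses, and the helper names `entities` and `pos_tags` are hypothetical. With a live result you would instead iterate the Tree directly, where subtrees carry the entity label via `.label()` and plain leaves are (word, tag) tuples.

```python
import re

# Abbreviated copy of the printed tree from the question.
PRINTED = """(S
  (PERSON Mark/NNP)
  and/CC
  (PERSON John/NNP)
  are/VBP
  (ORGANIZATION Google/NNP)
  ./.)"""

# Matches "(LABEL word/TAG word/TAG ...)" with no nested parentheses.
# The root "(S" is skipped because it is followed by a newline, not a space.
CHUNK_RE = re.compile(r"\((\w+) ([^()]+)\)")


def entities(printed):
    """Return (entity_label, words) for every named-entity chunk."""
    result = []
    for label, body in CHUNK_RE.findall(printed):
        # Drop the POS tag from each "word/TAG" token inside the chunk.
        words = [tok.rsplit("/", 1)[0] for tok in body.split()]
        result.append((label, words))
    return result


def pos_tags(printed):
    """Return (word, pos_tag) for every token, splitting on the LAST slash."""
    pairs = []
    for tok in printed.split():
        tok = tok.rstrip(")")  # strip closing parens of chunks / the root
        if "/" in tok:         # skip "(S", "(PERSON", ... label tokens
            word, tag = tok.rsplit("/", 1)
            pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    print(entities(PRINTED))
    print(pos_tags(PRINTED))
```

Splitting on the last slash keeps the sentence-final "./." token intact as the word "." with tag ".".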

I ended up looking into the source code to get the mapping. Posting in case anyone else runs into the same problem.
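For the word-level tags, the mapping is the Penn Treebank tagset, which NLTK's default pos_tag uses. NLTK can also print descriptions itself via `nltk.help.upenn_tagset('NNP')`, assuming the tagset data resource has been fetched with `nltk.download` (the resource name has differed between NLTK versions). As a self-contained sketch, here is a lookup table covering only the tags and entity labels that appear in the output above. The table and the `describe` helper are illustrative, not an NLTK API, and the full Penn Treebank tagset is considerably larger.

```python
# Penn Treebank POS tags that appear in the example output (a subset).
PENN_TAGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
    "DT": "determiner",
    "IN": "preposition or subordinating conjunction",
    "JJ": "adjective",
    "NNP": "proper noun, singular",
    "NNS": "noun, plural",
    "TO": "to",
    "VB": "verb, base form",
    "VBP": "verb, non-3rd person singular present",
    ".": "sentence-final punctuation",
}

# Named-entity chunk labels that ne_chunk attached in the example output.
NE_LABELS = {
    "PERSON": "person name",
    "ORGANIZATION": "organization name",
}


def describe(tag):
    """Return a human-readable description of a POS tag or NE label."""
    if tag in NE_LABELS:
        return "named entity: " + NE_LABELS[tag]
    return PENN_TAGS.get(tag, "unknown tag (not in this table)")


if __name__ == "__main__":
    for tag in ["NNP", "JJ", "PERSON", "XYZ"]:
        print(tag, "->", describe(tag))
```

Entity labels are checked first so that chunk labels and POS tags never collide in the output.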

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange