Question
This must be simple, but I'm missing it somehow. I have the code:
import nltk
# Read the whole file, then tag each token with its part of speech
with open('...\\t.txt') as f:
    raw = f.read()
tokens = nltk.word_tokenize(raw)
print(nltk.pos_tag(tokens))
which returns, for instance:
[('processes', 'NNS'), ('a', 'DT'), ('sequence', 'NN'), ('of', 'IN'), ('words', 'NNS')]
I was wondering how I could collect only the 'NN' entries, for example, or all the 'DT' and 'IN' entries, instead of every member of the list.
Thanks in advance.
Solution
You can extract only the tags you want with a list comprehension, e.g.:
>>> tags = nltk.pos_tag(tokens)
>>> dt_tags = [t for t in tags if t[1] == "DT"]
>>> dt_tags
[('a', 'DT')]
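For the second part of the question (several tags at once, such as 'DT' and 'IN'), the same comprehension can test membership in a set instead of comparing against a single tag. Here is a minimal sketch, assuming the tagged list from the question is hard-coded so that it runs without NLTK installed; the names wanted, dt_in_tags and nouns are only illustrative:

```python
# Tagged output copied from the question; normally this would come from
# nltk.pos_tag(tokens).
tags = [('processes', 'NNS'), ('a', 'DT'), ('sequence', 'NN'),
        ('of', 'IN'), ('words', 'NNS')]

# Keep every (word, tag) pair whose tag is one of the wanted tags.
wanted = {"DT", "IN"}
dt_in_tags = [(word, tag) for word, tag in tags if tag in wanted]
print(dt_in_tags)  # [('a', 'DT'), ('of', 'IN')]

# Prefix match: collect the words for all noun tags (NN, NNS, ...).
nouns = [word for word, tag in tags if tag.startswith("NN")]
print(nouns)  # ['processes', 'sequence', 'words']
```

Note that tag == "NN" matches only singular nouns, so 'NNS' entries are skipped unless you use the prefix test or add 'NNS' to the set.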