Question

I have used the Python code below to extract the named entities present in a text. Now I need to get the adjectives from those sentences in the text that contain a named entity, i.e. the adjectives used with named entities. Can I alter my code to check whether the tree has a 'JJ' when it has an 'NE', or is there another approach?

import nltk

def tokenize(text):
    # Split into sentences, then words, then POS-tag each sentence.
    sentences = nltk.sent_tokenize(text)
    sentences = [nltk.word_tokenize(sent) for sent in sentences]
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    return sentences

text=open("file.txt","r").read() 
sentences=tokenize(text) 
chunk_sent=nltk.batch_ne_chunk(sentences,binary=True)
print chunk_sent[1]

The output:

Tree('S', [("'", 'POS'), ('Accomplished', 'NNP'), ('in', 'IN'), ('speech', 'NN'), (',', ','), Tree('NE', [('Gautam', 'NNP')]), (',', ','), ('thus', 'RB'), ('questioned', 'VBD'), (',', ','), ('gave', 'VBD'), ('in', 'IN'), ('the', 'DT'), ('midst', 'NN'), ('of', 'IN'), ('that', 'DT'), ('big', 'JJ'), ('assemblage', 'NN'), ('of', 'IN'), ('contemplative', 'JJ'), ('sages', 'NNP'), ('a', 'DT'), ('full', 'JJ'), ('and', 'CC'), ('proper', 'NN'), ('answer', 'NN'), ('in', 'IN'), ('words', 'NNS'), ('consonant', 'JJ'), ('with', 'IN'), ('their', 'PRP$'), ('mode', 'NN'), ('of', 'IN'), ('life', 'NN'), ('.', '.')])

This sentence doesn't have a JJ directly before the NE, though. How can I get the JJ used with an NE?

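def ne(tree):
    # Collect the text of every 'NE' chunk found in the tree.
    names = []
    if hasattr(tree, 'node') and tree.node:
        if tree.node == 'NE':
            names.append(' '.join([child[0] for child in tree]))
        else:
            # Only recurse into subtrees; plain (word, tag) tuples have no 'node'.
            for child in tree:
                names.extend(ne(child))
    return names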

names = []
for item in chunk_sent:
   names.extend(ne(item))
print names

Solution

>>> from nltk.corpus import brown
>>> from nltk import batch_ne_chunk as bnc
>>> from nltk.tree import Tree
>>> sentences = brown.tagged_sents()[0:5]
>>> chunk_sents = bnc(sentences)
>>> 
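>>> for sent in chunk_sents:
...     for i, j in zip(sent[:-1], sent[1:]):
...         # i must be a plain (word, tag) tuple and j a named-entity subtree
...         if isinstance(j, Tree) and isinstance(i, tuple) and i[1].startswith("JJ"):
...             print i, j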
... 
('Grand', 'JJ-TL') (PERSON Jury/NN-TL)
('Executive', 'JJ-TL') (ORGANIZATION Committee/NN-TL)
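
The loop above only reports an adjective that sits immediately before an entity. For the broader case in the question (every adjective in a sentence that contains at least one named entity), a minimal sketch might look like the following. The names `is_chunk`, `chunk_label`, and `entities_with_adjectives` and the `ne_labels` parameter are hypothetical helpers, not part of NLTK. The sketch assumes each chunked sentence is a flat tree whose children are either `(word, tag)` tuples or entity subtrees, as in the output shown in the question. It reads the label through `label()` (NLTK 3) and falls back to the `.node` attribute used in the question's code. In NLTK 3 the chunker is called `ne_chunk_sents` rather than `batch_ne_chunk`.

```python
def is_chunk(node):
    """True for a chunk subtree, False for a plain (word, tag) tuple."""
    return not isinstance(node, tuple)


def chunk_label(chunk):
    """Return the chunk label via label() (NLTK 3) or .node (older NLTK)."""
    label = getattr(chunk, "label", None)
    if callable(label):
        return label()
    return getattr(chunk, "node", None)


def entities_with_adjectives(chunked_sentences, ne_labels=None):
    """For each sentence containing a named entity, pair its entities
    with every adjective (tag starting with 'JJ') in that sentence.

    ne_labels: optional set of chunk labels to accept, e.g. {'NE'} for
    binary=True output; None accepts every chunk label (assumption).
    """
    results = []
    for sent in chunked_sentences:
        # Entity text is the space-joined words of each accepted chunk.
        entities = [
            " ".join(word for word, _tag in node)
            for node in sent
            if is_chunk(node)
            and (ne_labels is None or chunk_label(node) in ne_labels)
        ]
        if not entities:
            continue  # skip sentences without any named entity
        # Adjectives are top-level tokens only; words inside chunks are ignored.
        adjectives = [
            node[0]
            for node in sent
            if not is_chunk(node) and node[1].startswith("JJ")
        ]
        results.append((entities, adjectives))
    return results
```

Passing the chunker's output with `ne_labels={'NE'}` would return one `(entities, adjectives)` pair per sentence that has an entity. Deciding which adjective actually modifies which entity needs more than position in the sentence (for example a dependency parse), so this only narrows the candidates.
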
Licensed under: CC-BY-SA with attribution