Extracting two names from same sentence in nltk python

https://stackoverflow.com/questions/10320194

03-06-2021

Question

Hi, I started playing around with Python recently and it seems easy, so I moved on to the corpora that come with nltk. When I tried out

text1.concordance("Moby")

it gave me the number of sentences and a display of the sentences containing the word Moby, cool.

So I tried to find all the sentences containing both names, Moby and Ahab, but sadly I get an error from that.

Am I doing something wrong, or should I be able to get all the sentences containing both of those names? Is there another function from nltk I should use? O.o

It's probably easy but not so much for me to see it atm...hope someone could help, thanks.

PS: If I need to write some code, an example would be great.^^

Edit: Since someone asked for the error, here is the code I ran as well.

import nltk
from nltk.book import *

text1.concordance("Moby","Ahab")

gives me the error:

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    text1.concordance('Moby','Ahab')
  File "C:\Programmering\Python27\lib\site-packages\nltk\text.py", line 314, in concordance
    self._concordance_index.print_concordance(word, width, lines)
  File "C:\Programmering\Python27\lib\site-packages\nltk\text.py", line 174, in print_concordance
    half_width = (width - len(word) - 2) / 2
TypeError: unsupported operand type(s) for -: 'str' and 'int'

I had expected to get some matches, as I do when running just:

text1.concordance("Moby")

where I got 84 matches.

Solution

You can't do that with concordance. It only accepts one word and it prints out the results. There's no (reasonable) way to get them as a list, so you can't filter them further. The problem is that Text, the object behind text1, is only suitable for simple interactive exploration--I've never understood why the nltk book starts with it. So forget about Text, skip the rest of the chapter and go straight to chapter 2. Moby Dick is part of the gutenberg corpus, so you can iterate over its sentences and get your answer like this:

from nltk.corpus import gutenberg

# Each sentence is a list of tokens, so `in` checks for an exact token match.
for s in gutenberg.sents('melville-moby_dick.txt'):
    if 'Ahab' in s and 'Moby' in s:
        print(" ".join(s))
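As for why the original call fails: judging from the traceback, the second positional argument of concordance is passed along as width, so 'Ahab' ends up in the arithmetic on the line half_width = (width - len(word) - 2) / 2. The following is only a minimal sketch that mimics that one line. Here half_width is a stand-in function written for illustration, not NLTK's own code, and the default of 75 is an assumption.

```python
def half_width(word, width=75):
    # Same arithmetic as the line quoted in the traceback.
    # `width=75` is an assumed default, used only for this illustration.
    return (width - len(word) - 2) // 2


# A numeric width works as intended.
print(half_width("Moby"))

# Passing a second word where the width is expected reproduces the error.
try:
    half_width("Moby", "Ahab")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

So the call is not rejected for having two words as such; the second word is simply treated as a number and the subtraction fails.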

Other tips

You could make a list of all the names you want a concordance for, such as:

name_list = ['Moby', 'Ahab']

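You can then loop over the list and call concordance once per name. Note that this prints a separate concordance for each name; it does not restrict the output to sentences containing both: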

import nltk
from nltk.book import *

name_list = ['Moby', 'Ahab']
# concordance() takes a single word, so call it once per name
for name in name_list:
    text1.concordance(name)
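
If the goal is sentences that contain every name in name_list, rather than one concordance per name, the list can be combined with a sentence filter. This is a minimal sketch, assuming the sentences arrive as lists of tokens. The function name sentences_with_all and the sample sentences are made up for illustration and are not taken from the corpus.

```python
def sentences_with_all(sentences, names):
    """Yield each tokenized sentence that contains every name in `names`."""
    wanted = set(names)
    for tokens in sentences:
        # issubset() is true only if every wanted name appears as a token
        if wanted.issubset(tokens):
            yield tokens


# Hypothetical sample data standing in for a tokenized corpus.
sample_sentences = [
    ["Ahab", "hunted", "Moby", "Dick", "."],
    ["Call", "me", "Ishmael", "."],
    ["Moby", "Dick", "swam", "away", "."],
]

name_list = ["Moby", "Ahab"]
matches = [" ".join(tokens)
           for tokens in sentences_with_all(sample_sentences, name_list)]
print(matches)
```

With the real text, gutenberg.sents('melville-moby_dick.txt') from nltk.corpus could be passed as the sentences argument in place of the sample list.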

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow