Find different realization of a word in a sentence string - Python

Question 1

I recommend having a look at the stem package of NLTK: http://nltk.org/api/nltk.stem.html

Using it you can "remove morphological affixes from words, leaving only the word stem. Stemming algorithms aim to remove those affixes required for eg. grammatical role, tense, derivational morphology leaving only the stem of the word."

If your language is not covered by NLTK currently, you should consider extending NLTK. If you really need something simple and don't bother about NLTK, then you should still write your code as a collection of small, easy to combine utility functions, for example:

import string 

def variation(stem, word):
    return word.lower() in [stem, stem + 'es', stem + 's']

def variations(sentence, stem):
    sentence = cleanPunctuation(sentence).split()
    return ( (i, w) for i, w in enumerate(sentence) if variation(stem, w) )

def cleanPunctuation(sentence):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in sentence if ch not in exclude)

def firstVariation(sentence, stem):
    for i, w  in variations(sentence, stem):
        return i, w

sentence = "First coach, here another two coaches. Coaches are nice."

print firstVariation(sentence, 'coach')

# print all variations/forms of 'coach' found in the sentence:
print "\n".join([str(i) + ' ' + w for i,w in variations(sentence, 'coach')])

Question 2

Morphology is typically a finite-state phenomenon, so regular expressions are the perfect tool to handle it. Build an RE that matches all of the cases with a function like:

def inflect(stem):
    """Returns an RE that matches all inflected forms of stem."""
    pat = "^[%s%s]%s(?:e?s)$" % (stem[0], stem[0].upper(), re.escape(stem[1:]))
    return re.compile(pat)

Usage:

>>> sentence = "this is a sentence with the Coaches"
>>> target = inflect("coach")
>>> [(i, w) for i, w in enumerate(sentence.split()) if re.match(target, w)]
[(6, 'Coaches')]

If the inflection rules get more complicated than this, consider using Python's verbose REs.