Question

I am using the Python pattern library to get the singular form of English nouns.

    In [1]: from pattern.en import singularize
    In [2]: singularize('patterns')
    Out[2]: 'pattern'
    In [3]: singularize('gases')
    Out[3]: 'gase'

I am solving the problem in the second example by defining

    def my_singularize(strn):
        '''
        Return the singular of a noun. Add special cases to correct pattern's generic rules.
        '''
        exceptionDict = {'gases': 'gas', 'spectra': 'spectrum', 'cross': 'cross', 'nuclei': 'nucleus'}
        try:
            # Special cases take priority over pattern's generic rules.
            return exceptionDict[strn]
        except KeyError:
            return singularize(strn)

Is there a better way to do this, e.g. add to the rules of pattern, or make the exceptionDict somehow internal to pattern?


Answer

As mentioned in the comments, you would be better off lemmatizing the words. The lemmatizer is part of NLTK's stem module (nltk.stem).

    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    test_words = ['gases', 'spectrum', 'cross', 'nuclei']
    %timeit [wnl.lemmatize(wrd) for wrd in test_words]

    10000 loops, best of 3: 60.5 µs per loop

Compared with your function:

    %timeit [my_singularize(wrd) for wrd in test_words]

    1000 loops, best of 3: 162 µs per loop

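In this timing, NLTK lemmatizing is the faster of the two: about 60.5 µs versus 162 µs for the four test words, roughly 2.7 times faster.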
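If you would rather keep pattern and just manage the special cases in one place, here is a minimal sketch of a reusable wrapper. The names make_singularizer and naive_singularize are hypothetical and not part of pattern. naive_singularize is only a stand-in so the snippet runs without pattern installed; in practice you would pass pattern.en.singularize instead.

```python
def make_singularizer(base_singularize, exceptions):
    """Wrap a singularize-like function with a table of special cases."""
    # Copy the table so later changes to the caller's dict have no effect here.
    overrides = dict(exceptions)

    def singularize_with_exceptions(word):
        # Exact-match lookup first; fall back to the generic rules otherwise.
        if word in overrides:
            return overrides[word]
        return base_singularize(word)

    return singularize_with_exceptions


def naive_singularize(word):
    """Hypothetical stand-in for pattern.en.singularize (assumption, not pattern's rules)."""
    return word[:-1] if word.endswith("s") else word


EXCEPTIONS = {"gases": "gas", "spectra": "spectrum", "cross": "cross", "nuclei": "nucleus"}
my_singularize = make_singularizer(naive_singularize, EXCEPTIONS)

print(my_singularize("gases"))     # special case taken from the table
print(my_singularize("patterns"))  # handled by the fallback function
```

Building the table once, outside the function, also avoids recreating the dictionary on every call, as the version in the question does. pattern's singularize also appears to accept a custom dictionary argument for this kind of override in some versions, so check the signature in your installed version before writing your own wrapper.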

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow