I am using the Python library pattern to get the singular form of English nouns:

    In [1]: from pattern.en import singularize
    In [2]: singularize('patterns')
    Out[2]: 'pattern'
    In [3]: singularize('gases')
    Out[3]: 'gase'

I am working around the problem in the second example by defining:

    from pattern.en import singularize

    def my_singularize(strn):
        '''
        Return the singular of a noun. Add special cases to correct pattern's generic rules.
        '''
        exceptionDict = {'gases': 'gas', 'spectra': 'spectrum', 'cross': 'cross', 'nuclei': 'nucleus'}
        try:
            return exceptionDict[strn]
        except KeyError:
            # Not a special case: fall back to pattern's rule-based singularize.
            return singularize(strn)

Is there a better way to do this, e.g. add to the rules of pattern, or make the exceptionDict somehow internal to pattern?
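One way to keep the exception table private, without patching pattern itself, is to close over it in a factory function. The sketch below is illustrative only: make_singularizer and naive_singularize are hypothetical names, not part of pattern, and the stand-in fallback exists just so the example runs without pattern installed (it happens to reproduce the 'gases' -> 'gase' output shown above). In real use you would pass pattern.en.singularize as the fallback. Some versions of pattern's singularize also appear to accept a custom dictionary of overrides, so it is worth checking the signature of your installed version first.

```python
from typing import Callable, Dict, Mapping


def make_singularizer(
    fallback: Callable[[str], str],
    exceptions: Mapping[str, str],
) -> Callable[[str], str]:
    """Return a singularize function whose special cases are private to it.

    fallback: the rule-based singularizer (e.g. pattern.en.singularize).
    exceptions: maps plural forms to the singular that should win.
    """
    # Copy so later changes to the caller's dict do not leak in.
    table: Dict[str, str] = dict(exceptions)

    def singularize(word: str) -> str:
        # Special cases take precedence; everything else uses the generic rules.
        if word in table:
            return table[word]
        return fallback(word)

    return singularize


# Hypothetical stand-in for pattern.en.singularize: strips one trailing "s".
def naive_singularize(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


my_singularize = make_singularizer(
    naive_singularize,
    {"gases": "gas", "spectra": "spectrum", "cross": "cross", "nuclei": "nucleus"},
)

print(my_singularize("gases"))     # gas
print(my_singularize("patterns"))  # pattern
```

Because the table lives in the closure, callers only ever see my_singularize, and adding a special case means changing the dict passed to the factory rather than the function body.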


Solution

As mentioned in the comments, you would be better off lemmatizing the words. A WordNet-based lemmatizer is part of NLTK's nltk.stem module.

    from nltk.stem import WordNetLemmatizer

    # The WordNet data must be installed once: import nltk; nltk.download('wordnet')
    wnl = WordNetLemmatizer()
    test_words = ['gases', 'spectrum', 'cross', 'nuclei']
    # lemmatize() treats each word as a noun (pos='n') by default.
    %timeit [wnl.lemmatize(wrd) for wrd in test_words]

    10000 loops, best of 3: 60.5 µs per loop

compared to your function:

    %timeit [my_singularize(wrd) for wrd in test_words]
    1000 loops, best of 3: 162 µs per loop

In this benchmark, NLTK's lemmatizer is roughly 2.7 times faster (60.5 µs vs. 162 µs per pass over the four test words).
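%timeit is an IPython magic, so the timings above only run in IPython or Jupyter. A rough plain-Python equivalent uses the standard-library timeit module; the sketch below reports the best per-loop time in microseconds, in the spirit of %timeit's "best of 3". best_time_us is a hypothetical helper name, and str.lower / str.title are placeholders so the sketch runs on its own; substitute wnl.lemmatize and my_singularize to reproduce the comparison. Absolute numbers will vary by machine.

```python
import timeit
from typing import Callable, Sequence


def best_time_us(
    func: Callable[[str], str],
    words: Sequence[str],
    number: int = 1000,
    repeat: int = 3,
) -> float:
    """Best-of-`repeat` time, in microseconds, to apply `func` to every word once."""
    # Each entry of `runs` is the total time for `number` passes over `words`.
    runs = timeit.repeat(lambda: [func(w) for w in words], number=number, repeat=repeat)
    return min(runs) / number * 1e6


test_words = ['gases', 'spectrum', 'cross', 'nuclei']

# Placeholder callables; replace with wnl.lemmatize and my_singularize.
candidates = {
    "str.lower (placeholder)": str.lower,
    "str.title (placeholder)": str.title,
}

for name, func in candidates.items():
    print(f"{name}: {best_time_us(func, test_words):.2f} µs per loop")
```

Taking the minimum of the repeats, rather than the mean, follows the usual timeit advice that the fastest run best reflects the code itself, with slower runs mostly showing interference from other processes.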

Licensed under: CC-BY-SA with attribution