Question

Given a word, which may or may not be a singular-form noun, how would you generate its plural form?

Based on this NLTK tutorial and this informal list on pluralization rules, I wrote this simple function:

def plural(word):
    """
    Converts a word to its plural form.
    """
    if word in c.PLURALE_TANTUMS:
        # defective nouns, fish, deer, etc
        return word
    elif word in c.IRREGULAR_NOUNS:
        # foot->feet, person->people, etc
        return c.IRREGULAR_NOUNS[word]
    elif word.endswith('fe'):
        # knife -> knives
        return word[:-2] + 'ves'
    elif word.endswith('f'):
        # wolf -> wolves
        return word[:-1] + 'ves'
    elif word.endswith('o'):
        # potato -> potatoes
        return word + 'es'
    elif word.endswith('us'):
        # cactus -> cacti
        return word[:-2] + 'i'
    elif word.endswith('on'):
        # criterion -> criteria
        return word[:-2] + 'a'
    elif word.endswith('y'):
        # community -> communities
        return word[:-1] + 'ies'
    elif word[-1] in 'sx' or word[-2:] in ['sh', 'ch']:
        # box -> boxes, church -> churches
        return word + 'es'
    elif word.endswith('an'):
        # woman -> women
        return word[:-2] + 'en'
    else:
        return word + 's'

But I think this is incomplete. Is there a better way to do this?

Solution

The pattern-en package offers pluralization:

>>> import pattern.en
>>> pattern.en.pluralize("dog")
'dogs'

Other tips

Another option, which supports Python 3, is inflect.

import inflect
engine = inflect.engine()
plural = engine.plural(your_string)

First, it's worth noting that, as the FAQ explains, WordNet cannot generate plural forms.

If you want to use it anyway, you can. With Morphy, WordNet might be able to generate plurals for many nouns… but it still won't help with most irregular nouns, like "children".


Anyway, the easy way to use WordNet from Python is via NLTK. One of the NLTK HOWTO docs explains the WordNet Interface. (Of course it's even easier to just use NLTK without specifying a corpus, but that's not what you asked for.)

There is a lower-level API to WordNet called pywordnet, but I believe it's no longer maintained (it became the foundation for the NLTK integration), and only works with older versions of Python (maybe 2.7, but not 3.x) and of WordNet (only 2.x).

Alternatively, you can always access the C API by using ctypes or cffi or building custom bindings, or access the Java API by using Jython instead of CPython.

Or, of course, you can call the command-line interface via subprocess.


Anyway, at least on some installations, if you give the simple Morphy interface a singular noun, it will return its plural, while if you give it a plural noun, it will return its singular. So:

from nltk.corpus import wordnet as wn
assert wn.morphy('dogs') == 'dog'
assert wn.morphy('dog') == 'dog'

The singular-to-plural direction isn't actually documented, or even implied, and it's clearly not what happens for the OP (the second assert above gets the base form back unchanged), so I'm not sure I'd want to rely on it (even if it happens to work on your computer).

The other way around is documented to work, so you could write some rules that apply all possible English plural rules, call morphy on each one, and the first one that returns the starting string is the right plural.
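
As a rough sketch of that generate-and-check idea: the helper names below (candidate_plurals, guess_plural) and the toy lookup table are made up for illustration, and the suffix rules are only a small sample, not a complete set. The lemmatizer is passed in as a parameter so the sketch runs without NLTK installed.

```python
from typing import Callable, Iterator, Optional


def candidate_plurals(word: str) -> Iterator[str]:
    """Yield candidate plurals, more specific suffix rules first."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        yield word + "es"              # box -> boxes
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        yield word[:-1] + "ies"        # city -> cities
    if word.endswith("fe"):
        yield word[:-2] + "ves"        # knife -> knives
    elif word.endswith("f"):
        yield word[:-1] + "ves"        # wolf -> wolves
    yield word + "s"                   # fallback: dog -> dogs


def guess_plural(word: str, lemmatize: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the first candidate that `lemmatize` maps back to `word`, else None."""
    for candidate in candidate_plurals(word):
        if lemmatize(candidate) == word:
            return candidate
    return None


# Stand-in lemmatizer for demonstration only (hypothetical data).
toy_lemmas = {"dogs": "dog", "boxes": "box", "cities": "city", "wolves": "wolf"}
print(guess_plural("wolf", toy_lemmas.get))   # wolves
print(guess_plural("child", toy_lemmas.get))  # None
```

With NLTK and its WordNet data installed, wn.morphy could presumably be passed as lemmatize in place of toy_lemmas.get. Irregular plurals never appear among the generated candidates, so they would still need a separate table.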

However, the way it's documented to work is effectively by blindly applying the same kind of rules. So, for example, it will properly tell you that doges is not the plural of dog—but not because it knows dogs is the right answer; only because it knows doge is a different word, and it likes the "+s" rule more than the "+es" rule. So, this isn't going to be helpful.

Also, as explained above, it is of little help for generating irregular plurals: Morphy's exception lists map an irregular form such as children back to child, but nothing maps child forward to children.

Also, wn.morphy('reckless') will return 'reckless' rather than None. If you want that, you'll have to test whether it's a noun first. You can do this just sticking with the same interface, although it's a bit hacky:

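def plural(word):
    # Morphy's result with no part of speech specified
    result = wn.morphy(word)
    # Morphy's result when the word is treated as a noun (None if it isn't one)
    noun = wn.morphy(word, wn.NOUN)
    # Only accept the result if the noun reading agrees with it
    if noun in (word, result):
        return result
    return None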

To do this properly, you will actually need to add a plurals database instead of trying to trick WordNet into doing something it can't do.
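
A minimal sketch of what such a database-first approach might look like, assuming lower-case singular input. The two tables are deliberately tiny and purely illustrative, and the name pluralize is my own; a real plurals database would be far larger.

```python
import re

# Deliberately tiny, illustrative tables (assumptions, not a real database).
IRREGULAR = {"child": "children", "foot": "feet", "person": "people", "mouse": "mice"}
UNCHANGED = {"fish", "deer", "sheep", "series"}


def pluralize(word: str) -> str:
    """Pluralize a lower-case singular noun: lookup tables first, suffix rules second."""
    if word in UNCHANGED:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    if re.search(r"(s|x|z|ch|sh)$", word):
        return word + "es"             # box -> boxes
    if re.search(r"[^aeiou]y$", word):
        return word[:-1] + "ies"       # community -> communities
    return word + "s"                  # dog -> dogs


for noun in ("child", "fish", "box", "community", "day"):
    print(noun, "->", pluralize(noun))
```

The tables are checked before any suffix rule, so an exception such as child never reaches the generic "+s" fallback.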

Also, a word can have multiple meanings, and they can have different plurals, and sometimes there are even multiple plurals for the same meaning. So you probably want to start with something like (lemma for s in wn.synsets(word, wn.NOUN) for lemma in s.lemmas() if lemma.name() == word) and then get all appropriate plurals, instead of just returning "the" plural. (In NLTK 3, lemmas and name are methods, hence the parentheses.)
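
To illustrate returning all appropriate plurals rather than a single one, here is a small sketch that keys plurals by sense. The table, its sense labels, and the function name all_plurals are hypothetical; they are not taken from WordNet or any other library.

```python
from typing import Dict, List

# Hypothetical, hand-written entries: sense label -> accepted plural forms.
PLURALS_BY_SENSE: Dict[str, Dict[str, List[str]]] = {
    "antenna": {"insect feeler": ["antennae"], "radio aerial": ["antennas"]},
    "index": {"book index": ["indexes"], "mathematical index": ["indices", "indexes"]},
}


def all_plurals(word: str) -> List[str]:
    """Return every known plural of `word`, in first-seen order, without duplicates."""
    seen: List[str] = []
    for forms in PLURALS_BY_SENSE.get(word, {}).values():
        for form in forms:
            if form not in seen:
                seen.append(form)
    return seen


print(all_plurals("antenna"))  # ['antennae', 'antennas']
print(all_plurals("index"))    # ['indexes', 'indices']
```

A caller that knows which sense it wants could index the inner dictionary directly instead of flattening it.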

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow