0th synset in NLTK wordnet interface

https://stackoverflow.com/questions/18470873

26-06-2022

Question

The SemCor corpus (http://www.cse.unt.edu/~rada/downloads.html) contains senses that weren't mapped to later versions of WordNet. Yet, magically, a mapping can be found through the NLTK WordNet API:

>>> from nltk.corpus import wordnet as wn
# Enumerate the possible senses for the lemma 'delayed'
>>> wn.synsets('delayed')
[Synset('delay.v.01'), Synset('delay.v.02'), Synset('stay.v.06'), Synset('check.v.07'), Synset('delayed.s.01')]
>>> wn.synset('delay.v.01')
Synset('delay.v.01')
# Magically, there is a 0th sense of the word!!!
>>> wn.synset('delayed.a.0')
Synset('delayed.s.01')

I've checked the code and the API (http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet.Synset-class.html, http://nltk.org/_modules/nltk/corpus/reader/wordnet.html), but I can't find how they did the magical mapping that shouldn't exist (e.g. delayed.a.0 -> delayed.s.01).

Does anyone know which part of the NLTK Wordnet API code does the magical mapping?

Solution

I guess it's a bug. When you call wn.synset('delayed.a.0'), the first two lines of the method are:

lemma, pos, synset_index_str = name.lower().rsplit('.', 2)
synset_index = int(synset_index_str) - 1

So in this case synset_index is -1, which is a valid index in Python: it refers to the last element of a list. The lookup in the list of synsets whose lemma is 'delayed' and pos is 'a' therefore doesn't fail; it returns the last entry instead of raising an error.
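The same effect can be reproduced without NLTK. The following is only a minimal sketch: `lookup` is a hypothetical helper that copies the two lines quoted above, and `delay_verbs` is a plain list standing in for NLTK's internal data, filled with the verb synset names from the session in the question.

```python
# Stand-in for the list of synsets stored for lemma 'delay', pos 'v'.
# Assumption: the names are copied from the output shown in the question.
delay_verbs = ['delay.v.01', 'delay.v.02', 'stay.v.06', 'check.v.07']

def lookup(name, senses):
    """Hypothetical helper that mimics the two parsing lines quoted above."""
    lemma, pos, synset_index_str = name.lower().rsplit('.', 2)
    synset_index = int(synset_index_str) - 1  # sense 1 -> index 0, sense 0 -> index -1
    return senses[synset_index]               # a negative index counts from the end

print(lookup('delay.v.1', delay_verbs))  # index 0: the first entry
print(lookup('delay.v.0', delay_verbs))  # index -1: the last entry, no error raised
```

Sense number 0 quietly wraps around to the end of the list, which is why a "0th sense" appears to exist.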

With this behavior you can do tricky things like the following, where the index becomes -2 and selects the second-to-last verb synset:

>>> wn.synset('delay.v.-1')
Synset('stay.v.06')
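If you want to avoid relying on this accident in your own code, one option is to validate the sense number before handing the name to wn.synset. The sketch below is an assumption about how such a check could look, not part of NLTK: `parse_synset_name` is a hypothetical helper that only parses and validates the string.

```python
def parse_synset_name(name):
    """Split 'lemma.pos.nn' and reject sense numbers below 1.

    Hypothetical helper, not an NLTK function.
    """
    lemma, pos, index_str = name.lower().rsplit('.', 2)
    sense_number = int(index_str)
    if sense_number < 1:
        # 0 or a negative number would turn into a negative list index
        raise ValueError(
            f"sense number must be >= 1, got {sense_number} in {name!r}"
        )
    return lemma, pos, sense_number

print(parse_synset_name('delay.v.01'))

try:
    parse_synset_name('delayed.a.0')
except ValueError as err:
    print('rejected:', err)
```

Names that pass the check can then be given to wn.synset as usual; names such as 'delayed.a.0' or 'delay.v.-1' are rejected up front instead of silently resolving to some other synset.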

Licensed under: CC-BY-SA with attribution
