There is actually a Synset.name() method that returns the synset's name directly:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('dog')[0].name()
u'dog.n.01'
There is also Synset.unicode_repr(), which returns the repr as a unicode string and so helps avoid encoding/byte-string problems. Going back to the regex:
>>> import re
>>> x = wn.synsets('dog')[0].unicode_repr()
>>> re.sub(r'Synset\((.+)\)','\1',x)
u'\x01'
>>> re.sub(r'Synset\((.+)\)','1',x)
u'1'
>>> re.sub(r'Synset\((.+)\)','\\1',x)
u"'dog.n.01'"
>>> re.sub(r"Synset\(\'(.+)\'\)",'\\1',x)
u'dog.n.01'
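The difference between '\1' and '\\1' above comes from Python's string literals, not from the regex engine. In a non-raw literal, '\1' is an octal escape for the single character chr(1), so re.sub never sees a backreference and inserts that control character instead. '\\1' and r'\1' are both a backslash followed by 1, which re.sub does treat as "group 1". The sketch below reproduces this with only the standard library; it assumes the repr has the form Synset('dog.n.01') and hard-codes that string instead of calling NLTK.

```python
import re

# Stand-in for wn.synsets('dog')[0].unicode_repr() (assumed repr format).
x = "Synset('dog.n.01')"

# '\1' in a non-raw literal is the octal escape for chr(1), not a backreference.
assert '\1' == '\x01'
# '\\1' and r'\1' are the same two characters: a backslash, then '1'.
assert '\\1' == r'\1'

# Replacement is the literal chr(1) character.
no_backref = re.sub(r"Synset\((.+)\)", '\1', x)
# Backreference to group 1, which still includes the surrounding quotes.
with_quotes = re.sub(r"Synset\((.+)\)", r'\1', x)
# Quotes matched outside the group, so only the name is captured.
bare_name = re.sub(r"Synset\('(.+)'\)", r'\1', x)

print(repr(no_backref))   # '\x01'
print(repr(with_quotes))  # "'dog.n.01'"
print(bare_name)          # dog.n.01
```

Using raw strings for both the pattern and the replacement avoids this class of bug, although calling name() on the Synset object sidesteps the regex entirely.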