From a "show me an example" perspective, here's how you can use semantic similarity to perform WSD:
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def max_wupa(context_sentence, ambiguous_word):
    """
    WSD by maximizing Wu-Palmer similarity.

    Perform WSD by maximizing the sum of the maximum Wu-Palmer scores
    between the possible synsets of each word in the context sentence
    and the possible synsets of the ambiguous word
    (see http://goo.gl/XMq2BI):

        argmax_{synset(a)} \sum_{i=1}^{n} max_{synset(i)} WuPalmer(i, a)

    Wu-Palmer (1994) similarity scores two synsets by the depth of their
    least common subsumer in the taxonomy, relative to the depths of the
    synsets themselves (see http://acl.ldc.upenn.edu/P/P94/P94-1019.pdf).
    """
    result = {}
    for candidate in wn.synsets(ambiguous_word):
        total = 0
        for token in word_tokenize(context_sentence):
            # Best Wu-Palmer score of this token against the candidate
            # sense; wup_similarity returns None for cross-POS pairs,
            # so drop those and fall back to 0.
            scores = [candidate.wup_similarity(ss) for ss in wn.synsets(token)]
            scores = [s for s in scores if s is not None]
            total += max(scores, default=0)
        result[candidate] = total
    # Sort (score, synset) pairs by score, best first; sorting on the
    # score alone avoids comparing Synset objects on ties.
    return sorted(((v, k) for k, v in result.items()),
                  key=lambda pair: pair[0], reverse=True)

bank_sents = ['I went to the bank to deposit my money',
              'The river bank was full of dead fishes']

ans = max_wupa(bank_sents[0], 'bank')
print(ans)
print(ans[0][1].definition())
(source: pyWSD @ github)
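As background on what wup_similarity actually computes, here's a minimal sketch (assuming NLTK 3 with the WordNet corpus downloaded; the synset names are just illustrative picks):

from nltk.corpus import wordnet as wn

# Wu-Palmer scores a pair of synsets by how deep their least common
# subsumer (LCS) sits in the taxonomy, roughly:
#   wup(s1, s2) = 2 * depth(LCS) / (depth(s1) + depth(s2))
s1 = wn.synset('bank.n.01')  # sloping land beside a body of water
s2 = wn.synset('bank.n.02')  # financial institution
print(s1.lowest_common_hypernyms(s2))  # the LCS the score hinges on
print(s1.wup_similarity(s2))           # a value in (0, 1]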
Use the above code with care because you need to consider:
- What is really happening when we try to maximize path similarity between all possible synsets of all tokens in the context sentence and the possible synsets of the ambiguous word?
- Is the maximization even logical if most of the pairwise similarities yield None, and by chance some rogue word has a synset closely related to one of the synsets of the ambiguous word? (see the sketch below)
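To make that second caveat concrete, here's a small sketch of how the None results and rogue scores arise (again assuming NLTK 3; the synsets are illustrative):

from nltk.corpus import wordnet as wn

# Cross-POS pairs live in disjoint hypernym hierarchies, so
# wup_similarity returns None instead of a score.
print(wn.synset('bank.n.01').wup_similarity(wn.synset('run.v.01')))  # None

# Within the noun hierarchy, every pair shares at least the root
# 'entity', so even loosely related context words contribute a
# nonzero score, which the sum of maxima accumulates as noise.
print(wn.synset('bank.n.01').wup_similarity(wn.synset('money.n.01')))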