Question

I want to find the relatedness between two synsets, and I came across many algorithms like Resnik, Lin, Wu-Palmer, path similarity and Leacock-Chodorow. Can somebody tell me which one is the most efficient among these algorithms?


Solution 2

From a "show me an example" perspective, here's an example to show how you can use semantic similarity to perform WSD:

from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def max_wupa(context_sentence, ambiguous_word):
  """ 
  WSD by Maximizing Wu-Palmer Similarity.

  Perform WSD by maximizing the sum of the maximum Wu-Palmer scores between the
  possible synsets of all words in the context sentence and the possible synsets
  of the ambiguous word (see http://goo.gl/XMq2BI):
  argmax_{synset(a)} \sum_{i=1}^{n} max_{synset(i)} Wu-Palmer(i, a)

  Wu-Palmer (1994) similarity is based on path length; the similarity between 
  two synsets accounts for the number of nodes along the shortest path between 
  them. (see http://acl.ldc.upenn.edu/P/P94/P94-1019.pdf)
  """

  result = {}
  for i in wn.synsets(ambiguous_word):
    # `or 0` guards against wup_similarity returning None (e.g. for some
    # cross-POS pairs); the extra [0] handles tokens with no synsets at all.
    result[i] = sum(max([i.wup_similarity(k) or 0 for k in wn.synsets(j)] + [0])
                    for j in word_tokenize(context_sentence))
  # Sort candidate synsets by score, highest first.
  result = sorted(((v, k) for k, v in result.items()),
                  key=lambda pair: pair[0], reverse=True)
  return result

bank_sents = ['I went to the bank to deposit my money',
              'The river bank was full of dead fishes']
ans = max_wupa(bank_sents[0], 'bank')
print(ans)
print(ans[0][1].definition())

(source: pyWSD @ github)

Use the above code with care because you need to consider:

  1. what is really happening when we try to maximize path similarity between all possible synsets of all tokens in the context sentence and the possible synsets of the ambiguous word?
  2. is the maximization even logical if most of the path similarities yield None and, by chance, some rogue word has a synset related to one of the synsets of the ambiguous word? (See the sketch below.)
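To make the second caveat concrete, here is a small diagnostic sketch (hypothetical, not part of pyWSD) that prints the per-token Wu-Palmer scores for a single candidate sense of "bank":

from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

sentence = 'I went to the bank to deposit my money'
candidate = wn.synsets('bank')[0]  # one candidate sense, for illustration only

for token in word_tokenize(sentence):
    scores = [candidate.wup_similarity(s) for s in wn.synsets(token)]
    print(token, scores)
# Stopwords like 'the' have no synsets at all, and (depending on part of speech
# and NLTK version) some pairs score None or near zero, so only a handful of
# content words actually drive the sum being maximized.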

OTHER TIPS

Firstly, the OP is somewhat confused between relatedness and similarity; the distinction is fine, but it's worth noting.

Semantic relatedness measures how related two concepts are, using any kind of relation; algorithms (a toy sketch of the gloss-overlap idea follows the list):

  • Lexical Chains (Hirst and St-Onge, 1998)
  • Adapted/Extended Sense Overlaps algorithm (Banerjee and Pedersen, 2002/2003)
  • Vectorized Sense Overlaps (Patwardhan, 2003)
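To give a flavour of the gloss-overlap idea behind the Adapted/Extended Sense Overlaps measures, here is a toy sketch (a simplification: the full Banerjee-Pedersen algorithm also overlaps the glosses of related synsets and rewards multi-word overlaps):

from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def gloss_overlap(s1, s2):
    """Toy relatedness proxy: count words shared by the two synsets' glosses."""
    gloss1 = set(word_tokenize(s1.definition()))
    gloss2 = set(word_tokenize(s2.definition()))
    return len(gloss1 & gloss2)

print(gloss_overlap(wn.synset('bank.n.01'), wn.synset('money.n.01')))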

Semantic similarity only considers the IS-A relation (i.e. hypernymy / hyponymy); algorithms:

  • Wu-Palmer measure (Wu and Palmer 1994)
  • Resnik measure (Resnik 1995)
  • Jiang-Conrath measure (Jiang and Conrath 1997)
  • Leacock-Chodorow measure (Leacock and Chodorow 1998)
  • Lin measure (Lin 1998)

Resnik, Jiang-Conrath and Lin measures are based on information content. The information content of a synset is IC(s) = -log P(s), where P(s) is the sum of the probabilities (computed from corpus frequencies) of all words in that synset (Resnik, 1995).
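For example, NLTK exposes these information-content measures on WordNet synsets; a minimal sketch (it assumes the wordnet_ic corpus has been fetched via nltk.download('wordnet_ic')):

from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # IC counts computed from the Brown corpus
dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')

print(dog.res_similarity(cat, brown_ic))  # Resnik: IC of the lowest common subsumer
print(dog.jcn_similarity(cat, brown_ic))  # Jiang-Conrath
print(dog.lin_similarity(cat, brown_ic))  # Lin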

Wu-Palmer and Leacock-Chodorow are based on path length; the similarity between two concepts/synsets depends on the number of nodes along the shortest path between them.
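Again in NLTK, a minimal sketch of the path-based measures:

from nltk.corpus import wordnet as wn

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')

print(dog.path_similarity(cat))  # plain shortest-path measure, in (0, 1]
print(dog.wup_similarity(cat))   # Wu-Palmer: uses the depth of the lowest common subsumer
print(dog.lch_similarity(cat))   # Leacock-Chodorow: -log(path length / (2 * depth)), same POS only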

The list given above is not exhaustive, but historically we can see that using similarity measures is somewhat outdated, since relatedness algorithms consider more relations and should theoretically give more disambiguating power when comparing concepts.


Next, efficiency is poorly defined. Is speed or accuracy the concern? For which task would the semantic relatedness/similarity be applied?

If the task is Word Sense Disambiguation (WSD), then it would be nice to refer to Warin's (2004) thesis: http://goo.gl/6wWums. A more recent survey is Navigli (2009): http://dl.acm.org/citation.cfm?id=1459355

And if WSD is the concern, there are more sophisticated tools/techniques; please refer to "Anyone know of some good Word Sense Disambiguation software?"


References

Satanjeev Banerjee and Ted Pedersen. 2002. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing (CICLing '02), Alexander F. Gelbukh (Ed.). Springer-Verlag, London, UK, UK, 136-145.

Satanjeev Banerjee and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 805–810, Acapulco.

Graeme Hirst and David St-Onge, 1998. Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms, chapter 13, pages 305–332. MIT Press, Cambridge, MA.

Siddharth Patwardhan. 2003. Incorporating dictionary and corpus information into a context vector measure of semantic relatedness. Master's thesis, University of Minnesota.

(too lazy to list all the citations, please search and append to this answer appropriately)
