Question

I'm trying to determine the importance of a word within a given set of random words. For instance, I would like to know that "accident" is the most important word in the set "man, woman, accident". A naive solution is to get the WordNet depth for each word and compute each word's importance from the dissimilarity in depths. That approach is quite time consuming, since it requires n(n-1) pairwise calculations to produce the final importance scores. Is there a better way to handle this scenario?


Solution

The usual approach is that the less common a word is, the more important it is.

First, choose a corpus that represents your problem domain, then run a word frequency count over it. You could skip these two steps and use a pre-made list instead, e.g. http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists or http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/PG/2006/04/1-10000. However, computing word frequencies is one of the easier things to do in Python/NLTK; see the sketch below.
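
For instance, here is a minimal sketch of such a frequency count. It uses NLTK's Brown corpus as a stand-in for a domain corpus; the corpus choice and the rel_freq helper are illustrative assumptions, not part of the original answer:

    import nltk
    from nltk.corpus import brown
    from collections import Counter

    # Stand-in domain corpus; swap in text from your own problem domain.
    nltk.download("brown", quiet=True)

    words = [w.lower() for w in brown.words() if w.isalpha()]
    freq = Counter(words)
    total = sum(freq.values())

    def rel_freq(word):
        # Relative frequency of a word in the corpus; 0.0 if unseen.
        return freq[word.lower()] / total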

The third step is to look up the frequency of each of your input words; the one with the lowest frequency is the most salient. Or, if this feeds into another step and a real-valued score is useful, tf-idf gives you one. A sketch of the lookup follows.
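
Building on the freq counter from the sketch above (again, an assumption of mine rather than code from the answer), the third step reduces to a single min over the input words:

    def most_salient(candidates, freq):
        # Unseen words get count 0 and so win automatically,
        # matching the intuition that rare words are salient.
        return min(candidates, key=lambda w: freq[w.lower()])

    print(most_salient(["man", "woman", "accident"], freq))
    # -> 'accident', assuming it is the rarest of the three in the corpus

Note this is a single pass over the input words, so it avoids the n(n-1) pairwise comparisons of the WordNet-depth approach.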

You might want to normalize/stem words first; whether to do so depends on your application. If you do, make sure you apply it both at the generation stage (normalize your corpus) and at the usage stage (normalize your inputs), as in the sketch below.
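
A sketch of a shared normalization step, using NLTK's Porter stemmer (the normalize helper is hypothetical), applied identically on both sides:

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def normalize(word):
        # Same lowercasing + stemming on both sides keeps counts aligned.
        return stemmer.stem(word.lower())

    # e.g. build freq over normalize(w) for corpus words,
    # then look up normalize(w) for each input word.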

Here are some examples, using frequency counts from the Word Usage Trends box at http://www.collinsdictionary.com/dictionary/english/man:

word         relative frequency
man          0.0289
woman        0.0149
walk         0.0064
shot         0.0049
accident     0.0048

Luckily, those numbers match up with the correct answers you gave: accident and shot.

Licensed under: CC-BY-SA with attribution