I want to use the Wu and Palmer method to compute a similarity measure in WordNet:

wp = (2 * depth(lcs)) / (depth(synset1) + depth(synset2))

where lcs is the "least common subsumer" of synset1 and synset2

My questions are:

  1. What is the "least common subsumer"?
  2. How do I compute it?

Solution

According to this paper, the Least Common Subsumer (LCS) of two concepts A and B is "the most specific concept which is an ancestor of both A and B", where the concept tree is defined by the is-a relation. An ancestor of a concept is defined the same way as in a human family tree: the concept's parent, its grandparent, and so on. For example:

  1. A car is an automobile, and an automobile is a vehicle
  2. A boat is a vehicle.
  3. Vehicle is an object.

And the graph:

    Object
      |
    Vehicle
      |
  ---------
  |       |
 Boat  Automobile
          |
         Car

Here, "automobile" is the parent (and therefore also an ancestor) of "car", while "vehicle" is an ancestor of "car". "Vehicle" is also an ancestor of "boat". The LCS of "boat" and "car" is therefore "vehicle", since it is the most specific concept that is an ancestor of both. Note that while "object" is a common subsumer of "boat" and "car", it is not the least one, because a child of "object" (namely "vehicle") is also a common subsumer of both. "Automobile" is not the least common subsumer because it is not an ancestor of "boat".
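To make the definition concrete, here is a minimal sketch in Python that finds the LCS on the toy tree above and plugs it into the Wu and Palmer formula from the question. The `PARENT` table and the function names are made up for illustration. It rests on two assumptions that the text above does not state: depth is counted in nodes, so the root "object" has depth 1, and a concept counts as its own subsumer (so the LCS of "car" and "automobile" is "automobile").

```python
# Toy is-a hierarchy from the example above: child -> parent (None marks the root).
PARENT = {
    "object": None,
    "vehicle": "object",
    "boat": "vehicle",
    "automobile": "vehicle",
    "car": "automobile",
}

def path_to_root(concept):
    """Return [concept, parent, grandparent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    # Assumption: depth counts the nodes on the path to the root, so the root has depth 1.
    return len(path_to_root(concept))

def least_common_subsumer(a, b):
    """Most specific concept that lies on both paths to the root."""
    above_a = set(path_to_root(a))
    # Walk upward from b; the first concept that is also above a is the deepest shared one.
    for concept in path_to_root(b):
        if concept in above_a:
            return concept
    return None  # not reached in a single-rooted tree

def wu_palmer(a, b):
    lcs = least_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(least_common_subsumer("boat", "car"))
print(wu_palmer("boat", "car"))
```

With these depths (vehicle = 2, boat = 3, car = 4), the score for "boat" and "car" works out to 2 * 2 / (3 + 4) = 4/7, about 0.57. Counting depth in edges instead would give a different number, so check which convention your library uses before comparing scores.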

To compute the similarity measure, I suggest using an existing library; otherwise you will need to build the concept graph yourself, which is tedious.

In Perl, you can use the WordNet::Similarity package.

In Python, you can use the nltk package, specifically the wup_similarity method.

In Java, you can use the ws4j package.
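For the Python option, a short sketch of what the nltk call might look like. It assumes nltk is installed (`pip install nltk`) and that the WordNet data has been fetched once with `nltk.download('wordnet')`. The helper name `wordnet_wup` and the synset names "car.n.01" and "boat.n.01" are chosen only for illustration, and the helper returns None rather than failing when nltk or its data is missing.

```python
def wordnet_wup(name1, name2):
    """Return (common hypernyms, Wu-Palmer score), or None if nltk or its WordNet data is unavailable."""
    try:
        from nltk.corpus import wordnet as wn
        s1 = wn.synset(name1)
        s2 = wn.synset(name2)
        # lowest_common_hypernyms returns a list, since WordNet allows several parents per synset.
        return s1.lowest_common_hypernyms(s2), s1.wup_similarity(s2)
    except (ImportError, LookupError):
        # ImportError: nltk is not installed; LookupError: the WordNet corpus is not downloaded.
        return None

result = wordnet_wup("car.n.01", "boat.n.01")
print(result)
```

Because a WordNet synset can have more than one hypernym, nltk reports the common subsumers as a list rather than a single concept, and its depth convention may differ from a hand-built tree, so scores are best compared only within the same library.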

Licensed under: CC-BY-SA with attribution