Here is why -
In jcn we have...
sim(c1, c2) = 1 / distance(c1, c2)
distance(c1, c2) = ic(c1) + ic(c2) - (2 * ic(lcs(c1, c2)))
where c1 and c2 are the two concepts,
ic(c) is the information content of concept c, and
lcs(c1, c2) is the least common subsumer of c1 and c2.
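
As a minimal sketch, the two formulas above can be written out
directly. The function names and the idea of passing the three
information content values as plain numbers are assumptions made for
illustration, not the module's actual interface. The sketch leaves a
zero distance unhandled on purpose.

```python
def jcn_distance(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """distance(c1, c2) = ic(c1) + ic(c2) - (2 * ic(lcs(c1, c2)))."""
    return ic_c1 + ic_c2 - 2.0 * ic_lcs


def jcn_similarity(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """sim(c1, c2) = 1 / distance(c1, c2).

    Raises ZeroDivisionError when the distance is 0; that case is
    deliberately not handled in this sketch.
    """
    return 1.0 / jcn_distance(ic_c1, ic_c2, ic_lcs)


# Made-up information content values, for illustration only.
example_distance = jcn_distance(5.0, 7.0, 4.0)      # 5 + 7 - 8 = 4.0
example_similarity = jcn_similarity(5.0, 7.0, 4.0)  # 1 / 4 = 0.25
```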
Now, we don't want distance to be 0, since similarity would then be
undefined (division by zero).
distance can be 0 in 2 cases...
(1) ic(c1) = ic(c2) = ic(lcs(c1, c2)) = 0
ic(lcs(c1, c2)) can be 0 if the lcs turns out to be the root
node (the information content of the root node is zero). But since
c1 and c2 can never be the root node, ic(c1) and ic(c2) would be 0
only if the two concepts have a frequency count of 0, in which case,
for lack of data, we return a relatedness of 0 (as in the lin case).
Note that the root node ACTUALLY has an information content of
zero. Technically, none of the other concepts can have an information
content value of zero. We assign a concept a value of zero when
in reality its information content is undefined (due to a zero
frequency count). To see why, look at the formula for information
content: ic(c) = -log(freq(c)/freq(ROOT))
For the root node the ratio is 1, and -log(1) = 0. For a concept with
a zero frequency count the ratio is 0, and log(0) is undefined.
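
The first case can be sketched as follows. This is only an
illustration: the function names, the use of None to mark an undefined
information content, and the raw frequency arguments are all
assumptions, not the module's actual code. It assumes that frequency
counts are propagated up the hierarchy, so the lcs has a count at
least as large as either concept.

```python
import math
from typing import Optional


def information_content(freq_c: float, freq_root: float) -> Optional[float]:
    """ic(c) = -log(freq(c) / freq(ROOT)).

    Returns None when freq(c) is 0, because log(0) is undefined.
    """
    if freq_c <= 0 or freq_root <= 0:
        return None
    return -math.log(freq_c / freq_root)


def jcn_with_zero_count_check(freq_c1: float, freq_c2: float,
                              freq_lcs: float, freq_root: float) -> float:
    """jcn relatedness with the zero-frequency case handled."""
    ic_c1 = information_content(freq_c1, freq_root)
    ic_c2 = information_content(freq_c2, freq_root)
    if ic_c1 is None or ic_c2 is None:
        # Case (1): no data for at least one concept, so return 0.
        return 0.0
    ic_lcs = information_content(freq_lcs, freq_root)
    if ic_lcs is None:
        # Assumption: cannot happen if counts are propagated upward.
        raise ValueError("lcs has a zero count but a descendant does not")
    distance = ic_c1 + ic_c2 - 2.0 * ic_lcs
    # A zero distance (case 2) is NOT handled here and would raise
    # ZeroDivisionError.
    return 1.0 / distance
```

With made-up counts, information_content(100, 100) is zero (the root
node), while information_content(0, 100) is None, and
jcn_with_zero_count_check(0, 10, 100, 100) returns 0.0.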
(2) The second case in which distance turns out to be zero is when...
ic(c1) + ic(c2) = 2 * ic(lcs(c1, c2))
(the most likely special case being ic(c1) = ic(c2) =
ic(lcs(c1, c2)), which holds if all three turn out to be the same
concept.)
How should one handle this?
Intuitively this is the case of maximum relatedness (zero
distance). For jcn this relatedness would be infinity... But we
can't return infinity. And simply returning a 0 wouldn't work...
since here we have found a pair of concepts with maximum
relatedness, and returning a 0 would be like saying that they
aren't related at all.
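
The second case can be reproduced with a small sketch. The
information content value below is made up, and the function name is
an assumption for illustration. The sketch only demonstrates the
problem (a zero distance with no usable 1 / distance); it does not
propose how the case should be resolved.

```python
def jcn_distance(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """distance(c1, c2) = ic(c1) + ic(c2) - (2 * ic(lcs(c1, c2)))."""
    return ic_c1 + ic_c2 - 2.0 * ic_lcs


# A concept compared with itself: c1, c2 and the lcs are all the same
# concept, so all three information content values are equal.
ic_value = 3.5  # made-up, nonzero information content
distance_same = jcn_distance(ic_value, ic_value, ic_value)

# Naively applying sim = 1 / distance fails for this pair.
try:
    similarity = 1.0 / distance_same
    outcome = "defined"
except ZeroDivisionError:
    similarity = None
    outcome = "undefined"
```

Here distance_same is 0.0 even though ic_value is not zero, so this
pair cannot be confused with the zero-frequency situation in case (1).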