Question

I have been searching for a Java library that can give me the "frequency count" of a synset. I checked JWNL and JWI, and they don't provide such information. Does anybody know of other Java WordNet APIs?

Solution 2

Each synset has a frequency indicator, based on corpora.

JAWS - http://lyle.smu.edu/~tspell/jaws offers Synset#getTagCount

I'm not sure about JWNL and JWI, but look for an equivalent in the synset APIs of these libraries.

Note (personal opinion): do not trust this frequency indicator; it is seriously misleading.

OTHER TIPS

I believe this can be done with JWI as well, but it's not very intuitive.

Let's start with a lemmatized word. If you have a word that is not lemmatized, you should use a lemmatizer before searching for the word using JWI.

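import edu.mit.jwi.IRAMDictionary;
import edu.mit.jwi.RAMDictionary;
import edu.mit.jwi.data.ILoadPolicy;
import edu.mit.jwi.item.*;
import java.util.List;

String         lemma = ... // the lemmatized word
// WN_DIR is assumed to be a java.io.File pointing at the WordNet "dict" directory
IRAMDictionary dict  = new RAMDictionary(WN_DIR, ILoadPolicy.IMMEDIATE_LOAD);
dict.open(); // the dictionary must be opened before any lookup
IIndexWord indexWord = dict.getIndexWord(lemma, POS.NOUN); // or POS.VERB, etc.; null if the lemma is not found

List<IWordID> wrdIDs = indexWord.getWordIDs(); // one word ID per sense of the lemma
for (IWordID id : wrdIDs) {
    IWord word  = dict.getWord(id);
    // the tag count lives on the sense entry, looked up by sense key
    int   count = dict.getSenseEntry(word.getSenseKey()).getTagCount();
    System.out.println("Synset: "    + word.getSynset().getGloss());
    System.out.println("Frequency: " + count);
}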

This may look overly complicated, but note that we started with a word for this little code snippet, not a synset!

In JWI, each IWord uniquely identifies a synset (although a synset will typically have more than one word in it), so the approach to computing the frequency of each word sense is quite counter-intuitive (at least it was to me).
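Since the question asks about the frequency of a synset rather than of a single word sense, one possible reading is to add up the tag counts of all the words in the synset. The following is only a sketch of that idea in plain Java, not something JWI provides: the class SynsetFrequency and the method synsetTagCount are hypothetical names, and the counts in main are made up. With JWI, the list would presumably be synset.getWords() and the lookup would be the same dict.getSenseEntry(word.getSenseKey()).getTagCount() call used above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class SynsetFrequency {

    // Hypothetical helper: sums the per-sense tag counts of every word in one synset.
    // tagCountOf stands in for whatever lookup the WordNet library offers.
    static <W> int synsetTagCount(List<W> wordsInSynset, ToIntFunction<W> tagCountOf) {
        int total = 0;
        for (W word : wordsInSynset) {
            total += tagCountOf.applyAsInt(word);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only, not taken from WordNet.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("car", 10);
        counts.put("auto", 3);
        counts.put("automobile", 2);

        List<String> synsetWords = Arrays.asList("car", "auto", "automobile");
        int total = synsetTagCount(synsetWords, w -> counts.getOrDefault(w, 0));
        System.out.println("Synset tag count: " + total);
    }
}
```

Whether a sum is the right aggregate depends on what you need the number for; taking the maximum over the words would be another defensible choice.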

The count is given by the getTagCount() method, for which the documentation states:

Returns the tag count for the sense entry. A tag count is a non-negative integer that represents the number of times the sense is tagged in various semantic concordance texts. A count of 0 indicates that the sense has not been semantically tagged.
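For context, the WordNet database ships a sense index file (index.sense) whose lines have four whitespace-separated fields: sense_key, synset_offset, sense_number and tag_cnt. As far as I know, this is where the tag count ultimately comes from. If you only need the counts, a minimal sketch that reads them without any WordNet library could look like this. The class and method names are hypothetical, and the sample lines are made up to show the format, not copied from a WordNet release.

```java
public class SenseIndexTagCounts {

    // Returns the sense key, i.e. the first field of an index.sense line.
    static String parseSenseKey(String line) {
        return line.trim().split("\\s+")[0];
    }

    // Returns the tag count (fourth field), or -1 if the line does not have
    // exactly the four fields: sense_key synset_offset sense_number tag_cnt.
    static int parseTagCount(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 4) {
            return -1;
        }
        return Integer.parseInt(fields[3]);
    }

    public static void main(String[] args) {
        // Illustrative lines in the index.sense layout (values are invented).
        String[] lines = {
            "example%1:09:00:: 00000001 1 12",
            "example%1:09:01:: 00000002 2 0"
        };
        for (String line : lines) {
            System.out.println(parseSenseKey(line) + " -> " + parseTagCount(line));
        }
    }
}
```

In practice you would read the lines from the index.sense file in your WordNet dict directory instead of a hard-coded array.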

Keep in mind, though, that the sense counts in WordNet are horribly outdated (as far as I can recall, they have not been updated since 2001).

extJWNL's Word class has a method, getUseCount(), which returns what you want.

Javadoc: http://extjwnl.sourceforge.net/javadocs/index.html

For example:

// Classes used here come from net.sf.extjwnl.data and net.sf.extjwnl.data.list
IndexWord word = dictionary.lookupIndexWord(POS.NOUN, exampleWord);

List<Synset> senses = word.getSenses();
int nums = word.sortSenses(); // sorts the senses by use count

// for each sense of the word
for (Synset syn : senses) {

    // get the synonyms of the sense
    PointerTargetTree s = PointerUtils.getSynonymTree(syn, 2 /*depth*/);
    List<PointerTargetNodeList> l = s.toList();

    for (PointerTargetNodeList nl : l) {
        for (PointerTargetNode n : nl) {
            Synset ns = n.getSynset();
            if (ns != null) {
                List<Word> ws = ns.getWords();
                for (Word ww : ws) {
                    // ww.getUseCount() is the frequency of occurrence as reported by WordNet
                    System.out.println(ww.getLemma() + " " + ww.getUseCount());
                }
            }
        }
    }
}
Licensed under: CC-BY-SA with attribution