Question

In the NLTK senseval module, senses are of the form HARD1, HARD2, etc. (see source here). However, there doesn't seem to be a way to get the actual definition. I'm trying to implement the Lesk algorithm, and I'm now attempting to check whether the sense predicted by the Lesk algorithm is correct (using a definition from WordNet).

The problem I'm running into is how to unify the WordNet definition with the senseval answer (HARD1, HARD2). Does anybody know how to translate the SENSEVAL sense into a definition, or look it up somewhere?

Was it helpful?

Solution

I ended up finding out that these correspond to the senses in WordNet 1.7, which is pretty archaic (doesn't seem easily installable on Mac OS X or Ubuntu 11.04).

There are no online versions of WordNet 1.7 that I could find.

This site also has some useful information about these three corpora. For example, it says that the six senses of interest were taken from the Longman English Dictionary Online (circa 2001). See here

It describes the source of HARD as WordNet 1.7.

Ultimately, I ended up manually mapping the definitions to those in WordNet 3.0. If you're interested, here's the dictionary. Note, however, that I'm not an expert on linguistics, and they're not exact

# A map of SENSEVAL senses to WordNet 3.0 senses.
# SENSEVAL-2 uses WordNet 1.7, which is no longer installable on most modern
# machines and is not the version that the NLTK comes with.
# As a consequence, we have to manually map the following
# senses to their equivalent(s).
SV_SENSE_MAP = {
    "HARD1": ["difficult.a.01"],    # not easy, requiring great physical or mental
    "HARD2": ["hard.a.02",          # dispassionate
              "difficult.a.01"],
    "HARD3": ["hard.a.03"],         # resisting weight or pressure
    "interest_1": ["interest.n.01"], # readiness to give attention
    "interest_2": ["interest.n.03"], # quality of causing attention to be given to
    "interest_3": ["pastime.n.01"],  # activity, etc. that one gives attention to
    "interest_4": ["sake.n.01"],     # advantage, advancement or favor
    "interest_5": ["interest.n.05"], # a share in a company or business
    "interest_6": ["interest.n.04"], # money paid for the use of money
    "cord": ["line.n.18"],          # something (as a cord or rope) that is long and thin and flexible
    "formation": ["line.n.01","line.n.03"], # a formation of people or things one beside another
    "text": ["line.n.05"],                 # text consisting of a row of words written across a page or computer screen
    "phone": ["telephone_line.n.02"],   # a telephone connection
    "product": ["line.n.22"],       # a particular kind of product or merchandise
    "division": ["line.n.29"],      # a conceptual separation or distinction
    "SERVE12": ["serve.v.02"],       # do duty or hold offices; serve in a specific function
    "SERVE10": ["serve.v.06"], # provide (usually but not necessarily food)
    "SERVE2": ["serve.v.01"],       # serve a purpose, role, or function
    "SERVE6": ["service.v.01"]      # be used by; as of a utility
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top