Question

I'm writing some experiments in Ruby that access WordNet through the wn command-line tool, because I gave up on getting the wordnet gem to work.

I want to be able to look up the frequencies of senses, ultimately so that I can calculate the probability that a given word is a noun, adjective, verb or adverb.

I've read the documentation, but it isn't always explicit.

Is this possible using just the wn tool? And am I right in thinking WordNet includes this info?


Solution

As far as I can tell, it does not include frequencies per se, though synsets are ordered from most to least frequent in the returned results.

You can get actual frequencies in a number of ways. Perhaps the most reliable is to use a POS-tagged corpus like the Penn Treebank, then just compute the values yourself. Unfortunately, getting a free copy of that is difficult if you're not at a university. Another option is to build your own corpus (maybe from blogs, Project Gutenberg books, Wikipedia, whatever), run a POS tagger over it and then compute the frequencies from that. Obviously, the results of this method will be skewed by the tagger's mistakes and by whatever texts you happen to pick, but it's a lot easier than tagging a corpus manually.
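Once you have tagged text, the counting step is small. Here is a rough Ruby sketch of it, not a definitive implementation. It assumes the tagger output is whitespace-separated word/TAG tokens using Penn Treebank style tags, and it collapses those tags onto the four WordNet parts of speech by their first two letters (so proper nouns count as nouns). The names COARSE_POS, pos_counts and pos_probabilities are made up for this example, and the sample string is invented, not real corpus data.

```ruby
# Assumed mapping: the first two letters of a Penn Treebank style tag
# decide the coarse part of speech. Tags outside these four are ignored.
COARSE_POS = {
  "NN" => :noun,
  "VB" => :verb,
  "JJ" => :adjective,
  "RB" => :adverb
}.freeze

# Count how often each word occurs with each coarse part of speech.
# tagged_text is assumed to hold whitespace-separated "word/TAG" tokens.
def pos_counts(tagged_text)
  counts = Hash.new { |hash, word| hash[word] = Hash.new(0) }
  tagged_text.split.each do |token|
    # Split on the last slash so words that contain "/" stay intact.
    word, _, tag = token.rpartition("/")
    next if word.empty? || tag.empty?

    pos = COARSE_POS[tag[0, 2]]
    next unless pos

    counts[word.downcase][pos] += 1
  end
  counts
end

# Turn the counts for one word into relative frequencies.
# Returns an empty hash for a word that was never counted.
def pos_probabilities(counts, word)
  by_pos = counts.fetch(word.downcase, {})
  total = by_pos.values.sum
  return {} if total.zero?

  by_pos.transform_values { |n| n.to_f / total }
end

# Invented sample input, for illustration only.
sample = "The/DT run/NN was/VBD long/JJ ./. They/PRP run/VBP daily/RB ./. " \
         "I/PRP run/VBP home/RB ./."
p pos_probabilities(pos_counts(sample), "run")
```

On the invented sample, "run" is tagged as a noun once and as a verb twice, so the estimate comes out at roughly one third noun and two thirds verb. With a real corpus you would read the tagged files instead of a string, and you may want to lemmatize first so that "runs" and "running" are counted together with "run".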

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow