Get POS probabilities from the WordNet command line tool
Question
I'm writing some experiments in Ruby that access WordNet through the wn command line tool, because I gave up on getting the wordnet gem to work.
I want to be able to look up the frequencies of senses, ultimately so that I can calculate the probability that a given word is a noun, adjective, verb, or adverb.
I've tried the documentation but it's not always so explicit.
Is this possible using just the wn tool? And am I right in thinking WordNet includes this info?
Solution
As far as I can tell, WordNet does not include frequencies per se, though the synsets in the results it returns are ordered from most to least frequent.
You can get actual frequencies in a number of ways. Perhaps the most reliable is to use a POS-tagged corpus like the Penn Treebank and compute the values yourself: count how often each word appears with each tag, then divide by the word's total count. Unfortunately, getting a free copy of that corpus is difficult if you're not at a university. Another option is to build your own corpus (from blogs, Project Gutenberg books, Wikipedia, whatever), run a POS tagger over it, and compute the frequencies from the tagger's output. Obviously, this method is going to be skewed, both by the tagger's mistakes and by whatever text you happened to collect, but it's a lot easier than tagging a corpus manually.
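Whichever corpus you end up with, the counting step is the same. Here is a minimal Ruby sketch of it, under some assumptions of mine: the corpus is plain text with whitespace-separated word/TAG tokens using Penn Treebank tags, and the four WordNet classes can be recovered from the tag prefixes NN, VB, JJ and RB. The helper names (coarse_pos, pos_counts, pos_probabilities) are made up for illustration and don't come from any library.

```ruby
# Map a Penn Treebank tag onto one of WordNet's four coarse classes.
# The prefix rule is a simplifying assumption; other tags are ignored.
def coarse_pos(tag)
  case tag
  when /\ANN/ then :noun
  when /\AVB/ then :verb
  when /\AJJ/ then :adjective
  when /\ARB/ then :adverb
  end
end

# Count how often each word occurs with each coarse class.
# tagged_text: whitespace-separated "word/TAG" tokens (assumed format).
def pos_counts(tagged_text)
  counts = Hash.new { |hash, word| hash[word] = Hash.new(0) }
  tagged_text.split.each do |token|
    # Split on the last slash so words that contain "/" stay intact.
    word, _, tag = token.rpartition("/")
    next if word.empty?          # token had no slash, so no tag
    pos = coarse_pos(tag)
    next unless pos              # determiners, pronouns, punctuation, ...
    counts[word.downcase][pos] += 1
  end
  counts
end

# Relative frequency of each class for one word; {} if the word was never seen.
def pos_probabilities(counts, word)
  by_pos = counts.fetch(word.downcase, {})
  total = by_pos.values.sum
  return {} if total.zero?
  by_pos.transform_values { |n| n.to_f / total }
end

# Tiny hand-made sample, only to show the call pattern.
sample = "I/PRP run/VBP every/DT day/NN ./. The/DT run/NN was/VBD long/JJ ./."
p pos_probabilities(pos_counts(sample), "run")
```

These are plain relative frequencies, so a word seen only a handful of times gets a very rough estimate. With a real corpus you would probably want to require a minimum count or apply some smoothing before trusting the numbers.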