Question

I want to use the synonym token filter in Elasticsearch for an index. I downloaded the Prolog version of WordNet 3.0 and found the wn_s.pl file, which Elasticsearch can read. However, the file contains synonyms for all sorts of words and phrases, while I am only interested in supporting synonyms for nouns. Is there a way to extract just those entries?


Solution

Given that the entries in wn_s.pl look like this, where the fourth field (ss_type) is n for nouns and v for verbs:

s(112947045,1,'usance',n,1,0).
s(200001742,1,'breathe',v,1,25).

A quick and crude way to do this is to run the following in your terminal, which keeps only the lines of that file that contain the string ,n,:

grep ",n," wn_s.pl > wn_s_nouns_only.pl
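Because ,n, is matched anywhere on the line, a line whose quoted word happened to contain ,n, would also slip through. A minimal sketch of a stricter filter anchors the match to the ss_type field instead. It assumes every line follows the s(synset_id,w_num,'word',ss_type,sense_number,tag_count). layout shown above, and that apostrophes inside the word are doubled ('') as in standard Prolog quoted atoms. The function name extract_nouns is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the wn_s.pl lines whose ss_type field is n.
# Assumed layout: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
# The quoted word may contain doubled apostrophes (''), which ([^']|'')* skips,
# so the ",n," test applies to the field right after the word, not to the word itself.
extract_nouns() {
  grep -E "^s\([0-9]+,[0-9]+,'([^']|'')*',n," "$1"
}
```

Usage: extract_nouns wn_s.pl > wn_s_nouns_only.pl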

The file wn_s_nouns_only.pl will then contain only the entries marked as nouns (assuming no quoted word itself contains the string ,n,).
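To check what a file actually contains, you can count its entries per ss_type. The sketch below relies on the same line-layout assumption as before, and the function name count_by_type is hypothetical. It prints one count per type letter: run on wn_s_nouns_only.pl it should report only n, while on the full wn_s.pl it also lists other types such as v.

```shell
#!/bin/sh
# Hypothetical helper: count wn_s.pl-style lines by their ss_type letter.
# sed reduces each matching line to the letter after the quoted word (group 2);
# lines that do not match the assumed layout are dropped because of -n.
count_by_type() {
  sed -E -n "s/^s\([0-9]+,[0-9]+,'([^']|'')*',([a-z]),.*/\2/p" "$1" | sort | uniq -c
}
```

Usage: count_by_type wn_s_nouns_only.pl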

Licensed under: CC-BY-SA with attribution