Question
I have a Lucene index of a large text file (a corpus). For some n-grams, I need to find a list of co-occurring words (a co-occurrence list).
For example, I have the unigram "table" with a term frequency of 1500, and I need a co-occurrence list like the following, with co-occurrence counts and the measured co-occurrence strength:
WORD    FREQ    Dice (Jaccard) coefficient
brown   1286    0.3
break   729     0.2
Solution
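Search for "brown" and "break" together.
Lucene will return only the documents that contain both terms, provided you set the parameters correctly (for example, by combining the terms with AND rather than the default OR).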
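Once the counts are available, the coefficient itself is simple arithmetic. The following is a minimal sketch, not part of the original answer: it assumes you have three document counts (documents containing the first term, documents containing the second term, and documents containing both, which is the hit count of the AND query). The function names and the numbers are hypothetical and chosen only for illustration.

```python
def dice(n_a: int, n_b: int, n_both: int) -> float:
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    total = n_a + n_b
    # Guard against division by zero when neither term occurs.
    return 2.0 * n_both / total if total else 0.0


def jaccard(n_a: int, n_b: int, n_both: int) -> float:
    """Jaccard coefficient: |A and B| / |A or B|."""
    union = n_a + n_b - n_both
    return n_both / union if union else 0.0


if __name__ == "__main__":
    # Hypothetical counts (assumptions, not taken from the question):
    n_table = 1500   # documents containing "table"
    n_brown = 2500   # documents containing "brown"
    n_both = 500     # documents containing both terms (AND query hits)

    print("Dice:   ", dice(n_table, n_brown, n_both))
    print("Jaccard:", jaccard(n_table, n_brown, n_both))
```

Dice and Jaccard are different measures, although one can be derived from the other (J = D / (2 - D)), so it is worth deciding which of the two the table column should report.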