Lucene: find a co-occurrence list

https://stackoverflow.com/questions/23500272

java
lucene

16-07-2023

Problem

I have a Lucene index of a large text file (a corpus). For some n-grams, I need to find a list of similar words (a co-occurrence list).

For example, for the unigram "table", which has a term frequency of 1500, I need a co-occurrence list like the following, with co-occurrence counts and a measure of co-occurrence strength:

WORD       FREQ         Dice(Jaccard) coefficient
brown      1286         0.3
break      729          0.2
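For reference, the two measures named in the table header can be computed from three counts: how often each word occurs and how often the two occur together. The sketch below uses the standard set-based definitions; the class name, method names, and the sample numbers in the usage note are illustrative assumptions, not values from the question.

```java
// Minimal sketch of the Dice and Jaccard coefficients from raw counts.
// "both" is the co-occurrence count; "a" and "b" are the individual counts.
// All names here are hypothetical, chosen only for illustration.
class CooccurrenceMeasures {

    // Dice = 2 * |A and B| / (|A| + |B|)
    static double dice(long both, long a, long b) {
        long denominator = a + b;
        return denominator == 0 ? 0.0 : (2.0 * both) / denominator;
    }

    // Jaccard = |A and B| / (|A| + |B| - |A and B|)
    static double jaccard(long both, long a, long b) {
        long union = a + b - both;
        return union == 0 ? 0.0 : (double) both / union;
    }
}
```

With made-up counts of 100 occurrences for each word and 50 co-occurrences, `dice(50, 100, 100)` gives 0.5 and `jaccard(50, 100, 100)` gives 1/3. The two measures are related but not identical, so a result column should report one or the other.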

Solution

Search for brown and break.

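Lucene will return only documents that contain both terms if you set the parameters right, that is, if both terms are required clauses of the query. The number of matching documents then serves as a document-level co-occurrence count.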
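To make the idea concrete, here is a small plain-Java sketch of what such a conjunctive search computes: given the sorted document IDs in which each term occurs, it counts the documents that contain both. This is not Lucene API; the class and method names are hypothetical, and in Lucene itself one would typically build a `BooleanQuery` whose two `TermQuery` clauses are both marked `MUST` and count the hits.

```java
// Illustration only: counts documents containing both terms by intersecting
// two sorted arrays of document IDs (one array per term).
// This mimics the result of an AND query; it does not use Lucene.
class PostingIntersection {

    static int countBoth(int[] docsA, int[] docsB) {
        int i = 0, j = 0, count = 0;
        while (i < docsA.length && j < docsB.length) {
            if (docsA[i] == docsB[j]) {
                count++;      // document contains both terms
                i++;
                j++;
            } else if (docsA[i] < docsB[j]) {
                i++;          // advance the list with the smaller ID
            } else {
                j++;
            }
        }
        return count;
    }
}
```

For example, with hypothetical document IDs `{1, 3, 5, 7}` for one term and `{3, 4, 7, 9}` for the other, `countBoth` returns 2, because only documents 3 and 7 contain both terms.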

License: CC-BY-SA with attribution

Not affiliated with StackOverflow