Question

I am using Lucene to index the content of my site and provide a search facility. I also use Lucene's MoreLikeThis to generate a "related pages" feature for the site. My site is multilingual, so I need to restrict the MoreLikeThis results to one specific language at a time.

Does anyone have an idea how to do this?

Solution 2

I ended up just splitting the content into multiple indexes, one per language, and then performing the MLT query against the index for the language in question. Otherwise the request is too heavy. I hope the Lucene developers will ov
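As a rough illustration of the multiple-index approach, here is a minimal sketch that picks the index directory for a given language. The class name LanguageIndexRouter, the "index-<code>" directory naming, and the base directory are all hypothetical and not part of Lucene; the returned path is what you would then open as the index before building MoreLikeThis on its reader.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper (not a Lucene class): resolves the directory that
// holds the index for one language, assuming one index per language.
public class LanguageIndexRouter {
    private final Path baseDir;

    public LanguageIndexRouter(Path baseDir) {
        this.baseDir = baseDir;
    }

    // Maps a language code such as "en" or " EN " to <baseDir>/index-en.
    // The "index-" prefix is an assumed naming convention.
    public Path indexDirFor(String languageCode) {
        if (languageCode == null || languageCode.trim().isEmpty()) {
            throw new IllegalArgumentException("language code is required");
        }
        String normalized = languageCode.trim().toLowerCase(Locale.ROOT);
        return baseDir.resolve("index-" + normalized);
    }

    public static void main(String[] args) {
        LanguageIndexRouter router = new LanguageIndexRouter(Paths.get("indexes"));
        // Open this directory as the index, then run MoreLikeThis on its reader.
        System.out.println(router.indexDirFor("EN"));
    }
}
```

Because each index only contains documents in one language, the MLT query needs no extra language clause, which is where the saving over a combined query would come from.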

OTHER TIPS

MoreLikeThis.like() returns a Query object:
// ir is an open IndexReader on your index
MoreLikeThis mlt = new MoreLikeThis(ir);
Reader target = ... // original source of the doc you want to find similar docs for
Query query = mlt.like(target);

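You could create a second query that checks the language, then wrap both queries in a BooleanQuery, like so:
// Assumes each document has an untokenized "language" field; the field name and value are illustrative
Query languageQuery = new TermQuery(new Term("language", "en"));
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(query, BooleanClause.Occur.MUST); // the MoreLikeThis query from above
booleanQuery.add(languageQuery, BooleanClause.Occur.MUST);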

This is not very efficient, but it will get the job done if you have a small corpus.

Licensed under: CC-BY-SA with attribution