BooleanQuery$TooManyClauses exception when using wildcard queries
21-09-2019
Question
I'm using Hibernate Search / Lucene to maintain a really simple index to find objects by name, no fancy stuff.
All my model classes extend a class NamedModel, which basically looks like this:
@MappedSuperclass
public abstract class NamedModel {
    @Column(unique = true)
    @Field(store = Store.YES, index = Index.UN_TOKENIZED)
    protected String name;
}
My problem is that I get a BooleanQuery$TooManyClauses exception when querying the index for objects whose names start with a specific letter, e.g. "name:l*".
A query like "name:lin*" works without problems. In fact, any query with more than one letter before the wildcard works.
While searching the net for similar problems, I only found people using fairly complex queries, which always seemed to be what caused the exception. I don't want to increase maxClauseCount, because I don't think it's good practice to raise limits just because you hit them.
What's the problem here?
Solution
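Lucene rewrites your simple name:l* query into a BooleanQuery with one clause for every indexed term that starts with l (something like name:lou OR name:la OR name:...). I believe this is meant to be faster. BooleanQuery, however, accepts at most maxClauseCount clauses (1024 by default) and throws TooManyClauses once a prefix matches more distinct terms than that. A single letter such as l matches far more names than a longer prefix such as lin, which is why only the one-letter queries fail.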
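To see why only the one-letter prefixes fail, here is a toy model of that rewrite in plain Java. It is not Lucene code: it walks a sorted term dictionary from the prefix onward, turns each matching term into one clause, and gives up once the clause limit is exceeded. The class name, the generated names, and the use of IllegalStateException as a stand-in for BooleanQuery.TooManyClauses are assumptions made for illustration; 1024 is Lucene's default clause limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Toy model (hypothetical, not Lucene code) of expanding a prefix query
 * into one clause per matching term, subject to a clause limit.
 */
public class PrefixExpansionDemo {
    /** Lucene's default BooleanQuery clause limit. */
    public static final int DEFAULT_MAX_CLAUSES = 1024;

    public static List<String> expand(NavigableSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // Terms are sorted, so all matches form one contiguous run starting at the prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term with this prefix
            }
            if (clauses.size() == maxClauses) {
                // Stand-in for BooleanQuery.TooManyClauses.
                throw new IllegalStateException("too many clauses for prefix " + prefix);
            }
            clauses.add(term); // name:lou OR name:la OR ...
        }
        return clauses;
    }

    public static void main(String[] args) {
        NavigableSet<String> names = new TreeSet<>();
        for (int i = 0; i < 1500; i++) {
            names.add("lou" + i); // many names starting with "l"
        }
        for (int i = 0; i < 10; i++) {
            names.add("lin" + i); // only a few starting with "lin"
        }

        System.out.println(expand(names, "lin", DEFAULT_MAX_CLAUSES).size()); // prints 10
        try {
            expand(names, "l", DEFAULT_MAX_CLAUSES);
        } catch (IllegalStateException e) {
            System.out.println("l*: " + e.getMessage());
        }
    }
}
```

The "lin" prefix yields 10 clauses, while "l" matches all 1510 names and trips the limit, mirroring the behaviour described in the question.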
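As a workaround, you can use a constant-score prefix query (a ConstantScoreQuery wrapping a PrefixFilter) instead of a PrefixQuery. Because the filter marks matching documents directly instead of building one clause per term, the clause limit never comes into play: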
// instead of new PrefixQuery(prefix)
new ConstantScoreQuery(new PrefixFilter(prefix));
However, this changes the scoring of documents (and hence the order of results if you sort by score): every match gets the same score. Because we needed scoring (and boosts), we chose a solution that uses a PrefixQuery whenever possible and falls back to the constant-score query only where needed:
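// prefix is a final org.apache.lucene.index.Term, e.g. new Term("name", "l");
// log is a logger of the enclosing class
new PrefixQuery(prefix) {
    @Override
    public Query rewrite(final IndexReader reader) throws IOException {
        try {
            // Normal case: expand into a scored BooleanQuery, one clause per matching term.
            return super.rewrite(reader);
        } catch (final BooleanQuery.TooManyClauses e) {
            // Too many matching terms: match through a filter instead.
            // Every hit gets the same score, but the query's boost is kept.
            log.debug("falling back to ConstantScoreQuery for prefix " + prefix + " (" + e + ")");
            final Query q = new ConstantScoreQuery(new PrefixFilter(prefix));
            q.setBoost(getBoost());
            return q;
        }
    }
};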
(As an enhancement, one could use some kind of LRUMap to cache prefixes that failed before, so as to avoid going through the costly rewrite again.)
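A minimal sketch of such a cache, assuming a plain JDK LinkedHashMap in access order as a stand-in for an LRUMap (such as the one in Apache Commons Collections). FailedPrefixCache and its methods are hypothetical names. IllegalStateException stands in for BooleanQuery.TooManyClauses, and the two Supplier arguments stand in for super.rewrite(reader) and the ConstantScoreQuery fallback. In a real override the key would more likely be the prefix Term (field plus text) than a bare string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Lucene or Hibernate Search) that remembers
 * prefixes whose scoring rewrite failed, so it is not attempted again.
 */
public class FailedPrefixCache {
    private final Map<String, Boolean> failed;

    public FailedPrefixCache(final int capacity) {
        // accessOrder = true: get() moves an entry to the end, so the eldest
        // entry is always the least recently used one.
        this.failed = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity; // evict once over capacity
            }
        };
    }

    public synchronized boolean hasFailed(String prefix) {
        return failed.get(prefix) != null; // get(), not containsKey(), refreshes recency
    }

    public synchronized void markFailed(String prefix) {
        failed.put(prefix, Boolean.TRUE);
    }

    /**
     * Runs the scoring expansion unless this prefix failed before; on failure
     * it records the prefix and returns the constant-score fallback instead.
     */
    public <T> T rewrite(String prefix, Supplier<T> expand, Supplier<T> fallback) {
        if (hasFailed(prefix)) {
            return fallback.get(); // skip a rewrite we already know will fail
        }
        try {
            return expand.get();
        } catch (IllegalStateException e) { // stand-in for BooleanQuery.TooManyClauses
            markFailed(prefix);
            return fallback.get();
        }
    }
}
```

With this in place, a second query for l* goes straight to the constant-score path, while prefixes that have never failed still get the scored expansion.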
I can't help you with integrating this into Hibernate Search though. You might ask after you've switched to Compass ;)