Question

I'm running the latest version of Lucene.Net (3.0.3). (I've also tagged lucene because it is basically the same architecture ...)

I have the following Lucene.Net.Analysis.Analyzer class:

public sealed class LowerCaseKeywordAnalyzer : Lucene.Net.Analysis.KeywordAnalyzer
{
    public override TokenStream TokenStream(string fieldName,
                                            TextReader reader)
    {
        var keywordTokenizer = base.TokenStream(fieldName,
                                                reader);
        var asciiFoldingFilter = new ASCIIFoldingFilter(keywordTokenizer);
        var lowerCaseFilter = new LowerCaseFilter(asciiFoldingFilter);

        return lowerCaseFilter;
    }
}

Besides lowercasing, this analyzer folds special characters to their ASCII equivalents, so that e.g. Außendienst becomes aussendienst.

Now I want to search this field with a prefix query. (I tried Lucene.Net.Search.PrefixQuery first, but that class does not accept an analyzer.) I currently do it like this:

var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
var prefixEscapedLowerCaseSearchPattern = string.Concat(escapedLowerCaseSearchPattern,
                                                        "*");
var queryParser = new QueryParser(/* my lucene version*/,
                                  fieldName,
                                  /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
var query = queryParser.Parse(prefixEscapedLowerCaseSearchPattern);

1st testcase

searchPattern: Auß
fieldName: Test

actual:

{Test:auß*}

expected:

{Test:auss*}

2nd testcase

searchPattern: Auß test
fieldName: Test

actual:

{Test:auß Test:test*}

expected:

{Test:auss test*}

So, how can I use my LowerCaseKeywordAnalyzer with a Lucene.Net.QueryParsers.QueryParser to get the expected result? (Or is there another solution?)


Solution

Well, I've tried this:

var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
var prefixEscapedLowerCaseSearchPattern = string.Concat("\"",
                                                        escapedLowerCaseSearchPattern,
                                                        "*\"");
var queryParser = new QueryParser(/* my lucene version */,
                                  fieldName,
                                  /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
var query = queryParser.Parse(prefixEscapedLowerCaseSearchPattern);

This generates a query that looks perfectly valid:

{Test:auss*}

but for some reason it does not work ...
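
A possible explanation, offered as an assumption rather than something verified against this index: inside double quotes the QueryParser does not treat * as a wildcard, so the quoted text goes through the analyzer as-is and a KeywordAnalyzer-based chain emits the single token auss* (with a literal asterisk). The result would then be a TermQuery for the exact term auss*, which prints the same way as a PrefixQuery for auss. The sketch below inspects the runtime type of the parsed query to check this. The names QuotedWildcardCheck, ParseQuoted and Describe are made up for illustration, and Version.LUCENE_30 is assumed as the version constant.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical diagnostic helper, not part of Lucene.Net.
public static class QuotedWildcardCheck
{
    // Builds the same quoted "pattern*" string as the snippet above and parses it.
    public static Query ParseQuoted(string fieldName, string searchPattern, Analyzer analyzer)
    {
        // Assumption: LUCENE_30 stands in for "my lucene version".
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, fieldName, analyzer);
        var quoted = string.Concat("\"", QueryParser.Escape(searchPattern), "*\"");
        return parser.Parse(quoted);
    }

    // Reports what kind of query was built; ToString() alone cannot tell
    // a TermQuery for "auss*" apart from a PrefixQuery for "auss".
    public static string Describe(Query query)
    {
        var termQuery = query as TermQuery;
        if (termQuery != null)
        {
            return "TermQuery for exact term '" + termQuery.Term.Text + "'";
        }
        return query.GetType().Name + ": " + query;
    }
}
```

Calling Describe(ParseQuoted("Test", "Auß", analyzer)) with the LowerCaseKeywordAnalyzer from the question should make the difference visible, even though both queries print as Test:auss*.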

I remembered that I did get results when I used Lucene.Net.Search.PrefixQuery with search patterns that contain no umlauts ...
Then I thought ... well ... just let the QueryParser analyze the pattern into a Lucene.Net.Search.TermQuery, take the Lucene.Net.Index.Term from that instance, and pass it to a Lucene.Net.Search.PrefixQuery:

var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
var prefixEscapedLowerCaseSearchPattern = string.Concat("\"",
                                                        escapedLowerCaseSearchPattern,
                                                        "\"");
var queryParser = new QueryParser(/* my lucene version */,
                                  fieldName,
                                  /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
var termQuery = (TermQuery) queryParser.Parse(prefixEscapedLowerCaseSearchPattern);
var term = termQuery.Term;
var prefixQuery = new PrefixQuery(term);

BOOOM!

This generates a query that prints exactly the same ({Test:auss*}), but this one actually yields results ... I don't know why, though ...
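
If the goal is only to run the prefix through the analyzer, the QueryParser round trip (and the cast to TermQuery, which would fail if the parser ever returned another query type) can probably be avoided by consuming the analyzer's token stream directly. A minimal sketch, assuming Lucene.Net 3.0.3's ITermAttribute API and an analyzer that emits exactly one token for the whole input, as a KeywordAnalyzer-based one does. AnalyzedPrefixQuery and Create are hypothetical names.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class AnalyzedPrefixQuery
{
    // Analyzes searchPattern with the given analyzer and wraps the first
    // emitted token in a PrefixQuery on fieldName.
    public static Query Create(string fieldName, string searchPattern, Analyzer analyzer)
    {
        TokenStream stream = analyzer.TokenStream(fieldName, new StringReader(searchPattern));
        ITermAttribute termAttribute = stream.AddAttribute<ITermAttribute>();
        stream.Reset();

        // Assumption: the analyzer produces a single token (keyword-style).
        // Any further tokens are ignored by this sketch.
        if (!stream.IncrementToken() || termAttribute.Term.Length == 0)
        {
            throw new ArgumentException("The analyzer produced no usable term.", "searchPattern");
        }

        // No escaping is needed here: the text never passes through the query syntax.
        return new PrefixQuery(new Term(fieldName, termAttribute.Term));
    }
}
```

With the analyzer from the question, Create("Test", "Auß test", analyzer) should give Test:auss test*, since the keyword tokenizer keeps the whole input as one term.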

Licensed under: CC-BY-SA with attribution