Your problem comes from the fact that you're using StandardAnalyzer. If you read its javadoc, it says it uses StandardTokenizer for token splitting. This means a phrase like doc1.txt will be split into doc1 and txt.
If you want to match the entire text, you need to use KeywordAnalyzer, which keeps the whole input as a single token - both for indexing and searching. The code below shows the difference: with StandardAnalyzer the tokens are {"doc1", "txt"}, while with KeywordAnalyzer the only token is doc1.txt.
import java.io.StringReader;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

String text = "doc1.txt";

// StandardAnalyzer splits on the dot: prints "doc1" then "txt"
StandardAnalyzer sa = new StandardAnalyzer(Version.LUCENE_34);
TokenStream tokenStream = sa.tokenStream("foo", new StringReader(text));
while (tokenStream.incrementToken()) {
    System.out.println(tokenStream.getAttribute(TermAttribute.class).term());
}

System.out.println("-------------");

// KeywordAnalyzer emits the whole input as one token: prints "doc1.txt"
KeywordAnalyzer ka = new KeywordAnalyzer();
TokenStream tokenStream2 = ka.tokenStream("foo", new StringReader(text));
while (tokenStream2.incrementToken()) {
    System.out.println(tokenStream2.getAttribute(TermAttribute.class).term());
}
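To show what "both for indexing and searching" means in practice, here is a minimal end-to-end sketch, assuming Lucene 3.4: the same KeywordAnalyzer instance is passed to the IndexWriterConfig and to the QueryParser, so a filename is stored and queried as one token. The class name KeywordAnalyzerDemo and the field name "filename" are my own choices, not from your code.

```java
import java.io.IOException;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class KeywordAnalyzerDemo {
    public static void main(String[] args) throws IOException, ParseException {
        KeywordAnalyzer analyzer = new KeywordAnalyzer();
        RAMDirectory dir = new RAMDirectory();

        // Index one document; KeywordAnalyzer keeps "doc1.txt" as a single token
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_34, analyzer));
        Document doc = new Document();
        doc.add(new Field("filename", "doc1.txt",
                Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // Search with the same analyzer so the query stays one token, too
        IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
        Query query = new QueryParser(Version.LUCENE_34, "filename", analyzer)
                .parse("doc1.txt");
        TopDocs hits = searcher.search(query, 10);
        System.out.println(hits.totalHits); // exact match on the whole filename
        searcher.close();
    }
}
```

With StandardAnalyzer on both sides this query would instead be split into doc1 and txt and could match the wrong documents.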