EnglishAnalyzer, along with most language-specific analyzers, uses a stemmer. This means it reduces each term to a stem (or root) so that matching is looser. Mostly this works well, stripping suffixes and mapping derived words to a common root. So when I search for "fish", I also find "fished", "fishing" and "fishes".
In this case though, "activities" and "activation" both reduce to the root "activ", resulting in the match you are seeing. Another example: "organ", "organic" and "organize" all share the stem "organ".
You can stem or not; neither approach is perfect. If you don't stem, you'll miss relevant results. If you do, you'll hit some odd irrelevant ones.
To deal with specific problematic cases, you can define a stemmer exclusion set in EnglishAnalyzer to prevent stemming of just those terms. In this case, I would think of "activation" as the probable term to exclude, though you could go either way. So I could do something like:
// VERSION and STOPWORDS are placeholders for your Lucene Version constant and stop word set
CharArraySet stemExclusionSet = new CharArraySet(VERSION, 1, true);
stemExclusionSet.add("activation");
EnglishAnalyzer englishAnalyzer = new EnglishAnalyzer(VERSION, STOPWORDS, stemExclusionSet);
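To see why the exclusion set helps, here is a toy sketch in plain Java with no Lucene dependency. It is not the Porter stemmer that EnglishAnalyzer uses. The class name ToyStemmer and its short suffix list are made up for illustration, and the suffixes were picked only so that "activities" and "activation" collide the way they do in the real analyzer.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyStemmer {
    // Hypothetical suffix list, checked in order; NOT the Porter rules Lucene applies.
    private static final List<String> SUFFIXES =
            List.of("ation", "ities", "ing", "ed", "es", "s");

    // Returns the term unchanged if it is in the exclusion set,
    // otherwise strips the first matching suffix (keeping at least 3 characters).
    public static String stem(String term, Set<String> exclusions) {
        String lower = term.toLowerCase(Locale.ROOT);
        if (exclusions.contains(lower)) {
            return lower; // excluded terms skip stemming entirely
        }
        for (String suffix : SUFFIXES) {
            int stemLength = lower.length() - suffix.length();
            if (lower.endsWith(suffix) && stemLength >= 3) {
                return lower.substring(0, stemLength);
            }
        }
        return lower;
    }

    public static void main(String[] args) {
        Set<String> none = Set.of();
        Set<String> keepActivation = Set.of("activation");

        System.out.println(stem("activities", none));           // activ
        System.out.println(stem("activation", none));           // activ
        System.out.println(stem("activation", keepActivation)); // activation
    }
}
```

With no exclusions the two words end up as the same token, so a query for one matches the other. Once "activation" is excluded it stays intact and no longer collides with "activ".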