You can check for yourself using the Analyze API.
This yields the tokens file, name, and pdf for "file name .pdf", and the tokens file and name.pdf for "file name.pdf".
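As a rough sketch of how such a check could be scripted, the snippet below builds a request for the `_analyze` endpoint and pulls the token strings out of a response. The host `http://localhost:9200` and the helper names `build_analyze_request` and `token_strings` are assumptions made for illustration. The JSON body form shown here is what recent Elasticsearch versions accept; older releases took the analyzer and text as query-string parameters instead.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="standard",
                          base_url="http://localhost:9200"):
    """Build (but do not send) a POST request for the _analyze endpoint.

    base_url is an assumed local node; adjust it for your cluster.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(response_json):
    """Extract the token text from a parsed _analyze response."""
    return [t["token"] for t in response_json.get("tokens", [])]


if __name__ == "__main__":
    # Sending the request needs a running node, so it is left to the reader:
    # with urllib.request.urlopen(build_analyze_request("file name.pdf")) as r:
    #     print(token_strings(json.load(r)))
    req = build_analyze_request("file name.pdf")
    print(req.full_url, req.data.decode("utf-8"))
```

Running the same request once with "file name.pdf" and once with "file name .pdf" makes the difference between the two inputs easy to see.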
The StandardAnalyzer, or rather the StandardTokenizer, implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, which says:
Do not break within sequences, such as “3.2”
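The same annex applies an analogous rule to letters (WB6 and WB7): a single period between two letters does not start a new word. So "name.pdf" is considered a single word by the StandardTokenizer, whereas in "name .pdf" the space separates the two parts.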
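For your query, the SimpleAnalyzer would work, because it splits the text at every non-letter character (including the period) and lowercases the tokens. You can use the Analyze API as well as the elasticsearch-inquisitor plugin to test the available analyzers.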
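To get a feel for what the SimpleAnalyzer does to a file name without a cluster at hand, here is a small approximation in Python. It is only a sketch: the function name `simple_analyzer_sketch` is hypothetical, and the regular expression is a stand-in for Lucene's letter-based tokenization that may differ on unusual Unicode characters.

```python
import re

# Runs of letters only: word characters minus digits and the underscore.
# This approximates, but is not identical to, Lucene's letter tokenizer.
_LETTER_RUN = re.compile(r"[^\W\d_]+")


def simple_analyzer_sketch(text):
    """Lowercase the text and split it at every non-letter character."""
    return _LETTER_RUN.findall(text.lower())


if __name__ == "__main__":
    for sample in ("file name.pdf", "file name .pdf"):
        print(sample, "->", simple_analyzer_sketch(sample))
```

With this behaviour both spellings produce the same tokens, which is why a query for pdf would match either one. Note that digits are dropped entirely, so this analyzer is a poor fit if the file names carry meaningful numbers.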