Question

I am evaluating Elasticsearch for a client. I have begun playing with the API and successfully created an index and added documents to it. The main reason for using Elasticsearch is that it provides faceting functionality.

I am having trouble understanding Analyzers, Tokenizers and Filters and how they fit in with facets. I want to be able to use keywords, dates, search terms, etc. as my facets.

How would I go about incorporating Analyzers into my search, and how can I use them with facets?

Solution

When Elasticsearch indexes a string field, by default it breaks the value up into tokens. For example, "Fox jump over the wall" is tokenized into the individual words "fox", "jump", "over", "the" and "wall" (the standard analyzer also lowercases each token).
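
You can see exactly which tokens an analyzer produces by running text through the Analyze API. The sketch below assumes Elasticsearch is listening on localhost:9200 (host and port are assumptions, not part of the question) and uses the older query-parameter form of the call:

    # Show how the standard analyzer tokenizes the sentence.
    # The response lists the tokens "fox", "jump", "over", "the", "wall"
    # (lowercased; exact output can vary with version and stop word settings).
    curl -XGET 'http://localhost:9200/_analyze?analyzer=standard' -d 'Fox jump over the wall'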

So what does this mean in practice? If you search your documents for the exact string, you may not get the result you want, because Elasticsearch matches against the stored tokens rather than the entire original string, so your results can be very different from what you expect.

For example, searching for the exact term "Fox jump over the wall" will not return any results, while searching for the single token "fox" will.
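
As a rough sketch of that behaviour (the index name myindex, type article and field title are placeholders I am assuming, and this is the pre-2.0 query syntax): a term query for the whole sentence matches nothing, because no stored token equals it, while a term query for the lowercased token "fox" matches.

    # No hits: the index holds individual tokens, not the whole sentence
    curl -XPOST 'http://localhost:9200/myindex/article/_search' -d '{
      "query": { "term": { "title": "Fox jump over the wall" } }
    }'

    # Matches: "fox" is one of the tokens the standard analyzer stored
    curl -XPOST 'http://localhost:9200/myindex/article/_search' -d '{
      "query": { "term": { "title": "fox" } }
    }'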

If you want Elasticsearch to keep a string intact, map the field as not_analyzed (or give it the keyword analyzer) so the whole value is stored as a single token. That lets you match and facet on exact strings, which is particularly useful when you want a terms facet over entire values rather than over individual words. (The Analyze API itself does not change how anything is indexed; it is a diagnostic call that shows you how a given analyzer would tokenize a piece of text, as in the example above.)
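
Here is a minimal mapping sketch along those lines, again using the assumed index myindex and type article: the title field is analyzed as usual, while keyword is mapped as not_analyzed so its whole value is stored as one token (this is the pre-2.0 string/not_analyzed syntax; later versions use the keyword field type instead).

    # Create the index with a mapping: "keyword" keeps whole values intact
    curl -XPUT 'http://localhost:9200/myindex' -d '{
      "mappings": {
        "article": {
          "properties": {
            "title":   { "type": "string" },
            "keyword": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }'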

A tokenizer splits a string into individual tokens (usually words), and those tokens are what actually get stored in the index. As mentioned, they are what your queries match against through the Search API.
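
For example, a match query runs the query string through the same analysis, so any of its tokens can match the tokens in the index. Another sketch using the assumed myindex/article/title names:

    # The query string is analyzed into the tokens "fox" and "wall",
    # so the document matches even though the full sentence was never
    # stored as a single value.
    curl -XPOST 'http://localhost:9200/myindex/article/_search' -d '{
      "query": { "match": { "title": "Fox wall" } }
    }'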

Within an analyzer, token filters then transform the tokens the tokenizer produced, for example lowercasing them or stripping stop words. At query time, filters do something different: they restrict the results to the subset matching conditions you specify, helping you separate what you need from what you do not need in your search results.
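
To tie this back to the original question, here is a sketch of a faceted search in the pre-2.0 facet and filtered-query syntax (the field names published and keyword, and the date value, are assumptions for illustration): the filter narrows the result set, and the terms facet counts the exact, untokenized keyword values.

    # Filtered query plus a terms facet on the not_analyzed "keyword" field
    curl -XPOST 'http://localhost:9200/myindex/article/_search' -d '{
      "query": {
        "filtered": {
          "query":  { "match": { "title": "fox" } },
          "filter": { "range": { "published": { "gte": "2012-01-01" } } }
        }
      },
      "facets": {
        "keywords": {
          "terms": { "field": "keyword" }
        }
      }
    }'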

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow