Question

I want to index the following field:

quick brown fox

I want a search for exactly all three words, in any order, to hit, but a search for only some of the words (like 'brown fox') to miss.

In other words, I want to create an index/query so the following is true:

# input for field
quick brown fox

# hits
brown fox quick
fox quick brown
quick brown fox
...

# misses
quick brown
fox
quick brown fox red
...

It seems like I'd have to write a custom tokenizer to do this: one that explodes an input stream into these keyword permutations. I'm not sure where to begin. I'm using Ruby with Tire. Is that the right idea? How do I go about making my own tokenizer?


Solution

I would sort the words alphabetically, join them with a separator that cannot appear in the words, and index the result as a single, not-analyzed token. So quick brown fox would be indexed as brown-fox-quick. You would need to perform the same operation at both index time and search time. Ideally this would be done by an analyzer, but I am not aware of any analyzer that does this for you. So you need to either write your own custom analyzer (as a Java plugin) or implement this logic in your own code outside of Elasticsearch.
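For the "outside of Elasticsearch" route, the normalization could look something like the Ruby sketch below. The helper name canonical_key and the SEPARATOR constant are made up for illustration, and lowercasing plus whitespace splitting are assumptions about how the words should be compared. It deliberately makes no Tire or Elasticsearch calls: the idea is to run the same function on the field value before indexing and on the user's input before querying, against a field mapped as not analyzed.

```ruby
# Assumption: "-" never appears inside a word. Pick another separator if it can.
SEPARATOR = "-"

# Hypothetical helper: turns a phrase into an order-independent key.
# Lowercases, splits on whitespace, sorts alphabetically, joins with SEPARATOR.
def canonical_key(phrase)
  phrase.downcase.split.sort.join(SEPARATOR)
end

indexed = canonical_key("quick brown fox") # value stored in the index

# A query hits only when its key equals the stored key exactly.
["brown fox quick", "fox quick brown", "quick brown", "quick brown fox red"].each do |query|
  result = canonical_key(query) == indexed ? "hit" : "miss"
  puts "#{query}: #{result}"
end
```

Because the whole phrase collapses into one token, a partial query such as quick brown produces a different key and cannot match. Repeated words are kept as-is in this sketch, so quick quick brown fox would be a miss; whether to deduplicate is a design choice.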

Licensed under: CC-BY-SA with attribution