'LIKE' Query Algorithm

https://stackoverflow.com/questions/21410328

03-10-2022
|

Question

I have a problem with like-query in my project.The requirement is 'like' srarch in about 10million name, for example:

amy ford phil jedwk edwords jones

I want srarch names that have 'edw', the result should be 'jedwk' and 'edwords'. Just like "select * from table where name like '%edw%'" in sql.

Do you have some algorithm can deal this problem? Thanks!

Solution

You might be able to use a variation of suffix tree, where in addition each leaf in the tree holds a reference to the original string this suffix refer to.

This will allow you to follow the path of the input query on the suffix tree, and from there do a DFS to get all leaves reachable from this substring. Each leaf will contain the relevant strings - and you just need to extract them.

PS Note that most Information Retrieval systems do not allow a 'substring' search, only a 'prefix search' - which is much more easier to implement, using a trie for example.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow