Question

I have a problem with like-query in my project.The requirement is 'like' srarch in about 10million name, for example:


amy ford phil jedwk edwords jones

I want srarch names that have 'edw', the result should be 'jedwk' and 'edwords'. Just like "select * from table where name like '%edw%'" in sql.


Do you have some algorithm can deal this problem? Thanks!

Was it helpful?

Solution

You might be able to use a variation of suffix tree, where in addition each leaf in the tree holds a reference to the original string this suffix refer to.

This will allow you to follow the path of the input query on the suffix tree, and from there do a DFS to get all leaves reachable from this substring. Each leaf will contain the relevant strings - and you just need to extract them.


PS Note that most Information Retrieval systems do not allow a 'substring' search, only a 'prefix search' - which is much more easier to implement, using a trie for example.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top