문제

I have a problem with like-query in my project.The requirement is 'like' srarch in about 10million name, for example:


amy ford phil jedwk edwords jones

I want srarch names that have 'edw', the result should be 'jedwk' and 'edwords'. Just like "select * from table where name like '%edw%'" in sql.


Do you have some algorithm can deal this problem? Thanks!

도움이 되었습니까?

해결책

You might be able to use a variation of suffix tree, where in addition each leaf in the tree holds a reference to the original string this suffix refer to.

This will allow you to follow the path of the input query on the suffix tree, and from there do a DFS to get all leaves reachable from this substring. Each leaf will contain the relevant strings - and you just need to extract them.


PS Note that most Information Retrieval systems do not allow a 'substring' search, only a 'prefix search' - which is much more easier to implement, using a trie for example.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top