What is an efficient data structure for prefix matching?

https://cs.stackexchange.com/questions/104771

05-11-2019

Question

I'm looking for a data structure that supports efficient prefix-matching queries for arbitrary patterns over a previously known set of words (the dictionary). The dictionary is expected to contain about 10,000 words of varying length (I haven't calculated the average word length, but I don't expect any word to be more than 80 characters long). Prefix matching in this case would be equivalent to words[i].toLowerCase().startsWith(pattern.toLowerCase()).

A Trie is an obvious choice, and provides search in time linear in the length of the pattern. However, I'm unsure whether a Suffix Tree or a Suffix Array might provide any improvement over a Trie. It seems a Suffix whatever is commonly used for one text, not multiple. I also have no requirement for supporting the various cool use cases (longest common prefix, number of times a prefix occurs, etc.) that a Suffix whatever can efficiently support.
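To make the trie option concrete, here is a minimal Python sketch, not a definitive implementation. The names TrieNode, Trie, insert, any_match and the sample field are made up for illustration. It assumes only "any match" needs to be returned, so each node caches one word that passes through it, and a lookup costs one dictionary step per pattern character.

```python
class TrieNode:
    __slots__ = ("children", "sample")

    def __init__(self):
        self.children = {}   # lowercase character -> TrieNode
        self.sample = None   # one dictionary word whose lowercase form passes through this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        if node.sample is None:
            node.sample = word
        # Walk (and create) one node per lowercase character of the word.
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
            if node.sample is None:
                node.sample = word  # first word to reach this node becomes its cached match

    def any_match(self, pattern):
        """Return some word that starts with pattern (case-insensitive), or None."""
        node = self.root
        for ch in pattern.lower():
            node = node.children.get(ch)
            if node is None:
                return None
        return node.sample


trie = Trie(["banana", "apple", "banter", "orange"])
print(trie.any_match("ban"))     # banana
print(trie.any_match("grapes"))  # None
```

Caching a sample word at every node trades a little extra memory for not having to descend to a leaf after the pattern has been consumed.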

I'm looking for a recommendation on which data structure to use, and why. Tries use a lot of space, and if I end up using one, I'd consider compressing branches made up of nodes with outdegree 1.
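The compression idea mentioned here (merging chains of outdegree-1 nodes into a single edge labelled with a string) is usually called a radix tree or compressed trie. The following Python sketch is one hedged way to do it. RadixNode, RadixTrie, common_prefix_len and the sample field are hypothetical names, and the sketch assumes only "any match" queries, so it keeps no end-of-word flag.

```python
class RadixNode:
    def __init__(self, sample=None):
        self.edges = {}       # first character of label -> (label, child RadixNode)
        self.sample = sample  # one dictionary word stored at or below this node


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class RadixTrie:
    def __init__(self, words=()):
        self.root = RadixNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node, rest = self.root, word.lower()
        if node.sample is None:
            node.sample = word
        while rest:
            edge = node.edges.get(rest[0])
            if edge is None:
                # No edge starts with this character: hang the whole remainder on one edge.
                node.edges[rest[0]] = (rest, RadixNode(word))
                return
            label, child = edge
            k = common_prefix_len(label, rest)
            if k < len(label):
                # The word diverges (or ends) inside the edge: split the edge at position k.
                mid = RadixNode(child.sample)
                mid.edges[label[k]] = (label[k:], child)
                node.edges[rest[0]] = (label[:k], mid)
                child = mid
            node, rest = child, rest[k:]

    def any_match(self, pattern):
        """Return some word that starts with pattern (case-insensitive), or None."""
        node, rest = self.root, pattern.lower()
        while rest:
            edge = node.edges.get(rest[0])
            if edge is None:
                return None
            label, child = edge
            k = common_prefix_len(label, rest)
            if k < len(rest) and k < len(label):
                return None  # mismatch in the middle of an edge label
            # Either the label was fully matched, or the pattern ended inside it.
            node, rest = child, rest[k:]
        return node.sample


radix = RadixTrie(["banana", "apple", "banter", "orange"])
print(radix.any_match("ban"))     # banana
print(radix.any_match("grapes"))  # None
```

In this sketch each insertion creates at most two nodes (one split node and one leaf), so 10,000 words yield at most roughly 20,000 nodes regardless of word length, while a lookup still inspects each pattern character once.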

For the duplicate-button-happy readers: I've read this and this question, and neither seemed directly relevant.

Example:

Dictionary: [banana, apple, banter, orange]

Pattern: ban
Return: banana (any match)

Pattern: grapes
Return: null
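The example above can be reproduced with a plain linear scan, which may be useful as a reference to test any faster structure against. This is only a baseline sketch in Python. The function name any_prefix_match is made up, and Python's lower() and startswith() stand in for the toLowerCase() and startsWith() calls in the question.

```python
def any_prefix_match(words, pattern):
    """Return the first word starting with pattern (case-insensitive), or None."""
    p = pattern.lower()
    for word in words:
        if word.lower().startswith(p):
            return word
    return None


dictionary = ["banana", "apple", "banter", "orange"]
print(any_prefix_match(dictionary, "ban"))     # banana
print(any_prefix_match(dictionary, "grapes"))  # None
```

Each query here inspects up to every word in the dictionary, which is the per-query cost a trie-like structure is meant to avoid.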

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange