Algorithm: given very large file of strings, find lines containing substring

https://cs.stackexchange.com/questions/75875

04-11-2019

Question

I'm not sure if cs.stackexchange.com is the correct place to ask this question, but it looks like the most relevant one.

In a job interview I was asked the following question: say we have a very large file containing many lines of strings, and each line is one "document". We are building a service that takes a string as input and returns the numbers of the documents that contain it as a substring. Which approach would you use?

I'm stuck even trying to find a way to do it. Obviously, we can scan the whole file on every query and count the lines containing the substring, which takes O(n) time. But there should be a more elegant and faster way to do it, right? I have no CS degree, so my knowledge of algorithms and data structures is not systematic.
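
For reference, the linear scan described above could look roughly like the following. This is only a minimal sketch of the baseline: the function name `find_documents` and the file name in the usage note are made up for illustration, and documents are assumed to be numbered from 1.

```python
from typing import Iterable, List


def find_documents(lines: Iterable[str], query: str) -> List[int]:
    """Return the 1-based numbers of the lines ("documents") containing query.

    Every line is inspected on every call, so the cost of one query grows
    linearly with the size of the file.
    """
    matches = []
    for number, line in enumerate(lines, start=1):
        # Strip the trailing newline so it never takes part in a match.
        if query in line.rstrip("\n"):
            matches.append(number)
    return matches
```

With a real file this would be called as `find_documents(open("documents.txt", encoding="utf-8"), "needle")`, where `documents.txt` is a hypothetical path. Because a file object is read lazily, the file never has to fit in memory, but the whole file is still read for each query.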

My intuition says that this should be a very basic question (like the knapsack problem, for example) and I just don't know the right data structure and algorithm to handle it.

It looks like a suffix tree could help somehow, but I can't see how. Or maybe there is another approach?
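
To make the suffix idea concrete, here is a toy sketch, not an accepted answer and not a real suffix tree. It uses a simpler relative, a sorted list of all suffixes of all lines (in effect a naive suffix array), and relies on the fact that a query is a substring of a line exactly when it is a prefix of one of that line's suffixes. The names `build_suffix_index` and `query_index` are invented for this example, documents are assumed to be numbered from 1, and the query is assumed to be non-empty.

```python
from bisect import bisect_left
from typing import List, Tuple


def build_suffix_index(documents: List[str]) -> List[Tuple[str, int]]:
    """Build a sorted list of (suffix, document number) pairs.

    Every suffix is stored as a full string, which is wasteful; this is done
    only to keep the sketch short.
    """
    entries = [
        (doc[start:], number)
        for number, doc in enumerate(documents, start=1)
        for start in range(len(doc))
    ]
    entries.sort()
    return entries


def query_index(index: List[Tuple[str, int]], query: str) -> List[int]:
    """Return the sorted numbers of the documents containing a non-empty query."""
    # Suffixes that start with the query form one contiguous run in sorted
    # order. Document numbers start at 1, so (query, 0) sorts before all of them.
    position = bisect_left(index, (query, 0))
    found = set()
    while position < len(index) and index[position][0].startswith(query):
        found.add(index[position][1])
        position += 1
    return sorted(found)
```

After the index is built once, each query is a binary search followed by a walk over the matching suffixes, instead of a scan of the whole file. The price is memory: storing whole suffixes takes space quadratic in the line length, so for a very large file one would presumably store offsets instead of strings, or use a proper suffix array or suffix tree construction.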

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange