Finding the longest word in a string using dynamic programming?

https://stackoverflow.com/questions/11269266

18-06-2021
|

Question

I have started working on some algorithm problems when I saw a problem asking if I can find the longest word from a string (string does not have spaces just characters). After thinking for some time, I just wanted to confirm if I can use Dynamic Programming for this issue similar to Maximum contiguous sum problem. Here after parsing every character I can call isWord method (already implemented) and then if it is keep going to the next character and increase the word length, if its not then simply reset the counter to zero and start looking for a word from that index. Please let me know if that would be a good approach otherwise please guide me what would be better approach to solve this.

Thanks for your help guys.

-Vik

Solution

This algorithm will not work correctly. Consider the following string:

BENDOCRINE

If you start from the start of the string and scan forward while you still have a word, you will find the word "BEND," then reset the string after that point and pick up from the O. The correct answer here is instead to pick the word "ENDOCRINE," which is much longer.

If you have a static dictionary and want to find the longest word from that dictionary that is contained within a text string, you might want to look at the Aho-Corasick algorithm, which will find every single match of a set of strings inside a text string, and does so extremely efficiently. You could easily modify the algorithm so that it tracks the longest word it has outputted at any time so that it does not output shorter strings than the longest one found so far, in which case the runtime will be O(n + m), where n is the length of your text string to search and m is the total number of characters in all legal English words. Moreover, if you do O(m) preprocessing in advance, from that point forward you can find the longest word in a given string in time O(n), where n is the number of characters in the string.

(As for why it runs in time O(n + m): normally the runtime is O(n + m + z), where z is the number of matches. If you restrict the number of matches outputted so that you never output a shorter word than the longest so far, there can be at most n words outputted. Thus the runtime is O(n + m + n) = O(n + m)).

Hope this helps!

OTHER TIPS

Dynamic programming will not work for your problem:

let seq1 and seq2 be 2 character sequences

isWord(Concatenation(seq1, seq2)) cannot be infered from the values of isWord(seq1) and isWord(seq2)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow