C++ Boggle Solver: Finding Prefixes in a Set

https://stackoverflow.com/questions/9050556

20-04-2021
|

Question

This is for a homework assignment, so I don't want the exact code, but would appreciate any ideas that can help point me in the right direction.

The assignment is to write a boggle solving program. I've got the recursive part down I feel, but I need some insight on how to compare the current sequence of characters to the dictionary.

I'm required to store the dictionary in either a set or sorted list. I've been trying a way to implement this using a set. In order to make the program run faster and not follow dead end paths, I need to check and see if the current sequence of characters exists as a prefix to anything in the set (dictionary).

I've found that set.find() operation only returns true if the string is an exact match. In the lab requirements, the professor mentioned that:

"If the dictionary is stored in a Set, many data structure libraries provide a way to find the string in the Set that is closest to the one you are searching for. Such an operation could be used to quickly find a word with a given prefix."

I've been searching today for a what the professor is describing. I've found a lot of information on tries, but since I'm required to use a list or set, I don't think that will work.

I've also tried looking up algorithms for autocomplete functions, but the ones that I've found seem extremely complicated for what I'm trying to accomplish here.

I also was thinking of using strncmp() to compare the current sequence to a word from the dictionary set, but again, I don't know how exactly that would function in this situation, if at all.

Is it worth it to continue investigating how this would work in a set or should I just try using a sorted list to store my dictionary?

Thanks

Solution

As @Raymond Hettinger mentions in his answer, a trie would be extremely useful here. However, if you either are uncomfortable writing a trie or would prefer to use off-the-shelf components, you can use a cute property of how words are ordered alphabetically to check in O(log n) time whether a given prefix exists. The idea is as follows - suppose for example that you are checking for the prefix "thr." If you'll note, every word that begins with the prefix "thr" must be sandwiched between the strings "thr" and "ths." For example, thr ≤ through < ths, and thr ≤ throat < ths. If you are storing your words in a giant sorted array, you can use a modified version of binary search to find the first word alphabetically at least the prefix of your choice and the first word alphabetically at least the next prefix (formed by taking the last letter of the prefix and incrementing it). If these are the same word, then nothing is between them and the prefix doesn't exist. If they're not, then something is between them and the prefix does it.

Since you're using C++, you can potentially do with a std::vector and the std::lower_bound algorithm. You could also throw all the words into a std::set and use the set's version of lower_bound. For example:

std::set<std::string> dictionary;
std::string prefix = /* ... */

/* Get the next prefix. */
std::string nextPrefix = prefix;
nextPrefix[nextPrefix.length() - 1]++;

/* Check whether there is something with the prefix. */
if (dictionary.lower_bound(prefix) != dictionary.lower_bound(nextPrefix)) {
    /* ... something has that prefix ... */
} else {
    /* ... no word has that prefix ... */
}

That said, the trie is probably a better structure here. If you're interested, there is another data structure called a DAWG (Directed Acyclic Word Graph) that is similar to the trie but uses substantially less memory; in the Stanford introductory CS courses (where Boggle is an assignment), students actually are provided a DAWG containing all the words in the language. There is also another data structure called a ternary search tree that is somewhere in-between a binary search tree and a trie that may be useful here, if you'd like to look into it.

Hope this helps!

OTHER TIPS

The trie is the preferred data structure of choice for this problem.

If you're limited to sets and dictionaries, I would choose a dictionary that maps prefixes to an array of possible matches:

asp -> aspberger aspire
bal -> balloon balance bale baleen ...

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow