Question

I am attempting to make the logic for the opponent in a scrabble game.

I have done a lot of thinking and come to the conclusion that I need to use anagrams and check those anagrams against a list of words in a dictionary file to see if the generated word is in fact a word contained in the dictionary file.

The issue I'm having is optimization. Because this anagram uses recursion and runs up to 8 factorial, there tends to be many 'junk' words generated that do not exist in any dictionary, such as repetitions of a single letter.

There has to be some sort of check to see if the permutations are valid and not just a repetition of 1 character. So far, I'm at a loss for how to do this both quickly and accurately.

In English, words seem to be formed by both vowels and consonants. I was thinking of checking if a word contains at least 1 vowel and at least 1 consonant, however there are some exceptions where words can contain only vowels or only consonants. So this method seems to go out the window.

Now I may be missing something crucial, but short of brute-forcing my way through all permutations I have no real idea of how to check in a way that is sufficiently fast enough for gameplay.


My question is:

Can anyone suggest a method that will work 100% of the time to optimize the number of permutations generated?


I don't need useless ones generated and these turn out to be the bulk of what is generated.

I believe this to a good approach, however at the same time I believe that I must be missing something that is much faster and more appropriate for what I want to achieve.

If anyone could suggest a way to check if words are actually viable or not, OR if you could suggest a better way of approaching the situation, it'd be greatly appreciated.

Thanks.

Was it helpful?

Solution

(disclaimer: pseudocode may not be valid java, even though it looks like it)

It sounds like you have a jumbled collection of letters, and want to find all of the English words that can be spelled using them.

Two strings are anagrams of one another if they compare equal when you sort both of them. Permuting the order of letters in your candidate word to see if any of them are legal English words is expensive. Instead, sort the letters and compare it against your word list:

boolean is_anagram(string word_a, string word_b){
    return sorted(word_a).equals(sorted(word_b));
}

List<string> valid_anagrams(string candidate_word){
    anagrams = new List<string>();
    foreach(string word : list_of_words){
        if (is_anagram(candidate, word)){
            anagrams.push(word);
        }
    }
    return anagrams;
}

This is more efficient if the number of words in your word list is less than the factorial of the size of your candidate word. For instance, the number of legal words in Words With Friends is about 170,000, so you would prefer the above method for checking words of length 9 or more.

If you plan to check a lot of candidate words, then you can save time by saving the sorted forms of all of your valid words. Create a dictionary where the key is a sorted string, and the value is a list of English words that are an anagram of that string. It should look like this:

{
    "act": ["act", "cat", "tab"],
    "abll": ["ball"],
    "aeprs": ["asper", "parse", "pears", "reaps", "spare", "spear"]
}

You can construct this dictionary once at the beginning of your program, like so:

d = new Dictionary<string, List<string>>();
foreach (string word in list_of_words){
    string key = sorted(word)
    if (!d.contains_key(key)){
        d[key] = new List<string>();
    }
    d[key].push(word);
}

then finding valid anagrams for a string is just a matter of accessing the dictionary.

List<string> valid_anagrams(string candidate_word){
    string key = sorted(candidate_word);
    if (!d.contains_key(key)){
        return new List<string>();
    }
    else{
        return d[key];
    }
}

OTHER TIPS

You could build a binary tree(s) from your dictionary, or a weighted graph, and then just traverse the graph(s) with your anagrams, if you want a fast way to check your anagrams. This could become expensive in memory, depending on the size of your dictionary, and building the graph(s) might take some time at initialization.

If going the route of several graphs, you could create a graph for each letter of the alphabet, and then create a 1st degree connection to each letter that follows that letter in your dictionary.

So, say you have the dictionary [and, arm, ant, an, ants, antsy, army]

you would have a graph like follows:

[a][ar:1][an:3]
[ar][arm:2]
[an]["":0][and:1][ant:2]
[arm]["":0][army:1]
[and]["":0]
[ant]["":0][ants:2]
[ants]["":0][antsy:1]
[army]["":0]
[antsy]["":0]
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top