Question

I was hoping for a way to create the following kind of Double word square in Python. I have a given list of words, which should contain the words that are to be used in the word square. The current way that I can think of is the 'brute-force' way which is highly inefficient for anything greater than a 4x4 grid.

S C E N T 
C A N O E 
A R S O N 
R O U S E 
F L E E T

I would appreciate any help.

Was it helpful?

Solution

Graph theory / search algorithms can help here. Assume we've laid out the board like this:

SCENT
CANOE
ARSON
ROUSE
?????

…and that I have >1 word to stick in that last spot. This state of the board (the above) is a node in a tree. Let's say there remain 5 words to stick in that spot: I have 5 more states (nodes) that I can go to. This is more valuable earlier in the process:

SCENT
?????
?????
?????
?????

Again, consider this a node in a tree. Here, maybe I have 13 possible words. Each of those 13 choices takes me to a node with 12 choices, each of those 12 have 11 choices, etc. The tree we're exploring looks like this:

          ?
          |
        SCENT
        ?????
        ?????
        ?????
        ?????
    /---/ | \-----\----\
   /      |   \    \    \
  |       |    \    \    \
SCENT  SCENT    ?    ?    ?
CANOE  ARSON
?????  ?????   ... etc ...
?????  ?????
?????  ?????
| |    /  | \
| |   /   \  \
? ?  /     |  \
   SCENT   ?   ?
   ARSON
   CANOE
   ?????
   ?????

Brute force solutions would explore this tree completely, visiting every node. However, the number of nodes is usually prohibitively high. The idea is to explore it efficiently: if we can determine that a "node" with only two words filled in will never result in any valid solution, we can stop exploring right then and there: we eliminate that node and all nodes below it. Wikipedia's article on Brute-force searching mentions this:

One way to speed up a brute-force algorithm is to reduce the search space, that is, the set of candidate solutions, by using heuristics specific to the problem class. For example, in the eight queens problem the challenge is to place eight queens on a standard chessboard so that no queen attacks any other. Since each queen can be placed in any of the 64 squares, in principle there are 648 = 281,474,976,710,656 possibilities to consider. However, because the queens are all alike, and that no two queens can be placed on the same square, the candidates are all possible ways of choosing of a set of 8 squares from the set all 64 squares; which means 64 choose 8 = 64!/56!/8! = 4,426,165,368 candidate solutions — about 1/60,000 of the previous estimate. Further, no arrangement with two queens on the same row or the same column can be a solution. Therefore, we can further restrict the set of candidates to those arrangements.

As this example shows, a little bit of analysis will often lead to dramatic reductions in the number of candidate solutions, and may turn an intractable problem into a trivial one.

So, how can we do this? This is the part where you have to be clever, honestly. There are probably multiple possible things you could do. For example, when we have two words filled in:

SCENT
CANOE
?????
?????
?????

We see that for this to be a valid state, we need to have words that start with SC, CA, EN, NO, and TE. (Once we fill in the last words, these would be the words along the verticals.) If we lack words starting with those prefixes, then this isn't going to ever generate a valid solution, as we have no possible words for the verticals. If we can quickly determine that, then we can quickly eliminate this state, and all possible outcomes from being in this state: if this is especially early in the brute force, this can be valuable.

We can do such as "prefix check" in O(1) time by generating a set of all prefixes of all of our words. Assuming n words of length L, this takes O(nL) to do. To check if a prefix is valid, we just see if it's in the set.

Now of course, if our word list consists of (for your example) all possible 5 letter words, this will get us nowhere, as every prefix is okay. However, you've likely not been given such a list.

You could extend the prefixes set to be a map of prefixes → words with that prefix. If there's ever just one word in the prefix, we can put it in the board right now. (If it doesn't belong there, then nothing will.) Of course, this will make placing further words on the board more complicated.

Graph theory will also help with brute force: once you see your problem as a graph or a tree, you'll see that the "brute force" is just a tree-explore algorithm: it's just the nodes themselves, and how you move between them that differs.

OTHER TIPS

I would make a list of pairs of words that have a letter in common, along with the indices of the letter in those words, then search through different combinations of intersections until you get a combination of intersections that arranges the words in a square.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top