Question

I realized a recursive function to solve this problem:

  • With a 4x4 Matrix composed of letters (A...Z), for every letter get all the n-lenght words by following only adiacent and diagonal cells.

My function's idea is to do a recursive call for every letter to every possible direction and then concatenate the string that it is going to be formed. When the lenght of this string will be n, it stops. To prevent the returning back to a letter already used, I created and array of pointers to letter's class which compares every new possible letter with all the letters used. For doing this I created a special class for every letter.

This is the function: (sorry, it's very long)

dirSu means Up - dirGiu Down - dirDx right - dirSx left - and all the Others the diagonal positions

    void decisione(lettera *start, lettera *next, int count, string seq, vector <lettera*> visitate, int quanti) {

        if (next == nullptr) return ;

        if (count == quanti) {
            bool test = false;

            for (int k = 0; k < start->sequenze.size(); k++) { //To prevent same letter to be used
                if (start->sequenze[k] == seq)  {
                    test = true;
                    break;
                }
            }
            if (!test) start->sequenze.push_back(seq);
            return ;
        }

        seq = seq + next->chiave; //String concatenating every call
        count++; //Counter to decide when stop
        visitate.push_back(next); //Array of used letters

        int i;
        bool test;

        //Decide direction
        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSu, count, seq, visitate, quanti);


        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSu, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirGiu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirGiu, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirDx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirDx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirAdx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirAdx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirAsx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirAsx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirBdx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirBdx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirBsx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirBsx, count, seq, visitate, quanti);

    }

Well, when I activate the function for words of 3 letters, it is really fast, 4 letters: 0.028s of calculation time per letter in the matrix, 5 letters: 0.225s, 6 letters: 1.891s, 7 letters: 14.914s.

Is there a way to reduce calculation time by optimizing my function? (or removing my stupid logical errors?) Thanks a lot!

The Matrix I used for my tests is:

C A S A
C O S A
B A C I
O T A T

I used this for every speed test (without calculating it by frequency or randomly) and with this (with my recursive function improved since my question) I get in about 90 seconds all the words with 5, 6, 7 letters for every starting letter. There are a lot of words, although it is only a 4x4 Matrix! If I can send it to you privately it could be interesting to see also your solution to the same problem. (I don't know how to contact you here)

Was it helpful?

Solution

Change your algorithm from recursive to iterative (with a loop). It's usually faster. After that, the other optimizations are related to the algorithm and the data structures, in order to reduce the complexity. It's much more difficult to find such optimizations (but is possible).

EDIT: some thoughts on the code, followed by a version of the word-search I've done today:

  • you want only english words (or italian ones, etc...). To check this you should use a dictionary, it is the most efficient structure for look-up of words, both in terms of memory and performances
  • a comment said that this algorithm is strongly recursive. I disagree, I've made an iterative one, with much less copy of vectors, and with word check. Passing an algorithm from recursive to iterative usually reduce the number of copies. But I still agree with one point of this comment: some algorithm may be faster in recursive form (but hey're not so common). Also, this algorithm was not simple to find ;)
  • You should split your code. It will be way more readable, and easier to optimize.
  • You're coding a boggle resolver, so you should be aware that your dictionary must be precise, and should contain all forms of verbs (for example, for 'be': am, are, is, was, were, etc...). I tested it in french because the english word list I have is only 850 words, which is way too short.
  • The conditions on letter with accents in the dictionary are not important, they're here just to avoid using more slots for accents. Also, I striped every composed word (with caret in it) from the text file.
  • On my computer, loading the dictionary (non-exhaustive 350000 french words, 3 Mb) and searching all 5,6 and 7-length words take approximatively 0.6 seconds (i7 2660K, 8Gb RAM, release with /0x, measured by me, not by the computer ;) ). The perf measurement will be relevant only on your machine, for comparison.

So, here is the code. It's a standalone, you just have to compile it, execute with the english (or italian or french) words file (text, one word per line):

#include <string>
#include <iostream>
#include <stdlib.h>
#include <vector>
#include <time.h>
#include <map>
#include <algorithm>

// **************************************************** //
//                                                      //
// Data structure for the words : dictionnary           //
//                                                      //
// **************************************************** //
struct DicNode
{
    const char l;
    bool isEndOfWord;
    DicNode* children[26]; // a - z, case insensitive
    DicNode* parent;
    unsigned int min, max, level;

    DicNode(char c, DicNode* p = 0, bool end = false) : l(c),isEndOfWord(end),parent(p),min((unsigned int)(-1)),max(0), level(parent ? parent->level + 1 : 0)
    {
        for(unsigned int t = 0; t < 26; t++)
            children[t] = 0;
    }

    ~DicNode()
    {
        for(unsigned int t = 0; t < 26; t++)
            if(children[t])
                delete children[t];
    }

    DicNode*& at(char l)
    {
        if(l == 'à' || l == 'ä' || l == 'â')
            l = 'a';
        if(l == 'é' || l == 'è' || l == 'ê' || l == 'ë')
            l = 'e';
        if(l == 'ï' || l == 'î')
            l = 'i';
        if(l == 'ö' || l == 'ô')
            l = 'o';
        if(l == 'ü' || l == 'û' || l == 'ù')
            l = 'u';
        if(l == 'ç')
            l = 'c';
        if(l <= 'Z')
            l = 'a' + l - 'A';

        // Note: undefined behavior if l > 'Z' or l < 'A'.
        return children[(unsigned int)(l - 'a')];
    }

    bool inRange(unsigned int n) const
    {
        return n <= max && n >= min;
    }

    void print(std::string str = "") const
    {
        // Here recursive algorithm is way better because of the object-oriented structure
        if(isEndOfWord)
            std::cout << (str + l) << "\n";

        for(unsigned int t = 0; t < 26; t++)
            if(children[t])
                children[t]->print(str + l);
    }

    std::string toString() const
    {
        std::string str(level, '0');
        const DicNode* node = this;
        while(node && node->level > 0)
        {
            str[node->level - 1] = node->l;
            node = node->parent;
        }
        return str;
    }
};

void addWord(DicNode* root, const std::string& str)
{
    if(!root || str == "") return;
    DicNode* node = root;
    unsigned int i = 0;
    while(i != str.size())
    {
        if(str.size() - i > node->max) node->max = str.size() - i;
        if(str.size() - i < node->min) node->min = str.size() - i;

        char c = tolower(str[i]);
        DicNode*& n = node->at(c);
        if(!n) n = new DicNode(c,node);
        node = n;
        i++;
    }
    node->isEndOfWord = true;
}

// **************************************************** //
//                                                      //
// Data structures for the algorithm                    //
//                                                      //
// **************************************************** //

enum Direction
{
    INVALID,
    NORTH,
    NORTH_EAST,
    EAST,
    SOUTH_EAST,
    SOUTH,
    SOUTH_WEST,
    WEST,
    NORTH_WEST,
    FINISHED
};

Direction incDir(Direction d)
{
    switch(d)
    {
        case NORTH:         return NORTH_EAST;
        case NORTH_EAST:    return EAST;
        case EAST:          return SOUTH_EAST;
        case SOUTH_EAST:    return SOUTH;
        case SOUTH:         return SOUTH_WEST;
        case SOUTH_WEST:    return WEST;
        case WEST:          return NORTH_WEST;
        case NORTH_WEST:    return NORTH;
        default:            return INVALID;
    }
}

Direction operator-(Direction d)
{
    switch(d)
    {
        case NORTH:         return SOUTH;
        case NORTH_EAST:    return SOUTH_WEST;
        case EAST:          return WEST;
        case SOUTH_EAST:    return NORTH_WEST;
        case SOUTH:         return NORTH;
        case SOUTH_WEST:    return NORTH_EAST;
        case WEST:          return EAST;
        case NORTH_WEST:    return SOUTH_EAST;
        default:            return INVALID;
    }
}

std::string toString(Direction d)
{
    switch(d)
    {
        case NORTH:         return "NORTH";
        case NORTH_EAST:    return "NORTH_EAST";
        case EAST:          return "EAST";
        case SOUTH_EAST:    return "SOUTH_EAST";
        case SOUTH:         return "SOUTH";
        case SOUTH_WEST:    return "SOUTH_WEST";
        case WEST:          return "WEST";
        case NORTH_WEST:    return "NORTH_WEST";
        default:            return "INVALID";
    }
}

struct Cell
{
    char l;
    DicNode* node;
    Direction dir, dirParent;
    Cell(char l_ = 'A') : l(l_),node(0),dir(INVALID),dirParent(INVALID) {}
};

struct ProbLetter
{
    char letter;
    float proba;
};

class Map
{
    public:
        Map(unsigned int w, unsigned int h) : width(w), height(h)
        {
            data = new Cell[width * height];
            for(unsigned int t = 0; t < width * height; t++)
                data[t] = randomEnglishLetter();
        }

        static char randomEnglishLetter()
        {
            // Frequency of english letters
            static ProbLetter probas[26] =
            {
                { 'Z', 0.074f },
                { 'Q', 0.095f },
                { 'X', 0.150f },
                { 'J', 0.153f },
                { 'K', 0.772f },
                { 'V', 0.978f },
                { 'B', 1.492f },
                { 'P', 1.929f },
                { 'Y', 1.974f },
                { 'G', 2.015f },
                { 'F', 2.228f },
                { 'W', 2.360f },
                { 'M', 2.406f },
                { 'U', 2.758f },
                { 'C', 2.782f },
                { 'L', 4.025f },
                { 'D', 4.253f },
                { 'R', 5.987f },
                { 'H', 6.094f },
                { 'S', 6.327f },
                { 'N', 6.749f },
                { 'I', 6.966f },
                { 'O', 7.507f },
                { 'A', 8.167f },
                { 'T', 9.056f },
                { 'E', 12.702f }
            };

            // Random number between 0 and 1
            float p = 100.0f * float(rand() % 10000) / 9999.0f;
            float sum = 0.0f;
            for(unsigned int t = 0; t < 26; t++)
            {
                sum += probas[t].proba;
                if(p < sum) return probas[t].letter;
            }
            return probas[25].letter;
        }

        Cell& operator()(unsigned int x, unsigned int y)
        {
            return data[x + y * width];
        }

        bool inRange(int x, int y)
        {
            return x >= 0 && x < (int)width && y >= 0 && y < (int)height;
        }

        void print()
        {
            for(unsigned int y = 0; y < height; y++)
            {
                for(unsigned int x = 0; x < width; x++)
                    std::cout << "  " << data[x + y * width].l;
                std::cout << "\n";
            }
        }

        unsigned int getWidth() const { return width; }
        unsigned int getHeight() const { return height; }

    private:
        unsigned int width, height;
        Cell* data;
};


// **************************************************** //
//                                                      //
// Word-search algorithm                                //
//                                                      //
// **************************************************** //

inline void advance(int& x, int& y, Direction d)
{
    switch(d)
    {
        case NORTH:
            y--;
            return;
        case NORTH_EAST:
            x++;
            y--;
            return;
        case EAST:
            x++;
            return;
        case SOUTH_EAST:
            x++;
            y++;
            return;
        case SOUTH:
            y++;
            return;
        case SOUTH_WEST:
            x--;
            y++;
            return;
        case WEST:
            x--;
            return;
        case NORTH_WEST:
            x--;
            y--;
            return;
        default:
            return;
    }
}

struct Data
{
    Map& map;
    int x;
    int y;
};

bool descend(Data& dat, unsigned int n)
{
    DicNode* parent = dat.map(dat.x, dat.y).node;
    if(n == parent->level) return false;

    Direction dir = dat.map(dat.x, dat.y).dir;
    if(dir == FINISHED) return false;
    else if(dir == INVALID) dir = NORTH;

    int pX = dat.x, pY = dat.y;
    bool firstIteration = true;
    for(; firstIteration || dir != NORTH; dir = incDir(dir))
    {
        firstIteration = false;
        pX = dat.x;
        pY = dat.y;
        advance(pX, pY, dir);

        if(dat.map.inRange(pX, pY) // (pX, pY) is not outside the map
            && dat.map(pX, pY).node == 0 // The cell is not already used
            && parent->at(dat.map(pX, pY).l) != 0) // An entry in the dictionary exists
        {
            // We found the next node
            dat.map(dat.x, dat.y).dir = dir;
            dat.map(pX, pY).node = parent->at(dat.map(pX, pY).l);
            dat.map(pX, pY).dirParent = -dir;
            dat.x = pX;
            dat.y = pY;

            return true;
        }
    }

    return false;
}

bool ascend(Data& dat)
{
    // Go back on the previous node

    Direction dp = dat.map(dat.x, dat.y).dirParent;
    if(dp == INVALID) return false;

    dat.map(dat.x, dat.y).dir = INVALID;
    dat.map(dat.x, dat.y).dirParent = INVALID;
    dat.map(dat.x, dat.y).node = 0;
    advance(dat.x, dat.y, dp);

    dat.map(dat.x, dat.y).dir = incDir(dat.map(dat.x, dat.y).dir);
    if(dat.map(dat.x, dat.y).dir == NORTH)
        dat.map(dat.x, dat.y).dir = FINISHED;

    return true;
}

void findWordsFromPosition(Map& map, DicNode* parent, unsigned int x, unsigned int y, unsigned int n, std::vector<std::string>& output)
{
    if(!parent) return;

    bool ok = true;

    // Setup first node
    map(x, y).node = parent;
    map(x, y).dir = NORTH;

    // while we can find next node with direction
    // If marked node has right order and is end of word, or if condition on n is not verified:
    //     add it to the output (if condition on n is verified)
    //     no need to go further (because of n), so advance direction of parent cell, reset current cell, and go up for the current position
    Data dat = { map, x, y };
    while(ok)
    {
        if(map(dat.x,dat.y).node->toString() == "ane")
            std::cout << "ok\n";
        if(!descend(dat, n))
            ok = ascend(dat);
        else
        {
            DicNode* node = dat.map(dat.x, dat.y).node;
            if(node->level == n && node->isEndOfWord)
            {
                std::string str = node->toString();
                if(std::find(output.begin(), output.end(), str) == output.end())
                    output.push_back(str);
                ok = ascend(dat);
            }
            else if(!node->inRange(n - node->level))
                ok = ascend(dat);
        }
    }

    // Clean first node
    map(x, y).dir = INVALID;
    map(x, y).dirParent = INVALID;
    map(x, y).node = 0;
}

void getWords(Map& map, DicNode& dic, unsigned int n, std::vector<std::string>& output)
{
    if(n > dic.max || n < dic.min) return;

    // Search words for every position
    for(unsigned int y = 0; y < map.getHeight(); y++)
        for(unsigned int x = 0; x < map.getWidth(); x++)
            findWordsFromPosition(map,dic.at(map(x,y).l),x,y,n,output);
}

#include <fstream>

int main()
{
    srand((unsigned int)time(0));
    // Prepare the words, for example, load them from a real dictionary
    DicNode dictionary(0);
    unsigned int maxSize = 0;

    // Note: the following code make sense only if you load words from a file
    {
        std::ifstream dico("english.txt");
        std::string str;
        int count = 0;
        while(dico >> str)
        {
            if(str.size() > maxSize) maxSize = str.size();
            addWord(&dictionary,str);
            count++;
        }
        std::cout << "Read " << count << " words from dictionary, longest with " << maxSize << " letters.\n";
    }

    // Prepare the matrix
    Map mat(4,4);
    /*
    mat(0,0) = 'A';
    mat(1,0) = 'N';
    mat(2,0) = 'F';
    mat(3,0) = 'N';
    mat(0,1) = 'S';
    mat(1,1) = 'L';
    mat(2,1) = 'E';
    mat(3,1) = 'R';
    mat(0,2) = 'G';
    mat(1,2) = 'O';
    mat(2,2) = 'R';
    mat(3,2) = 'R';
    mat(0,3) = 'S';
    mat(1,3) = 'I';
    mat(2,3) = 'U';
    mat(3,3) = 'I';
    */
    std::cout << "\nMatrix:\n";
    mat.print();

    // Search words
    std::vector<std::string> words;
    for(unsigned int t = 5; t <= 7; t++)
        getWords(mat,dictionary,t,words);
    std::cout << "\nWords:\n";
    if(words.size() == 0)
        std::cout << " (no words)\n";
    else
        for(unsigned int t = 0; t < words.size(); t++)
            std::cout << " " << words[t] << "\n";
}

Note: the previous code was a code-search resolver. It now removed from the answer, but you can still get it by searching in the edit revisions of this answer.

EDIT2: Some verbal explanations

The dictionary. It's a kind of a radix-tree, but each of the node have only one letter to store. It can have up to 26 children (more if you include accents and/or digits), one for each letter of the considered alphabet. The basic layout is the following:

Word tree http://img571.imageshack.us/img571/2281/wtree.png

Example:

An example of a dictionary http://img706.imageshack.us/img706/1244/wordsr.png

Each node also contains a boolean which indicates whether it's the end of a word. Additionnally, each node store the minimum and maximum size of its children branchs, its parent, and its level from the root (= size of the prefix). It allows to stop the search of words when we see that no children branch can give a word with a specific length.

The matrix. The matrix is a double array of letters. But, for more efficiency, I added some data: its corresponding node in the dictionary (see algorithm), the direction to its children (see algorithm) and to its parent (see algorithm).

The algorithm. The idea is to 'circle' in the map, by 'descending' (increasing the path representing the current found string). When descending is not possible, we have to 'ascend':

FIND WORD:
----------

while(we can continue)
    try descend
    if(failed to descend)
        ascend
    else
        node = node of current cell
        if node is end of word
            add it to the found words, then ascend
        else if node is out of range (ie, n = 5, but with node you can only have words with 3 characters)
            ascend

DESCEND:
--------

we're from cell (x,y), with node D
for each direction (from cell.direction to north-west, clockwise)
    advance position (x,y) with direction
    if the position is valid
        and we haven't already visited the cell for this word
        and the node got with D.children[cell.letter] exists
        set the current position of the algorithm to the result of the advance
        set the direction of the old cell to the direction used to advance
        set the direction of the new cell to the opposite (pointing to the old cell)
        set the node of the new cell by the found one
        return

ASCEND:
-------

we're at (x,y)
cell.node = 0 (the cell is not considered as visited anymore)
advance the parent direction by one clockwise (if this direction is north-west, we use a special direction that say that we cannot rotate anymore: FINISHED)
cell.dir = INVALID
advance (x,y) following cell.direction_to_parent
cell.direction_to_parent = INVALID
set the current position to the result of advance

An image explanation:

Algorithm unrolled http://img822.imageshack.us/img822/1374/algow.png

Steps (number indicated in the losanges):

  1. Here is our matrix. I tested it with a french dictionary. Let's say we start from the (0,0) cell (the algorithm works wherever you start).
  2. The first valid direction (word + in map) is east. We advance following it. The node is not the end of a word, and is still valid.
  3. The first valid direction (word + in map) is south-east. We advance following it. The node is not the end of a word, and is still valid.
  4. The first valid direction (word + in map) is east. We advance following it. The node is not the end of a word, and is still valid.
  5. There are no valid directions (either out-of-map, cell already visited (the E), or not in dictionary), so we ascend
  6. The first valid direction (word + in map) is south-east. We advance following it. The node is not the end of a word, and is still valid.
  7. The first valid direction (word + in map) is south. We advance following it. The node is not the end of a word, and is still valid.
  8. The first valid direction (word + in map) is west. We advance following it. The node is the end of a word (which is "anerie"), so we add it to the list of found words, and we ascend. For information, "ânerie" in french is "nonsense" or "stupidity" in english.
  9. And so on...

OTHER TIPS

You are dealing with a massive number of combinations, because you aren't looking for actual words. If you were, you would be able to look up combination of characters and see if a word is still possible in a particular language. Looking up would take more time, but many combinations would be eliminated.
But in the end, how much time are you willing to spend to make the application a bit faster?

Having said that: there might be a possibility in optimization because you pass a vector and a string by value, which both generates memory allocation (certainly for vector, possibly for string). passing an std::array<lettera*,wordlength> instead should be faster and will give you the word too.

First, the number of solutions you're looking grows exponentially with the length, so any solution will be exponential; all you can hope to do is reduce the constant factor.

I would suggest, however, keeping the strings you've already found sorted, and doing a binary search with each new string. For a length of 7, you'll find well over 60000, once the duplicates have been excluded. (On a quick implementation I knocked up myself, this made a difference of a factor of more than 10, with the difference greater the longer the string I was looking for.) Similarly, annotating each cell with a flag indicating whether it is visited or not, rather than having to look it up somehow, should save a lot of time as well.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top