Change your algorithm from recursive to iterative (with a loop). It's usually faster. After that, the other optimizations are related to the algorithm and the data structures, in order to reduce the complexity. It's much more difficult to find such optimizations (but is possible).
EDIT: some thoughts on the code, followed by a version of the word-search I've done today:
- you want only english words (or italian ones, etc...). To check this you should use a dictionary, it is the most efficient structure for look-up of words, both in terms of memory and performances
- a comment said that this algorithm is strongly recursive. I disagree, I've made an iterative one, with much less copy of vectors, and with word check. Passing an algorithm from recursive to iterative usually reduce the number of copies. But I still agree with one point of this comment: some algorithm may be faster in recursive form (but hey're not so common). Also, this algorithm was not simple to find ;)
- You should split your code. It will be way more readable, and easier to optimize.
- You're coding a boggle resolver, so you should be aware that your dictionary must be precise, and should contain all forms of verbs (for example, for 'be': am, are, is, was, were, etc...). I tested it in french because the english word list I have is only 850 words, which is way too short.
- The conditions on letter with accents in the dictionary are not important, they're here just to avoid using more slots for accents. Also, I striped every composed word (with caret in it) from the text file.
- On my computer, loading the dictionary (non-exhaustive 350000 french words, 3 Mb) and searching all 5,6 and 7-length words take approximatively 0.6 seconds (i7 2660K, 8Gb RAM, release with /0x, measured by me, not by the computer ;) ). The perf measurement will be relevant only on your machine, for comparison.
So, here is the code. It's a standalone, you just have to compile it, execute with the english (or italian or french) words file (text, one word per line):
#include <string>
#include <iostream>
#include <stdlib.h>
#include <vector>
#include <time.h>
#include <map>
#include <algorithm>
// **************************************************** //
// //
// Data structure for the words : dictionnary //
// //
// **************************************************** //
struct DicNode
{
const char l;
bool isEndOfWord;
DicNode* children[26]; // a - z, case insensitive
DicNode* parent;
unsigned int min, max, level;
DicNode(char c, DicNode* p = 0, bool end = false) : l(c),isEndOfWord(end),parent(p),min((unsigned int)(-1)),max(0), level(parent ? parent->level + 1 : 0)
{
for(unsigned int t = 0; t < 26; t++)
children[t] = 0;
}
~DicNode()
{
for(unsigned int t = 0; t < 26; t++)
if(children[t])
delete children[t];
}
DicNode*& at(char l)
{
if(l == 'à' || l == 'ä' || l == 'â')
l = 'a';
if(l == 'é' || l == 'è' || l == 'ê' || l == 'ë')
l = 'e';
if(l == 'ï' || l == 'î')
l = 'i';
if(l == 'ö' || l == 'ô')
l = 'o';
if(l == 'ü' || l == 'û' || l == 'ù')
l = 'u';
if(l == 'ç')
l = 'c';
if(l <= 'Z')
l = 'a' + l - 'A';
// Note: undefined behavior if l > 'Z' or l < 'A'.
return children[(unsigned int)(l - 'a')];
}
bool inRange(unsigned int n) const
{
return n <= max && n >= min;
}
void print(std::string str = "") const
{
// Here recursive algorithm is way better because of the object-oriented structure
if(isEndOfWord)
std::cout << (str + l) << "\n";
for(unsigned int t = 0; t < 26; t++)
if(children[t])
children[t]->print(str + l);
}
std::string toString() const
{
std::string str(level, '0');
const DicNode* node = this;
while(node && node->level > 0)
{
str[node->level - 1] = node->l;
node = node->parent;
}
return str;
}
};
void addWord(DicNode* root, const std::string& str)
{
if(!root || str == "") return;
DicNode* node = root;
unsigned int i = 0;
while(i != str.size())
{
if(str.size() - i > node->max) node->max = str.size() - i;
if(str.size() - i < node->min) node->min = str.size() - i;
char c = tolower(str[i]);
DicNode*& n = node->at(c);
if(!n) n = new DicNode(c,node);
node = n;
i++;
}
node->isEndOfWord = true;
}
// **************************************************** //
// //
// Data structures for the algorithm //
// //
// **************************************************** //
enum Direction
{
INVALID,
NORTH,
NORTH_EAST,
EAST,
SOUTH_EAST,
SOUTH,
SOUTH_WEST,
WEST,
NORTH_WEST,
FINISHED
};
Direction incDir(Direction d)
{
switch(d)
{
case NORTH: return NORTH_EAST;
case NORTH_EAST: return EAST;
case EAST: return SOUTH_EAST;
case SOUTH_EAST: return SOUTH;
case SOUTH: return SOUTH_WEST;
case SOUTH_WEST: return WEST;
case WEST: return NORTH_WEST;
case NORTH_WEST: return NORTH;
default: return INVALID;
}
}
Direction operator-(Direction d)
{
switch(d)
{
case NORTH: return SOUTH;
case NORTH_EAST: return SOUTH_WEST;
case EAST: return WEST;
case SOUTH_EAST: return NORTH_WEST;
case SOUTH: return NORTH;
case SOUTH_WEST: return NORTH_EAST;
case WEST: return EAST;
case NORTH_WEST: return SOUTH_EAST;
default: return INVALID;
}
}
std::string toString(Direction d)
{
switch(d)
{
case NORTH: return "NORTH";
case NORTH_EAST: return "NORTH_EAST";
case EAST: return "EAST";
case SOUTH_EAST: return "SOUTH_EAST";
case SOUTH: return "SOUTH";
case SOUTH_WEST: return "SOUTH_WEST";
case WEST: return "WEST";
case NORTH_WEST: return "NORTH_WEST";
default: return "INVALID";
}
}
struct Cell
{
char l;
DicNode* node;
Direction dir, dirParent;
Cell(char l_ = 'A') : l(l_),node(0),dir(INVALID),dirParent(INVALID) {}
};
struct ProbLetter
{
char letter;
float proba;
};
class Map
{
public:
Map(unsigned int w, unsigned int h) : width(w), height(h)
{
data = new Cell[width * height];
for(unsigned int t = 0; t < width * height; t++)
data[t] = randomEnglishLetter();
}
static char randomEnglishLetter()
{
// Frequency of english letters
static ProbLetter probas[26] =
{
{ 'Z', 0.074f },
{ 'Q', 0.095f },
{ 'X', 0.150f },
{ 'J', 0.153f },
{ 'K', 0.772f },
{ 'V', 0.978f },
{ 'B', 1.492f },
{ 'P', 1.929f },
{ 'Y', 1.974f },
{ 'G', 2.015f },
{ 'F', 2.228f },
{ 'W', 2.360f },
{ 'M', 2.406f },
{ 'U', 2.758f },
{ 'C', 2.782f },
{ 'L', 4.025f },
{ 'D', 4.253f },
{ 'R', 5.987f },
{ 'H', 6.094f },
{ 'S', 6.327f },
{ 'N', 6.749f },
{ 'I', 6.966f },
{ 'O', 7.507f },
{ 'A', 8.167f },
{ 'T', 9.056f },
{ 'E', 12.702f }
};
// Random number between 0 and 1
float p = 100.0f * float(rand() % 10000) / 9999.0f;
float sum = 0.0f;
for(unsigned int t = 0; t < 26; t++)
{
sum += probas[t].proba;
if(p < sum) return probas[t].letter;
}
return probas[25].letter;
}
Cell& operator()(unsigned int x, unsigned int y)
{
return data[x + y * width];
}
bool inRange(int x, int y)
{
return x >= 0 && x < (int)width && y >= 0 && y < (int)height;
}
void print()
{
for(unsigned int y = 0; y < height; y++)
{
for(unsigned int x = 0; x < width; x++)
std::cout << " " << data[x + y * width].l;
std::cout << "\n";
}
}
unsigned int getWidth() const { return width; }
unsigned int getHeight() const { return height; }
private:
unsigned int width, height;
Cell* data;
};
// **************************************************** //
// //
// Word-search algorithm //
// //
// **************************************************** //
inline void advance(int& x, int& y, Direction d)
{
switch(d)
{
case NORTH:
y--;
return;
case NORTH_EAST:
x++;
y--;
return;
case EAST:
x++;
return;
case SOUTH_EAST:
x++;
y++;
return;
case SOUTH:
y++;
return;
case SOUTH_WEST:
x--;
y++;
return;
case WEST:
x--;
return;
case NORTH_WEST:
x--;
y--;
return;
default:
return;
}
}
struct Data
{
Map& map;
int x;
int y;
};
bool descend(Data& dat, unsigned int n)
{
DicNode* parent = dat.map(dat.x, dat.y).node;
if(n == parent->level) return false;
Direction dir = dat.map(dat.x, dat.y).dir;
if(dir == FINISHED) return false;
else if(dir == INVALID) dir = NORTH;
int pX = dat.x, pY = dat.y;
bool firstIteration = true;
for(; firstIteration || dir != NORTH; dir = incDir(dir))
{
firstIteration = false;
pX = dat.x;
pY = dat.y;
advance(pX, pY, dir);
if(dat.map.inRange(pX, pY) // (pX, pY) is not outside the map
&& dat.map(pX, pY).node == 0 // The cell is not already used
&& parent->at(dat.map(pX, pY).l) != 0) // An entry in the dictionary exists
{
// We found the next node
dat.map(dat.x, dat.y).dir = dir;
dat.map(pX, pY).node = parent->at(dat.map(pX, pY).l);
dat.map(pX, pY).dirParent = -dir;
dat.x = pX;
dat.y = pY;
return true;
}
}
return false;
}
bool ascend(Data& dat)
{
// Go back on the previous node
Direction dp = dat.map(dat.x, dat.y).dirParent;
if(dp == INVALID) return false;
dat.map(dat.x, dat.y).dir = INVALID;
dat.map(dat.x, dat.y).dirParent = INVALID;
dat.map(dat.x, dat.y).node = 0;
advance(dat.x, dat.y, dp);
dat.map(dat.x, dat.y).dir = incDir(dat.map(dat.x, dat.y).dir);
if(dat.map(dat.x, dat.y).dir == NORTH)
dat.map(dat.x, dat.y).dir = FINISHED;
return true;
}
void findWordsFromPosition(Map& map, DicNode* parent, unsigned int x, unsigned int y, unsigned int n, std::vector<std::string>& output)
{
if(!parent) return;
bool ok = true;
// Setup first node
map(x, y).node = parent;
map(x, y).dir = NORTH;
// while we can find next node with direction
// If marked node has right order and is end of word, or if condition on n is not verified:
// add it to the output (if condition on n is verified)
// no need to go further (because of n), so advance direction of parent cell, reset current cell, and go up for the current position
Data dat = { map, x, y };
while(ok)
{
if(map(dat.x,dat.y).node->toString() == "ane")
std::cout << "ok\n";
if(!descend(dat, n))
ok = ascend(dat);
else
{
DicNode* node = dat.map(dat.x, dat.y).node;
if(node->level == n && node->isEndOfWord)
{
std::string str = node->toString();
if(std::find(output.begin(), output.end(), str) == output.end())
output.push_back(str);
ok = ascend(dat);
}
else if(!node->inRange(n - node->level))
ok = ascend(dat);
}
}
// Clean first node
map(x, y).dir = INVALID;
map(x, y).dirParent = INVALID;
map(x, y).node = 0;
}
void getWords(Map& map, DicNode& dic, unsigned int n, std::vector<std::string>& output)
{
if(n > dic.max || n < dic.min) return;
// Search words for every position
for(unsigned int y = 0; y < map.getHeight(); y++)
for(unsigned int x = 0; x < map.getWidth(); x++)
findWordsFromPosition(map,dic.at(map(x,y).l),x,y,n,output);
}
#include <fstream>
int main()
{
srand((unsigned int)time(0));
// Prepare the words, for example, load them from a real dictionary
DicNode dictionary(0);
unsigned int maxSize = 0;
// Note: the following code make sense only if you load words from a file
{
std::ifstream dico("english.txt");
std::string str;
int count = 0;
while(dico >> str)
{
if(str.size() > maxSize) maxSize = str.size();
addWord(&dictionary,str);
count++;
}
std::cout << "Read " << count << " words from dictionary, longest with " << maxSize << " letters.\n";
}
// Prepare the matrix
Map mat(4,4);
/*
mat(0,0) = 'A';
mat(1,0) = 'N';
mat(2,0) = 'F';
mat(3,0) = 'N';
mat(0,1) = 'S';
mat(1,1) = 'L';
mat(2,1) = 'E';
mat(3,1) = 'R';
mat(0,2) = 'G';
mat(1,2) = 'O';
mat(2,2) = 'R';
mat(3,2) = 'R';
mat(0,3) = 'S';
mat(1,3) = 'I';
mat(2,3) = 'U';
mat(3,3) = 'I';
*/
std::cout << "\nMatrix:\n";
mat.print();
// Search words
std::vector<std::string> words;
for(unsigned int t = 5; t <= 7; t++)
getWords(mat,dictionary,t,words);
std::cout << "\nWords:\n";
if(words.size() == 0)
std::cout << " (no words)\n";
else
for(unsigned int t = 0; t < words.size(); t++)
std::cout << " " << words[t] << "\n";
}
Note: the previous code was a code-search resolver. It now removed from the answer, but you can still get it by searching in the edit revisions of this answer.
EDIT2: Some verbal explanations
The dictionary. It's a kind of a radix-tree, but each of the node have only one letter to store. It can have up to 26 children (more if you include accents and/or digits), one for each letter of the considered alphabet. The basic layout is the following:
Word tree http://img571.imageshack.us/img571/2281/wtree.png
Example:
An example of a dictionary http://img706.imageshack.us/img706/1244/wordsr.png
Each node also contains a boolean which indicates whether it's the end of a word. Additionnally, each node store the minimum and maximum size of its children branchs, its parent, and its level from the root (= size of the prefix). It allows to stop the search of words when we see that no children branch can give a word with a specific length.
The matrix. The matrix is a double array of letters. But, for more efficiency, I added some data: its corresponding node in the dictionary (see algorithm), the direction to its children (see algorithm) and to its parent (see algorithm).
The algorithm. The idea is to 'circle' in the map, by 'descending' (increasing the path representing the current found string). When descending is not possible, we have to 'ascend':
FIND WORD:
----------
while(we can continue)
try descend
if(failed to descend)
ascend
else
node = node of current cell
if node is end of word
add it to the found words, then ascend
else if node is out of range (ie, n = 5, but with node you can only have words with 3 characters)
ascend
DESCEND:
--------
we're from cell (x,y), with node D
for each direction (from cell.direction to north-west, clockwise)
advance position (x,y) with direction
if the position is valid
and we haven't already visited the cell for this word
and the node got with D.children[cell.letter] exists
set the current position of the algorithm to the result of the advance
set the direction of the old cell to the direction used to advance
set the direction of the new cell to the opposite (pointing to the old cell)
set the node of the new cell by the found one
return
ASCEND:
-------
we're at (x,y)
cell.node = 0 (the cell is not considered as visited anymore)
advance the parent direction by one clockwise (if this direction is north-west, we use a special direction that say that we cannot rotate anymore: FINISHED)
cell.dir = INVALID
advance (x,y) following cell.direction_to_parent
cell.direction_to_parent = INVALID
set the current position to the result of advance
An image explanation:
Algorithm unrolled http://img822.imageshack.us/img822/1374/algow.png
Steps (number indicated in the losanges):
- Here is our matrix. I tested it with a french dictionary. Let's say we start from the (0,0) cell (the algorithm works wherever you start).
- The first valid direction (word + in map) is east. We advance following it. The node is not the end of a word, and is still valid.
- The first valid direction (word + in map) is south-east. We advance following it. The node is not the end of a word, and is still valid.
- The first valid direction (word + in map) is east. We advance following it. The node is not the end of a word, and is still valid.
- There are no valid directions (either out-of-map, cell already visited (the E), or not in dictionary), so we ascend
- The first valid direction (word + in map) is south-east. We advance following it. The node is not the end of a word, and is still valid.
- The first valid direction (word + in map) is south. We advance following it. The node is not the end of a word, and is still valid.
- The first valid direction (word + in map) is west. We advance following it. The node is the end of a word (which is "anerie"), so we add it to the list of found words, and we ascend. For information, "ânerie" in french is "nonsense" or "stupidity" in english.
- And so on...