What Ruzzle board contains the most unique words?

Question 1

tldr;

S   E   R   O
P   I   T   S
L   A   N   E
S   E   R   G

or any of its reflections.

This board contains 1212 words (and as it turns out, you can exclude 'z', 'q' and 'x').

First things first, turns out you're using the wrong dictionary. After not getting exact matches with Ruzzle's word count, I looked into it, it seems Ruzzle uses a dictionary called TWL06, which has around 180,000 words. Don't ask me what it stands for, but it's freely available in txt.

I also wrote code to find all possible words given a 16 character board, as follows. It builds the dictionary into a tree structure, and then pretty much just goes around recursively while there are words to be found. It prints them in order of length. Uniqueness is maintained by the STL set structure.

#include <cstdlib>
#include <ctime>
#include <map>
#include <string>
#include <set>
#include <algorithm>
#include <fstream>
#include <iostream>

using namespace std;

struct TreeDict {

    bool existing;
    map<char, TreeDict> sub;

    TreeDict() {
        existing = false;
    }

    TreeDict& operator=(TreeDict &a) {

        existing = a.existing;
        sub = a.sub;
        return *this;
    }

    void insert(string s) {

        if(s.size() == 0) {

            existing = true;
            return;
        }

        sub[s[0]].insert(s.substr(1));
    }

    bool exists(string s = "") {

        if(s.size() == 0)
            return existing;

        if(sub.find(s[0]) == sub.end())
            return false;

        return sub[s[0]].exists(s.substr(1));
    }

    TreeDict* operator[](char alpha) {

        if(sub.find(alpha) == sub.end())
            return NULL;

        return &sub[alpha];
    }
};

TreeDict DICTIONARY;

set<string> boggle_h(const string board, string word, int index, int mask, TreeDict *dict) {

    if(index < 0 || index >= 16 || (mask & (1 << index)))
        return set<string>();

    word += board[index];
    mask |= 1 << index;
    dict = (*dict)[board[index]];

    if(dict == NULL)
        return set<string>();

    set<string> rt;

    if((*dict).exists())
        rt.insert(word);

    if((*dict).sub.empty())
        return rt;

    if(index % 4 != 0) {

        set<string> a = boggle_h(board, word, index - 4 - 1, mask, dict);
        set<string> b = boggle_h(board, word, index - 1, mask, dict);
        set<string> c = boggle_h(board, word, index + 4 - 1, mask, dict);

        rt.insert(a.begin(), a.end());
        rt.insert(b.begin(), b.end());
        rt.insert(c.begin(), c.end());
    }

    if(index % 4 != 3) {

        set<string> a = boggle_h(board, word, index - 4 + 1, mask, dict);
        set<string> b = boggle_h(board, word, index + 1, mask, dict);
        set<string> c = boggle_h(board, word, index + 4 + 1, mask, dict);

        rt.insert(a.begin(), a.end());
        rt.insert(b.begin(), b.end());
        rt.insert(c.begin(), c.end());
    }

    set<string> a = boggle_h(board, word, index + 4, mask, dict);
    set<string> b = boggle_h(board, word, index - 4, mask, dict);

    rt.insert(a.begin(), a.end());
    rt.insert(b.begin(), b.end());

    return rt;
}

set<string> boggle(string board) {

    set<string> words;
    for(int i = 0; i < 16; i++) {

        set<string> a = boggle_h(board, "", i, 0, &DICTIONARY);
        words.insert(a.begin(), a.end());
    }
    return words;
}

void buildDict(string file, TreeDict &dict = DICTIONARY) {

    ifstream fstr(file.c_str());
    string s;

    if(fstr.is_open()) {
        while(fstr.good()) {

            fstr >> s;
            dict.insert(s);
        }
        fstr.close();
    }
}

struct lencmp {
    bool operator()(const string &a, const string &b) {

        if(a.size() != b.size())
            return a.size() > b.size();
        return a < b;
    }
};

int main() {

    srand(time(NULL));

    buildDict("/Users/XXX/Desktop/TWL06.txt");

    set<string> a = boggle("SEROPITSLANESERG");
    set<string, lencmp> words;
    words.insert(a.begin(), a.end());
    set<string>::iterator it;
    for(it = words.begin(); it != words.end(); it++)
        cout << *it << endl;
    cout << words.size() << " words." << endl;
}

Randomly generating boards and testing against them didn't turn out too effective, expectedly, I didn't really bother with running that, but I'd be surprised if they crossed 200 words. Instead I changed the board generation to generate boards with letters distributed in proportion to their frequency in TWL06, achieved by a quick cumulative frequency (the frequencies were reduced by a factor of 100), below.

string randomBoard() {

    string board = "";
    for(int i = 0; i < 16; i++)
        board += (char)('A' + rand() % 26);
    return board;
}

char distLetter() {

    int x = rand() % 15833;
    if(x < 1209) return 'A';
    if(x < 1510) return 'B';
    if(x < 2151) return 'C';
    if(x < 2699) return 'D';
    if(x < 4526) return 'E';
    if(x < 4726) return 'F';
    if(x < 5161) return 'G';
    if(x < 5528) return 'H';
    if(x < 6931) return 'I';
    if(x < 6957) return 'J';
    if(x < 7101) return 'K';
    if(x < 7947) return 'L';
    if(x < 8395) return 'M';
    if(x < 9462) return 'N';
    if(x < 10496) return 'O';
    if(x < 10962) return 'P';
    if(x < 10987) return 'Q';
    if(x < 12111) return 'R';
    if(x < 13613) return 'S';
    if(x < 14653) return 'T';
    if(x < 15174) return 'U';
    if(x < 15328) return 'V';
    if(x < 15452) return 'W';
    if(x < 15499) return 'X';
    if(x < 15757) return 'Y';
    if(x < 15833) return 'Z';
}

string distBoard() {

    string board = "";
    for(int i = 0; i < 16; i++)
        board += distLetter();
    return board;
}

This was significantly more effective, very easily achieving 400+ word boards. I left it running (for longer than I intended), and after checking over a million boards, the highest found was around 650 words. This was still essentially random generation, and that has its limits.

Instead, I opted for a greedy maximisation strategy, wherein I'd take a board and make a small change to it, and then commit the change only if it increased the word count.

string changeLetter(string x) {

    int y = rand() % 16;
    x[y] = distLetter();
    return x;
}

string swapLetter(string x) {

    int y = rand() % 16;
    int z = rand() % 16;
    char w = x[y];
    x[y] = x[z];
    x[z] = w;
    return x;
}

string change(string x) {

    if(rand() % 2)
        return changeLetter(x);
    return swapLetter(x);
}

int main() {

    srand(time(NULL));

    buildDict("/Users/XXX/Desktop/TWL06.txt");

    string board = "SEROPITSLANESERG";
    int locmax = boggle(board).size();
    for(int j = 0; j < 5000; j++) {

        int changes = 1;

        string board2 = board;
        for(int k = 0; k < changes; k++)
            board2 = change(board);

        int loc = boggle(board2).size();
        if(loc >= locmax && board != board2) {
            j = 0;
            board = board2;
            locmax = loc;
        }
    }
}

This very rapidly got me 1000+ word boards, with generally similar letter patterns, despite randomised starting points. What leads me to believe that the board given is the best possible board is how it, or one of its various reflections, turned up repeatedly, within the first 100 odd attempts at maximising a random board.

The biggest reason for skepticism is the greediness of this algorithm, and that this somehow would lead to the algorithm missing out better boards. The small changes made are quite flexible in their outcomes – that is, they have the power to completely transform a grid from its (randomised) start position. The number of possible changes, 26*16 for the fresh letter, and 16*15 for the letter swap, are both significantly less than 5000, the number of continuous discarded changes allowed.

The fact that the program was able to repeat this board output within the first 100 odd times implies that the number of local maximums is relatively small, and the probability that there is an undiscovered maximum low.

Although the greedy seemed intuitively right – it shouldn't really be less possible to reach a given grid with the delta changes from a random board – and the two possible changes, a swap and a fresh letter do seem to encapsulate all possible improvements, I changed the program in order to allow it to make more changes before checking for the increase. This again returned the same board, repeatedly.

int main() {

    srand(time(NULL));

    buildDict("/Users/XXX/Desktop/TWL06.txt");

    int glomax = 0;
    int i = 0;
    while(true) {

        string board = distBoard();
        int locmax = boggle(board).size();
        for(int j = 0; j < 500; j++) {

            string board2 = board;
            for(int k = 0; k < 2; k++)
                board2 = change(board);

            int loc = boggle(board2).size();
            if(loc >= locmax && board != board2) {
                j = 0;
                board = board2;
                locmax = loc;
            }
        }

        if(glomax <= locmax) {
            glomax = locmax;
            cout << board << " " << glomax << " words." << endl;
        }

        if(++i % 10 == 0)
            cout << i << endl;
    }
}

Having iterated over this loop around a 1000 times, with this particular board configuration showing up ~10 times, I'm pretty confident that this is for now the Ruzzle board with the most unique words, until the English language changes.

Question 2

Interesting problem. I see (at least, but mainly) two approches

one is to try the hard way to stick as many wordable letters (in all directions) as possible, based on a dictionary. As you said, there are many possible combinations, and that route requires a well elaborated and complex algorithm to reach something tangible
there is another "loose" solution based on probabilities that I like more. You suggested to remove some low-appearance letters to maximize the board yield. An extension of this could be to use more of the high-appearance letters in the dictionary.

A further step could be:

based on the 80k dictionary D, you find out for each l1 letter of our L ensemble of 26 letters the probability that letter l2 precedes or follows l1. This is a L x L probabilities array, and is pretty small, so you could even extend to L x L x L, i.e. considering l1 and l2 what probability has l3 to fit. This is a bit more complex if the algorithm wants to estimate accurate probabilities, as the probas sum depends on the relative position of the 3 letters, for instance in a 'triangle' configuration (eg positions (3,3), (3,4) and (3,5)) the result is probably less yielding than when the letters are aligned [just a supposition]. Why not going up to L x L x L x L, which will require some optimizations...
then you distribute a few high-appearance letters (say 4~6) randomly on the board (having each at least 1 blank cell around in at least 5 of the 8 possible directions) and then use your L x L [xL] probas arrays to complete - meaning based on the existing letter, the next cell is filled with a letter which proba is high given the configuration [again, letters sorted by proba descending, and use randomness if two letters are in a close tie].

For instance, taking only the horizontal configuration, having the following letters in place, and we want to find the best 2 in between ER and TO

  ...ER??TO...

Using L x L, a loop like (l1 and l2 are our two missing letters). Find the absolutely better letters - but bestchoice and bestproba could be arrays instead and keep the - say - 10 best choices. Note: there is no need to keep the proba in the range [0,1] in this case, we can sum up the probas (which don't give a proba - but the number matters. A mathematical proba could be something like p = ( p(l0,l1) + p(l2,l3) ) / 2, l0 and l3 are the R and T in our L x L exemple)

  bestproba = 0
  bestchoice = (none, none)
  for letter l1 in L
    for letter l2 in L
      p = proba('R',l1) + proba(l2,'T')
      if ( p > bestproba ) 
        bestproba = p
        bestchoice = (l1, l2)
      fi
    rof
  rof

the algorithm can take more factors into account, and needs to take the vertical and diagonals into account as well. With L x L x L, more letters in more directions are taken into account, like ER?,R??,??T,?TO - this requires to think more through the algorithm - maybe starting with L x L can give an idea about the relevancy of this algorithm.

Note that a lot of this may be pre-calculated, and the L x L array is of course one of them.