Clustering: finding groups of close items within a large collection

Question 1

How is this function defined? Is it pre-computed? If so you can iterate over the computed function representation and retrieve pairs of words based on your criterion. If not, you have no other option but to compute the function between all pairs of words (This is necessary in your graph approach). Instead of the Bron-Kerbosch algorithm, I would go for a randomization+approximation strategy like,

http://dimacs.rutgers.edu/Workshops/Challenge10/abstracts.html#ovelgonne

http://dl.acm.org/citation.cfm?id=1933306.1934471

It is based on an modularity maximization approach. Modularity is the ratio between number of outgoing edges in a cluster to the number of edges within the cluster. In your case, you will be looking for the clusters with ratio 0, and choose the biggest one of them. This algorithm is really fast and works for most datasets that I have worked with. Though a modularity based method might be overkill for this problem, IMHO I feel this is a nice way to think about the problem + the implementation of this algorithm is available online (C implementation by the authors of the paper).

Question 2

What I understand the problem is:

"A-G" stand for the words. "#" stands for the distance between the two words. And you need to find out the pair which the distance <= 2.

  A B C D E F G
A * # # # # # #
B   * # # # # #
C     * # # # #
D       * # # #
E         * # #
F           * #
G             *

Basically it needs 2-level loop, what we can do is to reduce comparison times, only compare "#" part in matrix above. Here is the code in PHP:

$result = array();
while ( ($word = array_shift($arrWordList)) !== NULL ) {
    foreach ($arrWordList as $otherWord ) {
        if ( calc_dist($word, $otherWord) <= 2 ) {
            $result[] = array($word, $otherWord);
        }
    }
}

And you can use $result to continue doing something.

Question 3

Given that you want your criteria to apply to all members of a group, you need a clustering algorithm that allows for overlapping groups. That's a pretty complex subject, so the best I can do is direct you to the literature on clustering algorithms, and in particular C-Means Clustering which is designed for overlapping groups.