Implementing Proximity Text Search in Java

Question 1

If there's an accepted standard way to do this, it's to use Lucene. There are some regex gimmicks you can use, like this one from RegexBuddy's library, where word1 and word2 are placeholders for the search terms and {1,3}? allows one to three intervening words (so 3 is the maximum distance):

\b(?:word1(?:\W+\w+){1,3}?\W+word2|word2(?:\W+\w+){1,3}?\W+word1)\b
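To use that pattern from Java, the backslashes have to be doubled inside string literals. The sketch below builds the same pattern for arbitrary terms. The class and method names are hypothetical, and Pattern.quote is used so that terms containing regex metacharacters are matched literally. It keeps the pattern's quirks, including the lower bound of one intervening word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexProximity {

    // Builds the RegexBuddy-style proximity pattern for two search terms.
    // maxBetween is the maximum number of intervening words (the 3 in {1,3}?).
    static Pattern proximityPattern(String word1, String word2, int maxBetween) {
        String w1 = Pattern.quote(word1);
        String w2 = Pattern.quote(word2);
        // One to maxBetween intervening "words", matched lazily, then a separator.
        String gap = "(?:\\W+\\w+){1," + maxBetween + "}?\\W+";
        // Either order: word1 ... word2 or word2 ... word1.
        return Pattern.compile("\\b(?:" + w1 + gap + w2 + "|" + w2 + gap + w1 + ")\\b");
    }

    public static void main(String[] args) {
        Pattern p = proximityPattern("finding", "distance", 6);
        Matcher m = p.matcher("Lucene supports finding words that are within a specific distance away.");
        if (m.find()) {
            System.out.println("Match: " + m.group());
        } else {
            System.out.println("No match");
        }
    }
}
```

Six words separate "finding" and "distance" in that sentence, so a limit of 6 matches and a limit of 5 does not.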

Trouble is, this relies on an extremely simplistic, arbitrary notion of what constitutes a word. It doesn't handle contractions or hyphenated words (it counts "don't" as two words, "don" and "t"), but it does accept "words" with digits and underscores in them. You could tweak the regex to deal with those problems, but more will pop up to replace them. And ugly as it already was, each tweak makes the regex that much less readable and that much harder to maintain.

This barely scratches the surface of what full-text search engines save you from. If you have a very specific, tightly constrained task to accomplish, regexes or other "syntax-level" tools might suit. But if you need to work at the semantic level, recognizing natural-language words and phrases, you want a search engine or other dedicated tool.
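For a tightly constrained task, one "syntax-level" alternative to the regex is to tokenize first and then compare word positions, which keeps the definition of a word in one readable place. The sketch below is a hypothetical helper, not a library API. Its word definition (letters, optionally joined by apostrophes or hyphens) is an assumption, and it does none of the stemming, stop-word handling, or Unicode normalization a search engine's analyzer would do. Unlike the regex, it also accepts adjacent terms (zero words between). For comparison, Lucene's classic query parser expresses the same idea as a proximity query such as "word1 word2"~3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenProximity {

    // Hypothetical word definition: a run of letters, optionally joined by apostrophes
    // or hyphens, so "don't" and "full-text" stay whole while digits, underscores and
    // other punctuation act as separators.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}'-]+")) {
            // Trim stray leading/trailing apostrophes and hyphens (quotes, dashes).
            String w = raw.replaceAll("^['-]+|['-]+$", "");
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // True if word1 and word2 occur, in either order, with at most maxBetween
    // words between them. Adjacent words have zero words between them.
    static boolean near(String text, String word1, String word2, int maxBetween) {
        List<String> words = tokenize(text);
        String a = word1.toLowerCase(Locale.ROOT);
        String b = word2.toLowerCase(Locale.ROOT);
        int lastA = -1; // index of the most recent occurrence of a
        int lastB = -1; // index of the most recent occurrence of b
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            boolean isA = w.equals(a);
            boolean isB = w.equals(b);
            // The closest earlier occurrence of the other term gives the smallest gap.
            if (isA && lastB >= 0 && i - lastB - 1 <= maxBetween) return true;
            if (isB && lastA >= 0 && i - lastA - 1 <= maxBetween) return true;
            if (isA) lastA = i;
            if (isB) lastB = i;
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "The full-text engine doesn't split hyphenated words.";
        System.out.println(near(text, "full-text", "hyphenated", 3));
    }
}
```

Checking the gap before recording the current position means the same term given twice (word1 equal to word2) only matches when it really occurs twice within range.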

Question 2

If you want to know how far to the left of the end of the string a word starts, you could try this.

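public class FindWordToLeft
{
    public static void main( String[] args )
    {
        String str = "Lucene supports finding words that are within a specific distance away.";
        String word = "specific";
        boolean found = false;
        int end = str.length();
        int start = end - 1;

        // Grow a window leftward from the end of the string until it contains the word.
        // The start >= 0 check stops the loop (instead of throwing) when the word is absent.
        while ( !found && start >= 0 )
        {
            if ( str.substring( start, end ).contains( word ) )
            {
                // total = number of characters from where the word starts to the end of the string
                int total = end - start;
                System.out.println( "Your word has been found " + total + " characters to the left" );
                found = true;
            }
            else
            {
                start -= 1;
            }
        }

        if ( !found )
        {
            System.out.println( "Word not found" );
        }
    }
}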
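The loop above effectively locates the last occurrence of the word: the first window that contains it is the one starting exactly at that occurrence. When the word is present (and non-empty), String.lastIndexOf gives the same total directly, without building a new substring on every step. A minimal sketch; the class and method names are hypothetical.

```java
public class WordToLeft {

    // Returns how many characters from the end of str the last occurrence of word
    // starts (the loop's "total"), or -1 if word does not occur.
    static int charsFromEnd(String str, String word) {
        int index = str.lastIndexOf(word);
        return index < 0 ? -1 : str.length() - index;
    }

    public static void main(String[] args) {
        String str = "Lucene supports finding words that are within a specific distance away.";
        int total = charsFromEnd(str, "specific");
        if (total < 0) {
            System.out.println("Word not found");
        } else {
            System.out.println("Your word has been found " + total + " characters to the left");
        }
    }
}
```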