Question

Hello fellow programmers,

I would like to ask for some help with regards to near matches of strings.

Currently, I have a program that stores strings of description, users can search for description by typing it completely or partially.

I would like to implement a near match search. For example the actual description is "hello world" but user erroneously enter a search "hello eorld". The programs should be able to return "hello world" to user.

I've tried looking at pattern and matches to implement it, but it requires a regex to match strings, whereby my description does not have a regular pattern. I've also tried string.contains, but it doesn't seems to work either. Below is part of the code i tried to implement.

    ArrayList <String> list = new ArrayList<String>();
    list.add("hello world");
    list.add("go jogging at london");
    list.add("go fly kite");
    Scanner scan = new Scanner(System.in);

    for(int i = 0; i < list.size(); i++){
      if(list.get(i).contains(scan.next())) {
         System.out.println(list.get(i));
      }
    }

Could fellow programmers help me with this??

Was it helpful?

Solution

You can use LCS(Longest Common Subsequence) see these: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

public class LCS {

    public static void main(String[] args) {
        String x = StdIn.readString();
        String y = StdIn.readString();
        int M = x.length();
        int N = y.length();

        // opt[i][j] = length of LCS of x[i..M] and y[j..N]
        int[][] opt = new int[M+1][N+1];

        // compute length of LCS and all subproblems via dynamic programming
        for (int i = M-1; i >= 0; i--) {
            for (int j = N-1; j >= 0; j--) {
                if (x.charAt(i) == y.charAt(j))
                    opt[i][j] = opt[i+1][j+1] + 1;
                else 
                    opt[i][j] = Math.max(opt[i+1][j], opt[i][j+1]);
            }
        }

        // recover LCS itself and print it to standard output
        int i = 0, j = 0;
        while(i < M && j < N) {
            if (x.charAt(i) == y.charAt(j)) {
                System.out.print(x.charAt(i));
                i++;
                j++;
            }
            else if (opt[i+1][j] >= opt[i][j+1]) i++;
            else                                 j++;
        }
        System.out.println();

    }

}

Other solution is Aho–Corasick string matching algorithm see this : Fast algorithm for searching for substrings in a string

OTHER TIPS

The Levenstein Distance might be useful for this problem. Apache Commons Lang StringUtils has an implementation for it.
Also, the difference method from StringUtils might be interesting, if you want to find out how the strings differ.

The Levenshtein distance is able to qualify the difference between two strings

Here is an implementation taken form here:

public class LevenshteinDistance {
   private static int minimum(int a, int b, int c) {
      return Math.min(Math.min(a, b), c);
   }

   public static int computeLevenshteinDistance(
      CharSequence str1,
      CharSequence str2 )
   {
      int[][] distance = new int[str1.length() + 1][str2.length() + 1];

      for (int i = 0; i <= str1.length(); i++)
         distance[i][0] = i;
      for (int j = 1; j <= str2.length(); j++)
         distance[0][j] = j;

      for (int i = 1; i <= str1.length(); i++)
         for (int j = 1; j <= str2.length(); j++)
            distance[i][j] =
               minimum(
                  distance[i - 1][j] + 1,
                  distance[i][j - 1] + 1,
                  distance[i - 1][j - 1] +
                     ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1));

      return distance[str1.length()][str2.length()];
   }
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top