Question

I am trying to implement an internal search for my website that can point users in the right direction in case the mistype a word, something like the did you mean : in google search.

Does anybody have an idea how such a search can be done? How can we establish the relevance of the word or the phrase we assume the user intended to search for?

  • i use asp.net and sql server 2005 with FTS (fullTextSearch)

Thank you

Was it helpful?

Solution

You could use an algorithm for determining string similarity and then suggest other string from your search index up to a certain difference.

One of these algorithms is the Levenshtein distance.

However, don't forget searching for existing solutions. I think e.g. Lucene has the capability to search for similar strings.

Btw, here's a related post on this topic: How does the Google “Did you mean?” Algorithm work?

OTHER TIPS

This is done querying through regular expression the closest keywords that match the phrase.

Here is a great article that might help you.

With T-SQL You can use the SOUNDEX function to compare words phonetically.

If you take the users input and then compare it with other words in your database by soundex code, you should be able to come up with a list of 'do you mean'? words.

E.g.

select SOUNDEX('andrew')
select SOUNDEX('androo')

will both produce the same output (A536).

There are better algorithms these days, but soundex is built into sql server.

The simplest approach I can think of is to write a function that returns the degree of mismatch between two words, and you loop through all the words and find the best ones.

I've done this with a branch-and-bound method. Let me dig up the code:

bool matchWithinBound(char* a, char* b, int bound){
  // skip over matching characters
  while(*a && *b && *a == *b){a++; b++;}
  if (*a==0 && *b==0) return true;
  // if bound too low, quit
  if (bound <= 0) return false;
  // try assuming a has an extra character
  if (*a && matchWithinBound(a+1, b, bound-1)) return true;
  // try assuming a had a letter deleted
  if (*b && matchWithinBound(a, b+1, bound-1)) return true;
  // try assuming a had a letter replaced
  if (*a && *b && matchWithinBound(a+1, b+1, bound-1)) return true;
  // try assuming a had two adjacent letters swapped
  if (a[0] && a[1]){
    char temp;
    int success;
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    success = matchWithinBounds(a, b, bound-1);
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    if (success) return true;
  }
  // can try other modifications
  return false;
}

int DistanceBetweenWords(char* a, char* b){
  int bound = 0;
  for (bound = 0; bound < 10; bound++){
    if (matchWithinBounds(a, b, bound)) return bound;
  }
  return 1000;
}

why don't you use google power?, you can consume their suggest service

here is an example on c#

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top