Find similar/duplicate field values in MySQL (Sphinx-related)
Question
Let's say I have a table objects. It has fields id, name, misc.
How can I find rows with similar or duplicate name values? I can see that MySQL itself can be used to search for duplicate values, but not for similar ones, e.g. PHP Hypertext Preprocessor and PHP Hypertext Postprocessor (~90% of source value).
Can it be performed with Sphinx? And how?
Solution
I don't know the details of Sphinx, but what you're describing sounds like calculating Levenshtein distances. A quick search for "sphinx php levenshtein" turned up this thread, which describes a method that might work for you. Hopefully that gives you something to go on.
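To make the Levenshtein idea concrete, here is a minimal Python sketch. It is not taken from the thread mentioned above and does not use Sphinx; it assumes the (id, name) rows have already been fetched from the objects table. The function names and the 0.85 threshold are made up for illustration.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character inserts,
    deletes, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) ca
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similar_pairs(rows, threshold=0.85):
    """rows is a list of (id, name) tuples. Returns (id_a, id_b, score)
    for every pair whose similarity reaches the threshold.
    Compares all pairs, so it is O(n^2) in the number of rows."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            score = similarity(rows[i][1], rows[j][1])
            if score >= threshold:
                pairs.append((rows[i][0], rows[j][0], score))
    return pairs


# Hypothetical sample data standing in for rows of the objects table.
rows = [
    (1, "PHP Hypertext Preprocessor"),
    (2, "PHP Hypertext Postprocessor"),
    (3, "MySQL"),
]
pairs = similar_pairs(rows)
```

For the two names in the question the distance is 3 and the similarity is about 0.89, so the pair (1, 2) is reported. Comparing every pair gets slow on large tables, which is where a search index that narrows down candidates first could help.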
OTHER TIPS
The 'suggest' example from Sphinx might be a useful starting point.
http://code.google.com/p/sphinxsearch/source/browse/trunk/#trunk%2Fmisc%2Fsuggest
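The linked example is not reproduced here. As a rough illustration of how a suggestion-style lookup can narrow down candidates before any exact comparison, the sketch below uses character trigrams in plain Python. Treating trigrams as the right technique is an assumption on my part, and the function names, the "__" padding, and the 0.5 cutoff are hypothetical. In a Sphinx setup the index lookup would be done by the search engine rather than by a Python dict.

```python
from collections import defaultdict


def trigrams(text):
    """Set of 3-character substrings of the lowercased, padded text."""
    padded = "__" + text.lower() + "__"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_index(rows):
    """Map each trigram to the set of row ids whose name contains it."""
    index = defaultdict(set)
    for row_id, name in rows:
        for gram in trigrams(name):
            index[gram].add(row_id)
    return index


def candidates(index, name, min_shared=0.5):
    """Ids of rows sharing at least min_shared of the query's trigrams."""
    grams = trigrams(name)
    counts = defaultdict(int)
    for gram in grams:
        for row_id in index.get(gram, ()):
            counts[row_id] += 1
    needed = min_shared * len(grams)
    return sorted(row_id for row_id, n in counts.items() if n >= needed)


# Hypothetical sample data standing in for rows of the objects table.
rows = [
    (1, "PHP Hypertext Preprocessor"),
    (2, "PHP Hypertext Postprocessor"),
    (3, "MySQL"),
]
index = build_index(rows)
matches = candidates(index, "PHP Hypertext Preprocessor")
```

Here matches contains ids 1 and 2 but not 3. The surviving candidates can then be ranked with an exact measure such as Levenshtein distance, so the expensive comparison only runs on a handful of rows.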