Question

Yesterday I had a question where people suggested I use Levenshtein method. Is it a slow query? Maybe I can use something else?

Was it helpful?

Solution

You can use the BENCHMARK function to test the performance:

SELECT BENCHMARK(10000, LEVENSHTEIN('abc', 'abd'));

Maybe test it with different strings similar to your use case.

OTHER TIPS

It depends on your dataset.

I found I could speed it up considerably by only comparing strings of similar length.

How similar in length your strings need to be will depend on your data.

There is an article on the subject here: http://kerbtier.ch/2008/12/30/levenshtein-to-slow-how-to-speed-it-up

If you want it perform well then normalize your schema.

The problem is that in order to determine how similar other data is, the DBMS has to load that data and compare it with the datum. So it has to read through every single row in the table (except the current one) to find 'similar' values. It cannot use indexes to find data which is close to the datum.

If, on the other hand you used a schema like this:

CREATE TABLE member (
   member_id      INT(11),
   member_data    CLOB,
   PRIMARY KEY (member_id));

CREATE TABLE about_member (
   member_id      INT(11),
   metric         VARCHAR(10),
   value          MEDIUMINT(9),
   PRIMARY KEY (member_id, metric),
   KEY by_value (metric, value, member_id));

Note that you about_member (1-1-2-2-1) string should be implemented as seperate rows, e.g.

 member_id     metric      value
 1234          lost        2
 1234          won         3
 1234          drawn       1
 1234          handicap    9

Then you can use the indexes affectively, for example with the following query.

 SELECT compare.member_id, SUM(ABS(compare.value-datum.value)) AS difference
 FROM about_member compare, about_member datum
 WHERE datum.member_id=$MEMBER_TO_COMPARE
 AND compare.member_id<>datum.member_id
 AND compare.metric=datum.metric
 AND compare.metric BETWEEN (datum.metric-1) AND (datum.metric+1) /* tweak here */
 GROUP BY compare.member_id;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top