Question

create table tbl (
  id int,
  comment varchar(255),
  primary key (id)
);

insert into tbl (id, comment) values ('1', 'dumb,');
insert into tbl (id, comment) values ('2', 'duuumb,');
insert into tbl (id, comment) values ('3', 'dummb');
insert into tbl (id, comment) values ('4', 'duummb');
insert into tbl (id, comment) values ('5', 'very dumb person');

select comment, soundex(comment) 
from tbl;

Result:

+------------------+------------------+
| comment          | soundex(comment) |
+------------------+------------------+
| dumb,            | D510             |
| duuumb,          | D510             |
| dummb            | D510             |
| duummb           | D510             |
| very dumb person | V6351625         |
+------------------+------------------+

I want to find all rows containing 'dumb', including all typos and variations, anywhere in the field.

select comment 
from tbl
where soundex(comment) like '%D510%'

This fails to get the final row #5, how can I also get that row? If there is a better solution than soundex() that would be fine.

Was it helpful?

Solution

This will work for your particular example:

select comment 
from tbl
where soundex(comment) like '%D510%' or comment like '%dumb%';

It won't find misspellings in the comment.

EDIT:

You could do something like this:

select comment
from tbl
where soundex(comment) = soundex('dumb') or
      soundex(substring_index(substring_index(comment, ' ', 2), -1)  = soundex('dumb') or
      soundex(substring_index(substring_index(comment, ' ', 3), -1)  = soundex('dumb') or
      soundex(substring_index(substring_index(comment, ' ', 4), -1)  = soundex('dumb') or
      soundex(substring_index(substring_index(comment, ' ', 5), -1)  = soundex('dumb');

A bit brute force.

The need to do this suggests that you should consider a full text index.

OTHER TIPS

Can you try with MySQL REGEXP? Is a good solution to find a specific word into text.

You can use [[:<:]] and [[:>:]] as word boundaries:

SELECT comment FROM tbl WHERE comment REGEXP '[[:<:]]dumb[[:>:]]'

Assuming you want the whole field and not just the matching element then this should work as soundex will never find an element in the middle of an other element;

select comment from tbl where (soundex(comment) like '%D510%' or comment like '%d%mb')

Edit Changed like for cases where the U is something else ie Damb which gives a soundex of D510

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top