Question
My table looks something like this:
| id (int) | sentence (varchar) |
I want to find all rows that are identical except for one particular word. For example:
| 230 | test |
| 321 | test sth |
...
| 329 | is (sth) it? |
| 923 | is it? |
In this case the word that can differ is sth. Ideally I could use some sort of "array" listing all the words that are allowed to differ.
Is this something I could do purely in SQL?
Solution
Just an untested quick shot, sorry, but I think you could do something like
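SELECT REPLACE(sentence, 'sth', '') AS normalized, COUNT(*) AS n
FROM your_table GROUP BY REPLACE(sentence, 'sth', '') HAVING COUNT(*) > 1
Here your_table is a placeholder (table itself is a reserved word) and sentence is the column from the question. Note that REPLACE leaves the surrounding space or parentheses behind ('test sth' becomes 'test ', 'is (sth) it?' becomes 'is () it?'), so you will probably want to replace ' sth' and ' (sth)' as well for those rows to group together.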
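To try the REPLACE idea against the rows from the question, here is a minimal sketch that uses Python's built-in sqlite3 module as an in-memory sandbox. The table name sentences and the helpers build_demo_db and find_near_duplicates are hypothetical, and stripping each word both as " (word)" and as " word" (with its leading space) is an assumption based only on the sample rows. The word list plays the role of the "array" from the question: every entry adds nested REPLACE calls, so the grouping itself still happens in SQL.

```python
import sqlite3

# Rows from the question; the table name "sentences" is hypothetical.
SAMPLE_ROWS = [(230, "test"), (321, "test sth"), (329, "is (sth) it?"), (923, "is it?")]


def build_demo_db(rows=SAMPLE_ROWS):
    """Create an in-memory table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, sentence VARCHAR(255))")
    conn.executemany("INSERT INTO sentences (id, sentence) VALUES (?, ?)", rows)
    return conn


def find_near_duplicates(conn, words):
    """Group rows whose sentences become identical once every listed word is removed."""
    expr = "sentence"
    params = []
    for word in words:
        # Strip both the parenthesised and the bare form, each with its leading space.
        # Placeholders appear in the SQL in the same order the params are appended.
        for variant in (f" ({word})", f" {word}"):
            expr = f"REPLACE({expr}, ?, '')"
            params.append(variant)
    sql = (
        f"SELECT {expr} AS normalized, GROUP_CONCAT(id) AS ids "
        "FROM sentences GROUP BY normalized HAVING COUNT(*) > 1 ORDER BY normalized"
    )
    # GROUP_CONCAT order is unspecified in SQLite, so sort the ids for stable output.
    return [
        (normalized, sorted(int(i) for i in ids.split(",")))
        for normalized, ids in conn.execute(sql, params)
    ]


if __name__ == "__main__":
    for normalized, ids in find_near_duplicates(build_demo_db(), ["sth"]):
        print(repr(normalized), ids)
```

With the sample rows this prints 'is it?' [329, 923] and 'test' [230, 321]. The sketch does not handle a listed word at the very start of a sentence or wrapped in other punctuation; those would need more REPLACE variants.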
OTHER TIPS
You can use SOUNDEX. With the examples you gave, these queries:
SELECT SOUNDEX('test')
SELECT SOUNDEX('test sth')
SELECT SOUNDEX('is (sth) it?')
SELECT SOUNDEX('is it?')
return these results:
T230
T230
I200
I200
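That means the first two strings sound alike, and so do the last two. I can't be sure how well this will work with your actual data, though: a SOUNDEX code is just a letter plus three digits built from the first few consonant sounds, so unrelated sentences that start the same way will match too (for example, 'toast' also gives T230). You'll have to try it.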
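To see why those pairs collide, here is a small pure-Python sketch of American Soundex applied to the rows from the question, grouping ids by code the way GROUP BY SOUNDEX(sentence) would in SQL. It is not SQL Server's or MySQL's implementation, and vendors differ in details. In particular, encoding only the leading run of letters is an assumption chosen because it reproduces the outputs quoted above; soundex and group_by_soundex are hypothetical helper names.

```python
from collections import defaultdict

# Rows from the question.
SAMPLE_ROWS = [(230, "test"), (321, "test sth"), (329, "is (sth) it?"), (923, "is it?")]

# American Soundex digit classes; vowels and h, w, y get no digit.
SOUNDEX_DIGITS = {
    letter: digit
    for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}.items()
    for letter in letters
}


def soundex(text):
    """Return a 4-character Soundex code (first letter + 3 digits)."""
    # Assumption: encode only the leading run of letters, as the quoted results suggest.
    word = ""
    for ch in text:
        if not ch.isalpha():
            break
        word += ch.lower()
    if not word:
        return ""
    digits = []
    prev = SOUNDEX_DIGITS.get(word[0], "")
    for ch in word[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        # Skip uncoded letters and repeats of the previous code.
        if digit and digit != prev:
            digits.append(digit)
        # h and w do not separate equal codes; vowels (and y) do.
        if ch not in "hw":
            prev = digit
    return (word[0].upper() + "".join(digits) + "000")[:4]


def group_by_soundex(rows):
    """Map each Soundex code shared by 2+ rows to the ids of those rows."""
    groups = defaultdict(list)
    for row_id, sentence in rows:
        groups[soundex(sentence)].append(row_id)
    return {code: ids for code, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    print(group_by_soundex(SAMPLE_ROWS))
```

Running it prints {'T230': [230, 321], 'I200': [329, 923]}. Adding a row such as 'toast' would land in the T230 group as well, which shows how coarse the codes are.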