Question

My table looks something like this:

| id (int) | sentence (varchar) |

I want to find all rows that are almost the same except for one particular word, e.g.:

| 230 | test |
| 321 | test sth |
...
| 329 | is (sth) it? |
| 923 | is it? |

The word that can be different is sth in this case. Ideally I could use some sort of "array" with the list of words that can be different.

Is this something I could do purely in SQL?


Solution

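This is just an untested quick shot, but I think you could group the rows on the sentence with the variable word removed, something like

SELECT REPLACE(sentence, 'sth', '') AS stripped, COUNT(*) AS matches FROM your_table GROUP BY REPLACE(sentence, 'sth', '') HAVING COUNT(*) > 1

Note that this removes only the word itself, so leftover spaces or brackets (as in 'is () it?') still have to be cleaned up before such rows compare equal.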
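The REPLACE idea can be stretched to a list of removable words by nesting one REPLACE call per word. The following is only a rough sketch, run through Python's sqlite3 module so it can be tried without a database server. The table name `sentences`, the helper `find_near_duplicates`, and the choice to list each word together with its surrounding space or brackets are all assumptions made for illustration, not something from the question.

```python
import sqlite3

def find_near_duplicates(rows, removable):
    """Group sentences that become identical once the removable
    fragments are stripped out. Returns {normalized: [ids]}."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table mirroring the question's (id, sentence) layout.
    conn.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, sentence TEXT)")
    conn.executemany("INSERT INTO sentences VALUES (?, ?)", rows)

    # Build REPLACE(REPLACE(sentence, ?, ''), ?, '') ... one level per fragment.
    # Placeholders appear innermost-first, matching the order of `removable`.
    expr = "sentence"
    for _ in removable:
        expr = f"REPLACE({expr}, ?, '')"
    expr = f"TRIM({expr})"

    query = f"""
        SELECT {expr} AS normalized, GROUP_CONCAT(id) AS ids
        FROM sentences
        GROUP BY normalized
        HAVING COUNT(*) > 1
    """
    result = {}
    for normalized, ids in conn.execute(query, list(removable)):
        result[normalized] = sorted(int(i) for i in ids.split(","))
    conn.close()
    return result

rows = [
    (230, "test"),
    (321, "test sth"),
    (329, "is (sth) it?"),
    (923, "is it?"),
    (400, "unrelated row"),
]
# Each fragment carries its own space/brackets so nothing is left behind.
print(find_near_duplicates(rows, ["(sth) ", " sth"]))
```

With the sample rows from the question this groups ids 230 and 321 under 'test' and ids 329 and 923 under 'is it?'. Plain substring replacement will also hit the fragment inside longer words, so real data may need stricter word-boundary handling.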

OTHER TIPS

You can use SOUNDEX. With the examples that you gave, these queries:

SELECT SOUNDEX('test')
SELECT SOUNDEX('test sth')
SELECT SOUNDEX('is (sth) it?')
SELECT SOUNDEX('is it?')

return these results:

T230
T230
I200
I200

That means that the first two sentences share a code, and so do the last two. What I can't be sure of is how well this will work with your actual data: a four-character code will also match plenty of unrelated sentences, so you're just going to have to try it.
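To get a feel for how coarse these codes are, here is a rough Python sketch of the classic Soundex rules. It encodes only the leading run of letters in the string; that restriction is an assumption, chosen because it reproduces the four codes listed above, and a given database's SOUNDEX may behave differently. The function name `soundex_first_word` is made up for this example.

```python
# Classic Soundex digit groups.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex_first_word(text):
    """Soundex code of the leading run of letters in `text`.
    Assumption: encoding stops at the first non-letter character."""
    word = ""
    for ch in text.lower():
        if not ch.isalpha():
            break
        word += ch
    if not word:
        return ""

    code = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = CODES.get(ch, "")  # vowels map to "" and reset `prev`
        if digit and digit != prev:
            code += digit
        prev = digit
    return (code + "000")[:4]  # pad or truncate to four characters

for s in ["test", "test sth", "is (sth) it?", "is it?", "test something unrelated"]:
    print(s, "->", soundex_first_word(s))
```

Under that assumption 'test something unrelated' also comes out as T230, which illustrates why matching codes are better treated as candidates to check than as confirmed near-duplicates.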

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow