Question

While working with some legacy data, I want to group the data by a column while ignoring spelling mistakes. I think SOUNDEX() could achieve the desired result. Here is what I tried:

SELECT soundex(AREA)
FROM MASTER
GROUP BY soundex(AREA)
ORDER BY soundex(AREA)

But (obviously) SOUNDEX returns a 4-character code in each result row, like this, and the actual strings are lost:

A131
A200
A236
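Differently spelled values share a group because Soundex keeps only the first letter and a coarse code for the following consonants. The sketch below is a minimal Python version of the common American Soundex rules: keep the first letter, map the other consonants to digits, collapse adjacent equal codes, and pad or truncate to four characters. It illustrates the idea and is not SQL Server's implementation, so it may differ from SOUNDEX() in edge cases. The area names are made up.

```python
# American Soundex letter-to-digit table; vowels, Y, H and W have no code.
CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Simplified American Soundex: the first letter plus three digits."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    # The first letter's code still suppresses an identical code right after it.
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so equal codes on both sides of it are kept
    return (result + "000")[:4]


# Greenwood, Grenwood and Greenwod all map to G653.
for area in ("Greenwood", "Grenwood", "Greenwod", "Westfield", "Wesfield"):
    print(area, soundex(area))
```

The last two names show a limit of the approach. Dropping one consonant changes the code (W231 versus W214), so such variants still end up in separate groups.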

How can I include at least one actual value from each group in the query result, instead of the 4-character code?


Solution

SELECT soundex(AREA) as snd_AREA, min(AREA) as AREA_EXAMPLE_1, max(AREA) as AREA_EXAMPLE_2
from MASTER
group by soundex(AREA)
order by AREA_EXAMPLE_1
;

In MySQL you could select group_concat(distinct AREA) as list_area to get all the spelling variants in a group. I don't know of an equivalent in SQL Server, but min and max give two example values per group, and you wanted to discard the differences anyway.
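Below is a minimal runnable sketch of the min/max query. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server. SQLite has no SOUNDEX function by default, so a hypothetical `soundex` helper that follows the common American Soundex rules is registered with `create_function`; its codes may differ from SQL Server's SOUNDEX in edge cases. The MASTER rows are invented example data. SQLite also accepts `group_concat(DISTINCT AREA)`, so the MySQL-style list column appears alongside the min/max examples.

```python
import sqlite3

# Hypothetical helper: simplified American Soundex, not SQL Server's exact algorithm.
CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(word):
    if word is None:
        return None  # keep SQL NULL semantics
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


conn = sqlite3.connect(":memory:")
conn.create_function("soundex", 1, soundex)  # make the helper callable from SQL
conn.execute("CREATE TABLE MASTER (AREA TEXT)")
conn.executemany(
    "INSERT INTO MASTER (AREA) VALUES (?)",
    [("Greenwood",), ("Grenwood",), ("Greenwod",),
     ("Brookside",), ("Broukside",), ("Westfield",)],
)

rows = conn.execute(
    """
    SELECT soundex(AREA) AS snd_AREA,
           min(AREA) AS AREA_EXAMPLE_1,
           max(AREA) AS AREA_EXAMPLE_2,
           group_concat(DISTINCT AREA) AS list_area  -- list order is not guaranteed
    FROM MASTER
    GROUP BY soundex(AREA)
    ORDER BY AREA_EXAMPLE_1
    """
).fetchall()

for row in rows:
    print(row)
```

min and max are only two arbitrary representatives, chosen alphabetically. When a group holds a single spelling, both columns show the same value.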

OTHER TIPS

You could also use row_number() to get one row for each soundex(area) value:

select AREA, snd
from
(
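  select AREA, soundex(AREA) snd,
    -- number the rows inside each soundex group; ordering by AREA
    -- (not by the partition key itself) makes the chosen row deterministic
    row_number() over(partition by soundex(AREA)
                      order by AREA) rn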
  from master
) x
where rn = 1

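The same row_number() idea can be run locally. This minimal sketch again assumes Python's sqlite3 module in place of SQL Server, along with a hypothetical `soundex` helper (simplified American Soundex) registered via `create_function` and made-up MASTER rows. Window functions need SQLite 3.25 or later, so a Python build linked against an older SQLite will reject the OVER clause. Inside each partition the rows are ordered by AREA, which makes the alphabetically first spelling the one kept.

```python
import sqlite3

# Hypothetical helper: simplified American Soundex, not SQL Server's exact algorithm.
CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(word):
    if word is None:
        return None  # keep SQL NULL semantics
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


conn = sqlite3.connect(":memory:")
conn.create_function("soundex", 1, soundex)
conn.execute("CREATE TABLE MASTER (AREA TEXT)")
conn.executemany(
    "INSERT INTO MASTER (AREA) VALUES (?)",
    [("Greenwood",), ("Grenwood",), ("Greenwod",),
     ("Brookside",), ("Broukside",), ("Westfield",)],
)

# Keep the first row (rn = 1) of each soundex partition.
rows = conn.execute(
    """
    SELECT AREA, snd
    FROM (
        SELECT AREA,
               soundex(AREA) AS snd,
               row_number() OVER (PARTITION BY soundex(AREA) ORDER BY AREA) AS rn
        FROM MASTER
    ) AS x
    WHERE rn = 1
    ORDER BY snd
    """
).fetchall()

for area, snd in rows:
    print(snd, area)
```

Unlike the min/max version, this returns complete rows, so other columns of MASTER could be selected in the inner query and carried through as well.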

Licensed under: CC-BY-SA with attribution