unicode strings: difference between Habo and Håbo

https://stackoverflow.com/questions/22353426

13-06-2023

Question

I'm working with Swedish geography data. In Sweden there are two different places: Habo and Håbo.

If I run a query like SELECT * FROM g2_se_raw_zip WHERE province LIKE 'Håbo' or SELECT * FROM g2_se_raw_zip WHERE province='Håbo', it returns the rows for Habo too.

I have the same issue with GROUP BY and other queries.

Why does it work like this, and how can I fix it?

Additional info: character_set_client utf8, character_set_connection utf8, character_set_database utf8, character_set_filesystem binary, character_set_results utf8, character_set_server utf8, character_set_system utf8.

Solution

This is a collation issue. The Swedish alphabet treats a-ring (å) and a as distinct letters, whereas the general-purpose collation that MySQL uses by default for utf8 (utf8_general_ci) treats them as variants of the same letter, so 'Håbo' and 'Habo' compare as equal.
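
To see the difference directly, the two collations can be compared on string literals. This is a minimal sketch that assumes the connection character set is utf8, as listed in the question; the column aliases are made up for illustration.

```sql
-- Compare the same two literals under each collation.
SELECT 'Habo' = 'Håbo' COLLATE utf8_general_ci AS same_in_general,  -- 1: å is treated as a variant of a
       'Habo' = 'Håbo' COLLATE utf8_swedish_ci AS same_in_swedish;  -- 0: å is a separate letter
```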

These queries should do the trick for you.

SELECT * 
  FROM g2_se_raw_zip 
 WHERE province COLLATE utf8_swedish_ci LIKE 'Håbo'


SELECT * 
  FROM g2_se_raw_zip 
 WHERE province COLLATE utf8_swedish_ci = 'Håbo'
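
The same clause can be applied to grouping, which the question also mentions. A sketch, assuming you want one row per distinct place name; the aliases province_sv and zip_count are illustrative only:

```sql
-- Group with Swedish rules so that Habo and Håbo land in separate groups.
SELECT province COLLATE utf8_swedish_ci AS province_sv,
       COUNT(*) AS zip_count
  FROM g2_se_raw_zip
 GROUP BY province_sv;
```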

You may wish to change the collation of the columns that contain Swedish place names to the Swedish collation. A COLLATE clause in the WHERE condition generally prevents MySQL from using an index on the column, whereas a column declared with the right collation can be searched through its index. But if you're developing a pan-European application, you may prefer to ask users for their national language in their user profiles, so you can search in a way that meets their expectations.
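
Changing the column collation could look roughly like the following. The column type VARCHAR(255) is an assumption, since the question does not show the table definition; MODIFY replaces the whole column definition, so repeat the real type and any NOT NULL or DEFAULT attributes.

```sql
-- Inspect the current definition and collation of the column first.
SHOW FULL COLUMNS FROM g2_se_raw_zip LIKE 'province';

-- Assumed type: VARCHAR(255). Substitute the actual definition.
ALTER TABLE g2_se_raw_zip
  MODIFY province VARCHAR(255)
  CHARACTER SET utf8 COLLATE utf8_swedish_ci;
```

After this change, plain comparisons such as province = 'Håbo' use the Swedish rules without a COLLATE clause.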

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow