Wrong sort order using utf8_unicode_ci and swedish characters like ö
07-01-2021
Question
I am confused about which encoding and collation I should use so that my queries return the correct sort order.
Right now my encoding is UTF-8 Unicode and the collation is utf8_unicode_ci for both firstname and lastname.
The resulting sort order looks something like this:
firstname lastname
aaa aaa
ööö ööö
ooo ooo
ppp ppp
qqq qqq
ööö ööö should come after qqq qqq.
Changing the collation to utf8_swedish_ci works correctly:
firstname lastname
aaa aaa
ooo ooo
ppp ppp
qqq qqq
ööö ööö
But utf8_unicode_520_ci gives me the same result as utf8_unicode_ci. Shouldn't it work too?
The funny part is that if I change only the firstname collation to utf8_unicode_520_ci, it seems to work.
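For reference, the two ways of applying a collation can be sketched as follows. The table name people and the VARCHAR(100) column type are hypothetical, since the question does not show the schema; substitute the real definitions.

```sql
-- Hypothetical table and column types. MODIFY restates the whole column
-- definition, so keep any NOT NULL / DEFAULT attributes the real columns have.
ALTER TABLE people
  MODIFY firstname VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci,
  MODIFY lastname  VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci;

-- Alternative: leave the columns unchanged and override the collation per query.
SELECT firstname, lastname
FROM people
ORDER BY firstname COLLATE utf8_swedish_ci,
         lastname  COLLATE utf8_swedish_ci;
```

The per-query form is handy for experimenting, but an index built with the column's own collation generally cannot be used to satisfy an ORDER BY that names a different collation.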
Solution
Interestingly, someone addressed this apparent difference.
Matthias Bynens answered the post "MySQL Collation utf8_unicode differences" about 4 years ago.
He mentioned "weight keys":
- utf8_unicode_ci is based on UCA 4.0.0 weight keys
- utf8_unicode_520_ci is based on UCA 5.2.0 weight keys
For more information, go to The Unicode Consortium.
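One way to look at these weights directly is the WEIGHT_STRING() function, which returns the sort key MySQL derives for a string under a given collation. A minimal sketch, assuming a server version that provides WEIGHT_STRING() and a client that really sends UTF-8:

```sql
-- Assumption: the client/terminal encoding is UTF-8, so the literals arrive intact.
SET NAMES utf8;

SELECT
  HEX(WEIGHT_STRING('o' COLLATE utf8_unicode_ci))     AS o_uca400,
  HEX(WEIGHT_STRING('ö' COLLATE utf8_unicode_ci))     AS o_umlaut_uca400,
  HEX(WEIGHT_STRING('o' COLLATE utf8_unicode_520_ci)) AS o_uca520,
  HEX(WEIGHT_STRING('ö' COLLATE utf8_unicode_520_ci)) AS o_umlaut_uca520,
  HEX(WEIGHT_STRING('ö' COLLATE utf8_swedish_ci))     AS o_umlaut_swedish;
```

If o and ö get identical weights within a collation, that collation treats them as the same letter. The relative order of rows such as ooo and ööö is then not decided by the collation at all, which may be why changing only one column appeared to fix the order.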
OTHER TIPS
Where does Ö sort?
- utf8_estonian_ci: Between 'W' and 'X'
- danish, icelandic, swedish: After 'Z'
- utf8_german2_ci: As if the two letters 'oe'
- hungarian and turkish: Between 'O' and 'P' (that is, after 'oz')
- Other collations (including unicode, unicode_520, 0900): As if the letter 'O'
These apply to both utf8 and utf8mb4, MySQL 8.0 and before.
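The entries in this list can be checked with literal comparisons, without touching any table. This is a sketch that assumes a UTF-8 client connection; for utf8mb4, use SET NAMES utf8mb4 and the utf8mb4_ collation names instead.

```sql
-- Assumption: the client/terminal encoding is UTF-8.
SET NAMES utf8;

-- COLLATE binds tighter than = and >, so it applies to the right-hand literal
-- and, being explicit, decides the collation used for the whole comparison.
SELECT
  'ö' = 'o' COLLATE utf8_unicode_ci     AS equal_in_unicode,
  'ö' = 'o' COLLATE utf8_unicode_520_ci AS equal_in_unicode_520,
  'ö' = 'o' COLLATE utf8_swedish_ci     AS equal_in_swedish,
  'ö' > 'z' COLLATE utf8_swedish_ci     AS after_z_in_swedish;
```

Going by the list, the first two columns should come back as 1, the third as 0, and the last as 1.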
Programmatically generated collation tester
Danish, icelandic, or swedish will work 'correctly' for 'Ö'. Here are the differences among those three:
In icelandic, AZ < Á < B, E < É, IZ < Í < J, UZ < Ü < V, YZ < Ý < Z. Also:
danish: zz < Ä=Æ=ä=æ < Ö=Ø=ö=ø < Aa=Å=å < Þ=þ
icelandic: zz < Þ=þ < Ä=Æ=ä=æ < Ö=Ø=ö=ø < Å=å
swedish: zz < Å=å < Ä=Æ=ä=æ < Ö=Ø=ö=ø < Þ=þ
(This may not have all the differences.)
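To see these orderings on a live server, a handful of letters can be sorted under each collation in turn. A small sketch, assuming a UTF-8 client connection; the derived table and its alias letters are illustrative only.

```sql
-- Assumption: the client/terminal encoding is UTF-8.
SET NAMES utf8;

-- Swap utf8_danish_ci for utf8_icelandic_ci or utf8_swedish_ci to compare.
SELECT letter
FROM (
  SELECT 'z' AS letter
  UNION ALL SELECT 'å'
  UNION ALL SELECT 'ä'
  UNION ALL SELECT 'ö'
  UNION ALL SELECT 'þ'
) AS letters
ORDER BY letter COLLATE utf8_danish_ci;
```

If the orderings listed above hold, danish should return z, ä, ö, å, þ, while swedish should return z, å, ä, ö, þ.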