I am confused about which encoding and collation I should use so that my queries return the correct sort order.

Right now my encoding is UTF-8 Unicode and the collation is utf8_unicode_ci for both firstname and lastname.

The resulting sort order looks something like this:

firstname  lastname
aaa        aaa
ööö        ööö
ooo        ooo
ppp        ppp
qqq        qqq

The row ööö ööö should come after qqq qqq.

Changing collation to utf8_swedish_ci works correctly:

firstname  lastname
aaa        aaa
ooo        ooo
ppp        ppp
qqq        qqq
ööö        ööö

But utf8_unicode_520_ci gives me the same result as utf8_unicode_ci. Shouldn't it work too?

The funny part is that if I change only the firstname collation to utf8_unicode_520_ci, it seems to work.

Solution

Interestingly, someone addressed this apparent difference.

Matthias Bynens answered the post "MySQL Collation utf8_unicode differences" about 4 years ago.

He mentioned "Weight Keys".

For more information, go to The Unicode Consortium.

Other tips

Where does Ö sort?

  • utf8_estonian_ci: Between 'W' and 'X'
  • danish, icelandic, swedish: After 'Z'
  • utf8_german2_ci: As if the two letters 'oe'
  • hungarian and turkish: Between 'O' and 'P' (that is, after 'oz')
  • Other collations (including unicode, unicode_520, 0900): As if the letter 'O'

These apply to both utf8 and utf8mb4, in MySQL 8.0 and earlier.
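
A quick way to check a few of the claims in the list above on your own server is to compare string literals under an explicit collation. This is only a sketch: it assumes the connection character set is utf8mb4 and that the utf8mb4_* variants of these collations are installed. If the list holds, each expression should evaluate to the value noted in its comment, but that is worth verifying rather than taking on trust.

```sql
-- Make string literals utf8mb4 so the utf8mb4_* collations can be applied.
SET NAMES utf8mb4;

SELECT
  -- unicode_ci: Ö is treated as the letter O (expected 1 per the list)
  'ö' = 'o'  COLLATE utf8mb4_unicode_ci AS unicode_o_equal,
  -- swedish_ci: Ö is a separate letter (expected 0 per the list)
  'ö' = 'o'  COLLATE utf8mb4_swedish_ci AS swedish_o_equal,
  -- swedish_ci: Ö sorts after Z (expected 1 per the list)
  'ö' > 'z'  COLLATE utf8mb4_swedish_ci AS swedish_after_z,
  -- german2_ci: Ö compares as the two letters 'oe' (expected 1 per the list)
  'ö' = 'oe' COLLATE utf8mb4_german2_ci AS german2_oe_equal;
```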

Programmatically generated collation tester

Danish, icelandic, or swedish will work 'correctly' for 'Ö'. Here are the differences among those three:

In icelandic, AZ < Á < B, E < É, IZ < Í < J, UZ < Ü < V, YZ < Ý < Z. Also:

danish:    zz <          Ä=Æ=ä=æ < Ö=Ø=ö=ø < Aa=Å=å < Þ=þ
icelandic: zz < Þ=þ    < Ä=Æ=ä=æ < Ö=Ø=ö=ø <    Å=å
swedish:   zz <    Å=å < Ä=Æ=ä=æ < Ö=Ø=ö=ø <          Þ=þ

(This may not have all the differences.)
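
To apply one of these collations to the question's data, it can be set either per query or on the columns themselves. The sketch below makes several assumptions that do not come from the original post: the table name `people` is hypothetical, the columns are assumed to be VARCHAR(255), and the column character set is assumed to be utf8 (for utf8mb4 columns the collation would be utf8mb4_swedish_ci instead).

```sql
-- Option 1: override the collation for a single query only.
-- `people` is a hypothetical table name.
SELECT firstname, lastname
FROM people
ORDER BY firstname COLLATE utf8_swedish_ci,
         lastname  COLLATE utf8_swedish_ci;

-- Option 2: change the column collation permanently.
-- VARCHAR(255) is an assumed type; restate the real type and any
-- NOT NULL / DEFAULT attributes, since MODIFY redefines the whole column.
ALTER TABLE people
  MODIFY firstname VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_swedish_ci,
  MODIFY lastname  VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_swedish_ci;
```

The per-query form is convenient for experimenting, but it may prevent an index on the column from being used for sorting, so the column-level change is usually the better fit once the right collation is known.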

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange