Unicode in SQL Server unique constraints

https://stackoverflow.com/questions/19536783

01-07-2022

Question

Consider the following script: the second INSERT statement fails with a duplicate key error from the unique index UX_UnicodeCol.

BEGIN TRAN

CREATE TABLE UnicodeQuestion
(
    UnicodeCol NVARCHAR(100)
    COLLATE Latin1_General_CI_AI
)

CREATE UNIQUE INDEX UX_UnicodeCol
ON UnicodeQuestion ( UnicodeCol )

INSERT INTO UnicodeQuestion (UnicodeCol) VALUES (N'ae')
INSERT INTO UnicodeQuestion (UnicodeCol) VALUES (N'æ') -- fails: N'ae' and N'æ' compare as equal under Latin1_General_CI_AI

ROLLBACK

As I understand it, if I want my index to treat these values as distinct, I need to use a binary collation. But there are many binary collations, and each one has a specific culture in its name! I don't want culture-sensitive treatment...

Which collation should I use when storing arbitrary Unicode data in nvarchar columns?

Solution

For Unicode data, it does not matter which binary collation you choose.
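A minimal sketch of the question's script with only the collation changed. The choice of Latin1_General_BIN2 is an assumption, not part of the original answer; any binary collation should keep N'ae' and N'æ' distinct. BIN2 is used here because it compares purely by code point, whereas the older _BIN collations compare the first character by code point and the remaining bytes one at a time, which can affect sort order but not whether two values are equal.

```sql
BEGIN TRAN;

-- Same table as in the question, but with a binary collation (assumed choice):
-- comparisons use code points, so N'ae' (U+0061 U+0065) and N'æ' (U+00E6)
-- are different keys in the unique index.
CREATE TABLE UnicodeQuestion
(
    UnicodeCol NVARCHAR(100)
    COLLATE Latin1_General_BIN2
);

CREATE UNIQUE INDEX UX_UnicodeCol
ON UnicodeQuestion ( UnicodeCol );

INSERT INTO UnicodeQuestion (UnicodeCol) VALUES (N'ae');
INSERT INTO UnicodeQuestion (UnicodeCol) VALUES (N'æ');  -- no duplicate key error now

SELECT COUNT(*) AS RowsInserted FROM UnicodeQuestion;    -- 2

ROLLBACK;
```

The ROLLBACK keeps the sketch side-effect free, as in the original script.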

For Unicode data types, data comparisons are based on the Unicode code points. For binary collations on Unicode data types, the locale is not considered in data sorts. For example, Latin1_General_BIN and Japanese_BIN yield identical sorting results when used on Unicode data.
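To illustrate that statement, the sketch below sorts a few sample values (chosen only for illustration, in a hypothetical table variable @Samples) under Latin1_General_BIN and Japanese_BIN. Both queries should return A, ae, b, æ, ア, which is simply the order of the first code points U+0041, U+0061, U+0062, U+00E6 and U+30A2.

```sql
-- Hypothetical sample data; the column uses the database default collation,
-- and each query overrides it with an explicit binary collation.
DECLARE @Samples TABLE (Val NVARCHAR(10));
INSERT INTO @Samples (Val) VALUES (N'b'), (N'æ'), (N'ae'), (N'A'), (N'ア');

-- Both result sets: A, ae, b, æ, ア (code point order, locale ignored)
SELECT Val FROM @Samples ORDER BY Val COLLATE Latin1_General_BIN;
SELECT Val FROM @Samples ORDER BY Val COLLATE Japanese_BIN;
```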

Locale-specific BIN collations exist because the collation also determines the code page used for non-Unicode data (char and varchar).
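The locale in the name only becomes visible with non-Unicode data. A minimal sketch using COLLATIONPROPERTY and a conversion to varchar; the katakana character is an arbitrary example, not from the original answer.

```sql
-- Same code-point behaviour for nvarchar, but different code pages for varchar.
SELECT
    COLLATIONPROPERTY('Latin1_General_BIN', 'CodePage') AS LatinCodePage,    -- 1252
    COLLATIONPROPERTY('Japanese_BIN', 'CodePage')       AS JapaneseCodePage; -- 932

-- Converting a katakana character to varchar uses the collation's code page:
-- code page 1252 cannot represent it, so it becomes '?';
-- code page 932 stores it as a double-byte character.
SELECT
    CONVERT(VARCHAR(10), N'ア' COLLATE Latin1_General_BIN)             AS AsLatin,       -- ?
    DATALENGTH(CONVERT(VARCHAR(10), N'ア' COLLATE Japanese_BIN))       AS JapaneseBytes; -- 2
```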

Licensed under: CC-BY-SA with attribution
