Replace function removing Ones / 1 when removing char(185) (superscript 1)

https://dba.stackexchange.com/questions/250586

15-02-2021
|

Frage

Ok as the title suggests this is occuring.

When using the REPLACE function on char(185) for data in a temporary table, ones ("1") are being removed as well.

SQL SERVER 2008R2
UNICODE Data

DB collation - collate SQL_Latin1_General_CP1_CS_AS Tempdb has default

Any other details I could provide which might be helpful, let me know.

Lösung

CHAR(185), being VARCHAR data and using an 8-bit code page, can be a different character depending on the code page being used (which is determined by the collation of the current database when using the CHAR() function). In many code pages, including the common Windows-1252 (Latin1_General), the character is a superscript 1 (i.e. "¹" ). This character, depending on the collation / code page being used, might have a "best fit" mapping to a regular "1".

Also, when using a Windows collation, case-sensitivity can affect the outcome of matches between subscript, superscript, and regular versions of the same characters (seriously! see example below). This also covers NVARCHAR / Unicode data (including UTF-8 in VARCHAR starting in SQL Server 2019).

When needing to match on specific characters you need to use a binary collation (i.e. one ending in _BIN2 unless on SQL Server 2005 or older, in which case only the _BIN collations are available).

SELECT CHAR(185)
WHERE  CHAR(185) = '1' COLLATE SQL_Latin1_General_CP1_CI_AS
-- no rows (SQL Server collation, code page 1252, case insensitive)



SELECT CHAR(185)
WHERE  CHAR(185) = '1' COLLATE SQL_Latin1_General_CP437_CI_AS
-- 1 row (SQL Server collation, code page 437, case insensitive)

SELECT CHAR(185)
WHERE  CHAR(185) = '1' COLLATE SQL_Latin1_General_CP437_CS_AS
-- 1 row (SQL Server collation, code page 437, case sensitive)



SELECT CHAR(185)
WHERE  CHAR(185) = '1' COLLATE Latin1_General_100_CI_AS
-- 1 row (Windows collation, code page 1252, case insensitive)

SELECT CHAR(185)
WHERE  CHAR(185) = '1' COLLATE Latin1_General_100_CS_AS
-- no rows (Windows collation, code page 1252, case sensitive)



SELECT CHAR(185)
WHERE  CHAR(185) = '1' COLLATE Latin1_General_100_BIN2;
-- no rows (Windows collation, code page 1252, _BIN2 collation; always works!)

As noted above, Unicode data (regardless of type of collation) works the same as VARCHAR data used with a Windows collation (because both scenarios use the same Unicode rules):

SELECT NCHAR(185)
WHERE  NCHAR(185) = N'1' COLLATE Latin1_General_100_CI_AS
-- 1 row (Windows collation, Unicode data, case insensitive)

SELECT NCHAR(185)
WHERE  NCHAR(185) = N'1' COLLATE Latin1_General_100_CS_AS
-- no rows (Windows collation, Unicode data, case sensitive)

Andere Tipps

In case anyone wants to know - the issue was collation. When I wrote the question I didn't realize the #temp table I was using had a different collation than the default for the database I was working within. It was a simple solution and honestly something I shouldve thought of long before coming here with questions.

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit dba.stackexchange