Picking the right SQL Server collation for storage

https://stackoverflow.com/questions/1826085

22-07-2019

Question

How does the collation impact SQL Server in terms of storage, and how does this affect the Unicode and non-Unicode data types?

Does the collation impact Unicode storage, or does it just govern sort rules within the database?
When I use the non-Unicode data types, what restrictions are tied to the collation?
If restrictions apply, what happens when I try to store a character that is not in the database collation in a non-Unicode data type?

My understanding is that the Unicode data types can always store the full set of Unicode data, while the storage capabilities of the non-Unicode data types depend on the code page (which is defined by the collation) and can only represent the characters in that code page.

Obviously each character in a Unicode data type would occupy at least 2 bytes, while the non-Unicode data types occupy 1 byte per character (or does this vary with collation as well?).

Set me straight here, how does this work exactly?

Solution

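SQL Server stores Unicode data (NCHAR, NVARCHAR, NTEXT) as UCS-2/UTF-16. Characters in the Basic Multilingual Plane take 2 bytes each; supplementary characters (such as emoji) are stored as a surrogate pair and take 4 bytes.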
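The byte counts can be checked outside SQL Server. This is a minimal sketch in Python, assuming its `utf-16-le` codec is a fair stand-in for how an NVARCHAR value is laid out; the helper name `utf16_byte_length` is made up for illustration and is not a SQL Server function.

```python
def utf16_byte_length(text: str) -> int:
    """Return how many bytes `text` occupies as UTF-16 (little-endian, no BOM)."""
    return len(text.encode("utf-16-le"))


# Three Basic Multilingual Plane characters and one supplementary character.
for ch in ["a", "é", "日", "😀"]:
    print(f"U+{ord(ch):04X}: {utf16_byte_length(ch)} bytes")
# The first three print 2 bytes each; U+1F600 prints 4 bytes (a surrogate pair).
```

Inside SQL Server, DATALENGTH on an NVARCHAR value is the usual way to see the same byte count.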

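For the Unicode types, the collation only affects sorting and comparison (case sensitivity, accent sensitivity, and so on), not which characters can be stored.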

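In non-Unicode data types (CHAR, VARCHAR, TEXT), only characters of the collation's code page can be stored (just as you stated). Most code pages use a single byte per character, but double-byte code pages (for example, those behind Japanese or Chinese collations) use 1 or 2 bytes per character. A character that is not in the code page is converted to a best-fit substitute where one exists, or otherwise to a question mark (?), so the original character is lost. See this MSDN article on collations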
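The effect of a code page can be sketched outside SQL Server as well. The Python example below is illustrative only: `store_as_code_page` is a hypothetical helper, `cp1252` and `cp932` are assumed to stand in for the code pages behind a Latin1 and a Japanese collation, and Python's `errors="replace"` always substitutes `?`, whereas SQL Server may pick a best-fit character first.

```python
def store_as_code_page(text: str, code_page: str) -> bytes:
    """Encode `text` in a code page, replacing unmappable characters with b'?'."""
    return text.encode(code_page, errors="replace")


# Single-byte code page: every character it contains takes 1 byte.
print(store_as_code_page("café", "cp1252"))   # b'caf\xe9' (4 bytes)

# Double-byte code page: these Japanese characters take 2 bytes each.
print(len(store_as_code_page("日本", "cp932")))  # 4

# Characters missing from the code page are lost and become question marks.
print(store_as_code_page("日本", "cp1252"))    # b'??'
```

The last line shows why the conversion is lossy: once the value is stored as `??`, the original characters cannot be recovered.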

Licensed under: CC-BY-SA with attribution