Question

I've read all about varchar versus nvarchar. But I didn't see an answer to what I think is a simple question. How do you determine the length of your nvarchar column? For varchar it's very simple: my Description, for example, can have 100 characters, so I define varchar(100). Now I'm told we need to internationalize and support any language. Does this mean I need to change my Description column to nvarchar(200), i.e. simply double the length? (And I'm ignoring all the other issues that are involved with internationalization for the moment.)

Is it that simple?


Solution

Generally it is the same as for varchar. The number is still the maximum number of characters, not the data length.

nvarchar(100) allows 100 characters (which would potentially consume 200 bytes in SQL Server).
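As a quick sketch of that character-versus-byte distinction (the variable name and sample text here are made up purely for illustration):

DECLARE @Description NVARCHAR(100) = N'Déjà vu';

SELECT LEN(@Description)        AS CharacterCount,  -- 7 characters
       DATALENGTH(@Description) AS ByteCount;       -- 14 bytes (2 per character)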

You might want to allow for the fact that different cultures may take more characters to express the same thing though.

An exception to this, however, is if you are using an SC collation (which supports supplementary characters). In that case a single character can potentially take up to 4 bytes.

So the worst case would be to double the declared character length.

OTHER TIPS

From the Microsoft web site:

A common misconception is to think that with NCHAR(n) and NVARCHAR(n), the n defines the number of characters. But in NCHAR(n) and NVARCHAR(n), the n defines the string length in byte-pairs (0-4,000). n never defines the number of characters that can be stored. This is similar to the definition of CHAR(n) and VARCHAR(n). The misconception happens because when using characters defined in the Unicode range 0-65,535, one character can be stored per byte-pair. However, in higher Unicode ranges (65,536-1,114,111) one character may use two byte-pairs. For example, in a column defined as NCHAR(10), the Database Engine can store 10 characters that use one byte-pair (Unicode range 0-65,535), but fewer than 10 characters when using two byte-pairs (Unicode range 65,536-1,114,111). For more information about Unicode storage and character ranges, see

https://docs.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-ver15
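To see the byte-pair counting in action, here is a small sketch. It assumes the database's default collation is not an SC collation, so the explicit COLLATE clause is what switches LEN to supplementary-aware counting:

DECLARE @s NVARCHAR(10) = N'𠜎';  -- supplementary character (U+2070E)

SELECT DATALENGTH(@s) AS Bytes,       -- 4: two byte-pairs
       LEN(@s)        AS LenDefault,  -- 2: a surrogate pair counts as two under a non-SC collation
       LEN(@s COLLATE Latin1_General_100_CI_AS_SC) AS LenSC;  -- 1: counted as one character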

@Musa Calgar - exactly right. That link has the information for the answer to this question.

But to make sure the question itself is clear, we are talking about the 'length' attribute we see when we look at the column definition for a given table, right? That is the storage allocated per column. On the other hand, if you want to know the number of characters in a given string in the table at a given moment, you can run: "SELECT myColumn, LEN(myColumn) FROM myTable"

But if the storage length is desired, you can drag the table name into the query window using SSMS, highlight it, and use 'Alt-F1' to see the defined lengths of each column.

So as an example, I created a table like this, specifying collations. (Latin1_General_100_CI_AS_SC allows for supplementary characters - that is, characters that take more than just 2 bytes):

CREATE TABLE [dbo].[TestTable1](
    [col1] [varchar](10)  COLLATE Latin1_General_100_CI_AS,
    [col2] [nvarchar](10) COLLATE Latin1_General_100_CI_AS_SC,
    [col3] [nvarchar](10) COLLATE Latin1_General_100_CI_AS
) ON [PRIMARY]

The lengths show up like this (Highlight in query window and Alt-F1):

Column_Name    Type        Length  [...] Collation

col1           varchar      10           Latin1_General_100_CI_AS
col2           nvarchar     20           Latin1_General_100_CI_AS_SC
col3           nvarchar     20           Latin1_General_100_CI_AS
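If you prefer to script that instead of using the keyboard shortcut, Alt-F1 runs sp_help under SSMS's default key bindings, and sys.columns exposes the defined byte length directly (a sketch; note that sys.columns.max_length is in bytes, which is why nvarchar(10) shows as 20):

EXEC sp_help 'dbo.TestTable1';

SELECT c.name           AS Column_Name,
       t.name           AS Type,
       c.max_length     AS Length,      -- bytes, not characters
       c.collation_name AS Collation
FROM sys.columns AS c
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.TestTable1');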

If you insert ASCII characters into the varchar and nvarchar fields, it will allow you to put 10 characters into all of them. There will be an error if you try to put more than 10 characters into those fields:

"String or binary data would be truncated. The statement has been terminated."

If you insert non-ASCII characters like 'ā' you can still put 10 of them into each one, but SQL Server will convert the values going into col1 to the closest known character that fits into 1-byte. In this case, 'ā' will be converted to 'a'.
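A sketch of that case (the N prefix keeps 'ā' intact until the implicit conversion into the varchar column):

INSERT INTO dbo.TestTable1 (col1, col2, col3)
VALUES (N'āāāāāāāāāā', N'āāāāāāāāāā', N'āāāāāāāāāā');   -- 10 characters each: succeeds

SELECT col1, col2, col3 FROM dbo.TestTable1;
-- the new row comes back with col1 = 'aaaaaaaaaa', while col2 and col3 keep 'āāāāāāāāāā'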

However, if you insert characters that require 4 bytes to store, for example '𠜎', you will only be allowed to put FIVE of them into the varchar and nvarchar fields. Any more than that will result in the truncation error shown above. The varchar field will show question marks because it has no single-byte character that it can convert that input to.
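An insert matching that description would look like this (a sketch; five supplementary characters fill all ten byte-pairs of each 10-length column, and the varchar column receives two '?' per character):

INSERT INTO dbo.TestTable1 (col1, col2, col3)
VALUES (N'𠜎𠜎𠜎𠜎𠜎', N'𠜎𠜎𠜎𠜎𠜎', N'𠜎𠜎𠜎𠜎𠜎');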

So when you insert five of these '𠜎', do a select of that row using len(<colname>) and you will see this:

col1          len(col1)    col2          len(col2)      col3           len(col3)
??????????    10           𠜎𠜎𠜎𠜎𠜎     5              𠜎𠜎𠜎𠜎𠜎      10

So the length of col2 shows 5 characters because a supplementary-character (_SC) collation was specified when the table was created (see the CREATE TABLE DDL statement above). However, col3 did not have _SC in its collation, so it shows a length of 10 for the five characters we inserted. Note that col1 contains ten question marks. If we had defined the col1 varchar using the _SC collation instead of the non-supplementary one, it would behave the same way.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow