Question

I have a database with a VARCHAR column that contains integers of varying length. I want to sort them so 10 comes after 9, not 1, and 70A comes after 70. I was able to do this with PATINDEX(), a CTE, and CASE statements in the WHERE clause.

However, I was wondering whether there is a collation that would make this unnecessary.


Solution

No. Collation is about alphabetical sorting, depending on code page, accent, case, width, and kana. Digit characters (0-9) have none of these properties.

So 9 always sorts after 10B under any collation.

You have to split it up as you noted or sort like this:

ORDER BY
    RIGHT('                              ' + MyColumn, 30)

The length passed to RIGHT (30 here) must match the number of spaces in the padding string.
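As a rough illustration of the padding trick, here is a minimal sketch against a table variable. The name @t and the sample values are made up for the example and are not from the question.

```sql
-- Hypothetical sample data; the table variable and its values are illustrative only.
DECLARE @t TABLE (MyColumn VARCHAR(30));

INSERT @t (MyColumn)
VALUES ('10'), ('9'), ('70A'), ('1'), ('70');

-- Left-pad every value to 30 characters so that shorter strings sort first.
SELECT MyColumn
FROM @t
ORDER BY RIGHT(REPLICATE(' ', 30) + MyColumn, 30);
```

With this data the rows should come back as 1, 9, 10, 70, 70A. Keep in mind that the padding orders strings by length first: a value such as 100 would sort before 70A, because both are three characters long and are then compared character by character.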

You could of course:

  • have 2 columns to make this unnecessary (and far quicker) and have a computed column to combine them
  • insist on leading zeros
  • right justify in a char (a stored version of my RIGHT above)

The last two suggestions are variations on my RIGHT expression above: quicker to sort (no processing of the column needed), but they require more storage.
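The two-column option could look something like the sketch below. The table name, column names, and sizes are assumptions chosen for illustration only.

```sql
-- Hypothetical table: names and sizes are assumptions for illustration.
CREATE TABLE #Items
(
    NumPart  INT         NOT NULL,               -- numeric portion, e.g. 70
    Suffix   VARCHAR(10) NOT NULL DEFAULT (''),  -- trailing text, e.g. 'A'
    FullCode AS CAST(NumPart AS VARCHAR(11)) + Suffix  -- combined display value
);

INSERT #Items (NumPart, Suffix)
VALUES (10, ''), (9, ''), (70, 'A'), (1, ''), (70, ''), (100, '');

-- Sort on the typed columns; no string manipulation is needed at query time.
SELECT FullCode
FROM #Items
ORDER BY NumPart, Suffix;

DROP TABLE #Items;
```

Here the rows should come back as 1, 9, 10, 70, 70A, 100, since the numeric part is compared as an integer and the suffix only breaks ties.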

OTHER TIPS

I would set up a computed column and then sort based on that. Something like

CAST(
     CASE WHEN ISNUMERIC(LEFT(OtherColumn, 2)) = 1 THEN
         LEFT(OtherColumn, 2)
     ELSE
         LEFT(OtherColumn, 1)
     END
AS INT)

Then sort by this column, which you can now also index.
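For values whose numeric part can be longer than two digits, one possible variation (a swapped-in technique, not the expression above) is to take every character before the first non-digit with PATINDEX. The sketch below assumes each value starts with digits and fits in an INT; the table, column, and index names are hypothetical.

```sql
-- Hypothetical table; SortNumber holds the leading digits as an INT.
CREATE TABLE #Codes
(
    OtherColumn VARCHAR(30) NOT NULL,
    -- Appending 'X' guarantees PATINDEX finds a non-digit even in all-digit values.
    SortNumber AS CAST(LEFT(OtherColumn,
                            PATINDEX('%[^0-9]%', OtherColumn + 'X') - 1) AS INT) PERSISTED
);

-- Index the computed column so the sort does not have to parse every row.
CREATE INDEX IX_Codes_SortNumber ON #Codes (SortNumber, OtherColumn);

INSERT #Codes (OtherColumn)
VALUES ('10'), ('9'), ('70A'), ('1'), ('70'), ('100');

SELECT OtherColumn
FROM #Codes
ORDER BY SortNumber, OtherColumn;

DROP TABLE #Codes;
```

With this data the query should return 1, 9, 10, 70, 70A, 100. A value with no leading digits would get a SortNumber of 0, and a digit run too large for INT would make the insert fail, so treat this as a starting point rather than a complete solution.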

If you want a painful way to prove what @gbn is saying (essentially that you can't tell a collation to order substrings differently), you can make a quick #temp table that has a coefficient for the order you expect, and see if ordering by any collation returns the same order:

CREATE TABLE #foo(id INT, n NVARCHAR(10));

CREATE TABLE #bar(collation SYSNAME);

SET NOCOUNT ON;

INSERT #foo SELECT 1,'1'
UNION SELECT 2,'2'
UNION SELECT 3,'3'
UNION SELECT 4,'6'
UNION SELECT 5,'10'
UNION SELECT 6,'10A'
UNION SELECT 7,'10B'
UNION SELECT 8,'11';

DECLARE @sql NVARCHAR(MAX) = N'';

-- For each collation, record its name only if no row lands out of its expected position.
SELECT @sql += N'
    WITH x AS 
    (
        SELECT n, rn = ROW_NUMBER() OVER 
        (ORDER BY n COLLATE ' + name + ') FROM #foo
    ) 
    INSERT #bar 
    SELECT ''' + name + '''
    WHERE NOT EXISTS
    (
        SELECT 1 FROM x
        INNER JOIN #foo AS f
        ON f.id = x.rn
        WHERE f.n <> x.n
    );' FROM sys.fn_helpcollations();

EXEC sp_executesql @sql;

SELECT collation FROM #bar;

GO
DROP TABLE #foo, #bar;

This runs for me in about 10 seconds and yields 0 rows - meaning no collation available to SQL Server (at least 2008 R2, haven't tried Denali) will sort in the way you expect. You need a different way to define sorting.

Want a sensible, efficient means of sorting numbers in strings as actual numbers? Consider voting for my Microsoft Connect suggestion: Support "natural sorting" / DIGITSASNUMBERS as a Collation option


While this Question is specific to SQL Server, and this Answer is not, I felt that I should still post this information simply to raise awareness of it, not to oppose any of the other answers.

That being said, outside of SQL Server, in certain environments it is possible to do this type of sorting. It is something that is at least specified in the Unicode documentation. In the UNICODE LOCALE DATA MARKUP LANGUAGE (LDML) PART 5: COLLATION standard / report, there is a chart for Collation Settings that describes various options for tailoring the sorting behavior. One of the options is -kn-true or [numericOrdering on]:

If set to on, any sequence of Decimal Digits (General_Category = Nd in the [UAX44]) is sorted at a primary level with its numeric value. For example, "A-21" < "A-123". The computed primary weights are all at the start of the digit reordering group. Thus with an untailored UCA table, "a$" < "a0" < "a2" < "a12" < "a⓪" < "aa".

However, this document is a "technical standard" and not a part of the core Unicode specification. A note at the top of the document states:

A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.

Hence, this particular behavior is not available in SQL Server or even in .NET (at least not natively), even though both do conform to the core Unicode specification.

The ICU project (International Components for Unicode) is a set of C/C++ and Java libraries that implements this functionality, and there is even an online demo of it. And under "related projects" there is a link to a .NET project that seems to be a COM object wrapper for the ICU library that would allow this functionality to be exposed to managed code. But it is not clear if that .NET project is still active.

But to see this behavior in action, go to the ICU Collation Demo.

Paste the following into the Input text area on the left side:

1
2
10B
6
11
10A
3
10

Set all options to "default". Check the "input line numbers" option to the right of the sort button, and make sure that the "diff strengths" option is un-checked.

Click the sort button and you should get back the following:

[1] 1
[8] 10
[6] 10A
[3] 10B
[5] 11
[2] 2
[7] 3
[4] 6

This is what should be expected when doing a typical string sort, and what you are seeing in SQL Server.
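To confirm the SQL Server side of that comparison, a minimal sketch like the following sorts the same eight values with a plain ORDER BY. The alias names t and v are arbitrary, and the query relies on whatever the current database's default collation is.

```sql
-- Same eight values as the demo input, sorted as ordinary strings.
SELECT v
FROM (VALUES (N'1'), (N'2'), (N'10B'), (N'6'),
             (N'11'), (N'10A'), (N'3'), (N'10')) AS t(v)
ORDER BY v;
```

This should return 1, 10, 10A, 10B, 11, 2, 3, 6, matching the list above.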

Now, in the series of radio buttons just above the sort button, the second row is labeled "numeric". Select the "on" radio button.

Click the sort button again and you should get back the following:

[1] 1
[2] 2
[7] 3
[4] 6
[8] 10
[6] 10A
[3] 10B
[5] 11

Wondering whether this works when the numeric portion is in the middle of the string? OK, paste the following into the Input text area on the left side (replacing the previous list):

Script - 1.sql
Script - 2.sql
Script - 10B.sql
Script - 6.sql
Script - 11.sql
Script - 10A.sql
Script - 3.sql
Script - 10.sql

Make sure that the numeric setting is still set to "on". Click the sort button again and you should get back the following:

[1] Script - 1.sql
[2] Script - 2.sql
[7] Script - 3.sql
[4] Script - 6.sql
[8] Script - 10.sql
[6] Script - 10A.sql
[3] Script - 10B.sql
[5] Script - 11.sql

Want to see this in another place? Create a folder on your hard drive, something like C:\temp\sorting\, and create empty files with those same "Script -..." names. Do a DIR in a command window and you will see the standard sorting. But when looking at the list of files in Windows Explorer you will see the list sorted using the "numeric" option :-).

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange