Preserving ORDER BY in SELECT INTO

Question 1

What for?

Point is – data in a table is not ordered. In SQL Server the intrinsic storage order of a table is that of the (if defined) clustered index.

The order in which data is inserted is basically "irrelevant". It is forgotten the moment the data is written into the table.

As such, nothing is gained, even if you get this stuff. If you need an order when dealing with data, you HAVE To put an order by clause on the select that gets it. Anything else is random - i.e. the order you et data is not determined and may change.

So it makes no sense to have a specific order on the insert as you try to achieve.

SQL 101: sets have no order.

Question 2

I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this by using select top n and order by newid(). Without the 'top n', order was not preserved and the second table had rows in the same order as the first. However, with 'top n', the order (random in my case) was preserved. I used a value of 'n' that was greater than the number of rows. So my query was along the lines of:

insert Table2 (T2Col1, T2Col2)
  select top 10000 T1Col1, T1Col2
  from Table1
  order by newid()

Question 3

I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:

Amount of rows is very big (from 200 to 2000 millions of rows), so we are using SELECT INTO instead of CREATE TABLE + INSERT because needs to be loaded as fast as possible (minimal logging). We have tested using the trace flag 610 for loading an already created empty table with a clustered index but still takes longer than the following approach.
We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.

It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.

The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.

A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.

So you can rewrite this query:

SELECT
    FirstColumn = T.FirstColumn,
    SecondColumn = T.SecondColumn
INTO
    #NewTable
FROM
    VeryBigTable AS T
ORDER BY            -- ORDER BY is ignored!
    FirstColumn,
    SecondColumn

to

SELECT
    FirstColumn = T.FirstColumn,
    SecondColumn = T.SecondColumn
INTO
    #NewTable
FROM
    VeryBigTable AS T

UNION ALL

-- A "fake" row to be deleted
SELECT
    FirstColumn = 0,
    SecondColumn = 0

ORDER BY
    FirstColumn,
    SecondColumn

We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.

Question 4

Just add top to your sql with a number that is greater than the actual number of rows:

SELECT top 25000 * 
into spx_copy
  from SPX
  order by date

Question 5

You cannot do this with ORDER BY but if you create a Clustered Index on VibeFGEvents.id after your SELECT INTO the table will be sorted on disk by VibeFGEvents.id.

Question 6

I'v made a test on MS SQL 2012, and it clearly shows me, that insert into ... select ... order by makes sense. Here is what I did:

create table tmp1 (id int not null identity, name sysname);
create table tmp2 (id int not null identity, name sysname);

insert into tmp1 (name) values ('Apple');
insert into tmp1 (name) values ('Carrot');
insert into tmp1 (name) values ('Pineapple');
insert into tmp1 (name) values ('Orange');
insert into tmp1 (name) values ('Kiwi');
insert into tmp1 (name) values ('Ananas');
insert into tmp1 (name) values ('Banana');
insert into tmp1 (name) values ('Blackberry');

select * from tmp1 order by id;

And I got this list:

1 Apple
2 Carrot
3 Pineapple
4 Orange
5 Kiwi
6 Ananas
7 Banana
8 Blackberry

No surprises here. Then I made a copy from tmp1 to tmp2 this way:

insert into tmp2 (name)
select name
from tmp1
order by id;

select * from tmp2 order by id;

I got the exact response like before. Apple to Blackberry. Now reverse the order to test it:

delete from tmp2;

insert into tmp2 (name)
select name
from tmp1
order by id desc;

select * from tmp2 order by id;

9 Blackberry
10 Banana
11 Ananas
12 Kiwi
13 Orange
14 Pineapple
15 Carrot
16 Apple

So the order in tmp2 is reversed too, so order by made sense when there is a identity column in the target table!

Question 7

The reason why one would desire this (a specific order) is because you cannot define the order in a subquery, so, the idea is that, if you create a table variable, THEN make a query from that table variable, you would think you would retain the order(say, to concatenate rows that must be in order- say for XML or json), but you can't. So, what do you do? The answer is to force SQL to order it by using TOP in your select (just pick a number high enough to cover all your rows).

Question 8

I have run into the same issue and one reason I have needed to preserve the order is when I try to use ROLLUP to get a weighted average based on the raw data and not an average of what is in that column. For instance, say I want to see the average of profit based on number of units sold by four store locations? I can do this very easily by creating the equation Profit / #Units = Avg. Now I include a ROLLUP in my GROUP BY so that I can also see the average across all locations. Now I think to myself, "This is good info but I want to see it in order of Best Average to Worse and keep the Overall at the bottom (or top) of the list)." The ROLLUP will fail you in this so you take a different approach.

Why not create row numbers based on the sequence (order) you need to preserve?

    SELECT OrderBy = ROW_NUMBER() OVER(PARTITION BY 'field you want to count' ORDER BY 'field(s) you want to use ORDER BY')
    , VibeFGEvents.*  
    FROM VibeFGEvents
    LEFT OUTER JOIN VibeFGEventsStudyStart
    ON 
        CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
        AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
        AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
    WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL

Now you can use the OrderBy field from your table to set the order of values. I removed the ORDER BY statement from the query above since it does not affect how the data is loaded to the table.

Question 9

Try using INSERT INTO instead of SELECT INTO

INSERT INTO VibeFGEventsAfterStudyStart 
SELECT VibeFGEvents.* 
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON 
    CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
    AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
    AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id`