Optimizing a union all order by when selected tables are already sorted

https://stackoverflow.com/questions/20196059

04-08-2022

Question

I have, say, 3+ tables, each containing 10+ million rows and each with the same structure, shown below:

Table1:    ColName | Type
           --------+-------------
           cDT     | DateTime2(7)
           cID     | int
           c3      | ...
           ...     | ...

There is a clustered index on (cDT,cID), so each individual table is already physically sorted by cDT. cID is included in the index because I often only want rows with certain cID values.

From these tables I want to create a 'stream' for my application ordered by time (i.e. cDT). This is currently done by the following:

SELECT t.cDT AS cDT, t.cID AS cID, t.c3 AS c3, t.cTAB as cTab
FROM
(
SELECT cDT AS cDT, cID AS cID, c3 AS c3, 'tab1' as cTAB FROM TABLE1
UNION ALL
SELECT cDT AS cDT, cID AS cID, c3 AS c3, 'tab2' as cTAB FROM TABLE2
UNION ALL
SELECT cDT AS cDT, cID AS cID, c3 AS c3, 'tab3' as cTAB FROM TABLE3
) t
WHERE t.cID IN (SELECT ID FROM TABLEIDs)
ORDER BY t.cDT

Since my tables are already correctly sorted by a clustered index, I am trying to find ways of improving the performance of this query. I tried using views, but that didn't work (I couldn't create an index on the view). I also tried having a separate table with only the unique cDT values and joining to it, but that was messy (maybe someone can offer a decent solution using joins?).

The obvious answer is just to put everything into one table. I don't mind doing this on the fly, but I don't want to do this statically.

Any thoughts on how to optimise a UNION ALL query where the incoming tables are all sorted individually and you want a global sort order?

Thanks in Advance.

P.S. Optimizing the WHERE clause isn't critical, so any solutions that ignore my WHERE clause will still be very much appreciated.

Query Plan: [execution plan image]

Solution

The plan that SQL Server generated does not seem very good. It would have been better to merge-union the three tables and then join to the IDs table once. Maybe we can trick SQL Server into doing that:

SELECT t.cDT AS cDT, t.cID AS cID, t.c3 AS c3, t.cTAB as cTab
FROM
(
SELECT TOP 1000000000 *
FROM (
 SELECT cDT AS cDT, cID AS cID, c3 AS c3, 'tab1' as cTAB FROM TABLE1
 UNION ALL
 SELECT cDT AS cDT, cID AS cID, c3 AS c3, 'tab2' as cTAB FROM TABLE2
 UNION ALL
 SELECT cDT AS cDT, cID AS cID, c3 AS c3, 'tab3' as cTAB FROM TABLE3
) x
ORDER BY cDT,cID --CI order
) t
WHERE t.cID IN (SELECT ID FROM TABLEIDs)
ORDER BY t.cDT

This practically unlimited TOP clause might cause SQL Server to evaluate the union before doing the join. The ORDER BY should help maintain the clustered index (CI) order of the base tables so that a sort operation is not necessary.

If this does not work right away, play with the idea a bit.
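
To make the merge-union idea concrete, here is a minimal Python sketch of what merging already-sorted inputs looks like. It is an illustration of the concept only, not what SQL Server does internally, and it is not part of the original answer. The helper names `tag` and `merged_stream` and the sample rows are made up for the example. The sketch assumes each input arrives sorted by (cDT, cID), as the clustered indexes would deliver it.

```python
import heapq
from datetime import datetime
from operator import itemgetter


def tag(rows, name):
    """Append a source-table label to each (cDT, cID, c3) row, like the cTAB column."""
    for cdt, cid, c3 in rows:
        yield (cdt, cid, c3, name)


def merged_stream(tables, wanted_ids):
    """Merge inputs that are each already sorted by (cDT, cID), then filter once."""
    tagged = [tag(rows, name) for name, rows in tables.items()]
    # heapq.merge consumes its inputs lazily and relies on each one being
    # sorted by the key already; it never re-sorts the combined result.
    merged = heapq.merge(*tagged, key=itemgetter(0, 1))
    # The ID filter is applied once, after the merge, mirroring
    # "merge-union first, join to the IDs table once".
    return (row for row in merged if row[1] in wanted_ids)


# Made-up sample rows standing in for TABLE1..TABLE3 (hypothetical data).
tables = {
    "tab1": [(datetime(2013, 11, 25, 9, 0), 1, "a"),
             (datetime(2013, 11, 25, 9, 5), 2, "b")],
    "tab2": [(datetime(2013, 11, 25, 9, 1), 2, "c"),
             (datetime(2013, 11, 25, 9, 6), 3, "d")],
    "tab3": [(datetime(2013, 11, 25, 9, 2), 1, "e")],
}

result = list(merged_stream(tables, wanted_ids={1, 2}))
for row in result:
    print(row)
```

The merge only ever compares the current head row of each input, so the cost grows linearly with the number of rows instead of paying for a full sort. The same approach could be used on the application side if each table is read with its own ORDER BY cDT, cID query, assuming the application can consume the three result sets as streams.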

Licensed under: CC-BY-SA with attribution
