SQL Select taking too much time to execute
16-10-2019
Question
It's a simple select from a temporary table, left joining an existing table on its primary key, with two sub selects using TOP 1 that reference the joined table.
In code:
SELECT
TempTable.Col1,
TempTable.Col2,
TempTable.Col3,
JoinedTable.Col1,
JoinedTable.Col2,
(
SELECT TOP 1
ThirdTable.Col1 -- Which is ThirdTable's Primary Key
FROM
ThirdTable
WHERE
ThirdTable.SomeColumn = JoinedTable.SomeColumn
) as ThirdTableColumn1,
(
SELECT TOP 1
ThirdTable.Col1 -- Which is also ThirdTable's Primary Key
FROM
ThirdTable
WHERE
ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
) as ThirdTableColumn2
FROM
#TempTable as TempTable
LEFT JOIN
JoinedTable
ON (TempTable.PKColumn1 = JoinedTable.PKColumn1 AND
TempTable.PKColumn2 = JoinedTable.PKColumn2)
WHERE
JoinedTable.WhereColumn IN (1, 3)
This is an exact replica of my query.
If I remove the two sub selects, it runs just fine and quickly. With the two sub selects, I get about 100 records per second, which is extremely slow for this query because it should return almost a million records.
I've checked that every table has a primary key, and they all do. They all have indexes and statistics for their important columns, like the ones in those WHERE clauses and the ones in the JOIN clause. The only table with neither a primary key nor an index is the temporary table, but it's not the problem either, because it's not the one related to the slow sub selects and, as I mentioned, with no sub selects the query runs just fine.
Without those TOP 1 clauses, each sub select returns more than one result, which raises an error.
Help, anyone?
EDIT:
So the execution plan told me I was missing an Index. I've created it, and recreated some of the other indexes. After a while, the execution plan was using them, and the query now runs fast. The only problem is I'm not succeeding in doing this again on another server, for the same query. So my solution will be to HINT which index SQL Server will use.
Solution
I think that in a query over a million records you have to avoid things like OUTER JOINs. I suggest you use UNION ALL instead of LEFT JOIN.
Since I think CROSS APPLY is more efficient than a sub-query in the select clause, I will modify the query written by Conrad Frix, which I think is correct.
Now: when I started to modify your query I noticed that you have a WHERE clause saying JoinedTable.WhereColumn IN (1, 3). In this case, if the column is NULL the condition is not satisfied and the row is filtered out. So why are you using LEFT JOIN when you are filtering out the NULL-valued rows anyway?
Just replace LEFT JOIN with INNER JOIN. The result is the same, and I expect it to become faster.
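The LEFT JOIN versus INNER JOIN point can be checked on a toy data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, a regular table named TempTable instead of #TempTable, and made-up rows. It says nothing about speed; it only shows that the WHERE filter on JoinedTable.WhereColumn makes both join types return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TempTable (PKColumn1 INTEGER, PKColumn2 INTEGER, Col1 TEXT);
    CREATE TABLE JoinedTable (
        PKColumn1 INTEGER, PKColumn2 INTEGER, WhereColumn INTEGER,
        PRIMARY KEY (PKColumn1, PKColumn2)
    );
    INSERT INTO TempTable VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c'), (4, 4, 'd');
    -- Row (3, 3) has a WhereColumn outside (1, 3); row (4, 4) has no match at all.
    INSERT INTO JoinedTable VALUES (1, 1, 1), (2, 2, 3), (3, 3, 2);
""")

QUERY = """
    SELECT t.Col1, j.WhereColumn
    FROM TempTable AS t
    {join_type} JOIN JoinedTable AS j
        ON t.PKColumn1 = j.PKColumn1 AND t.PKColumn2 = j.PKColumn2
    WHERE j.WhereColumn IN (1, 3)
    ORDER BY t.Col1
"""

# The unmatched row 'd' gets NULL for WhereColumn under LEFT JOIN,
# and the WHERE clause then discards it, just as INNER JOIN would.
left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()
inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()
print(left_rows)
print(inner_rows)
```

Both lists contain only the rows for 'a' and 'b', so switching the join type does not change the result of a query shaped like the one in the question.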
About indexes:
Please note that when you have an index on a table, say
table1(a int, b nvarchar)
and your index is:
CREATE NONCLUSTERED INDEX ix1 ON table1 (a)
and you want to do something like this:
select a, b from table1
where a < 10
then your index does not include the column b. So what happens?
If SQL Server uses your index, it has to search in the index (an "Index Seek") and then refer to the main table to get column b (a "Lookup"). This procedure might take much longer than scanning the table itself (a "Table Scan").
Based on the statistics that SQL Server has, in such situations it might therefore not use your index at all.
So first of all check the execution plan to see if the index is used at all.
Whether it is or not, alter your index to include all the columns that you are selecting, like:
CREATE NONCLUSTERED INDEX ix1 ON table1 (a) INCLUDE (b) WITH (DROP_EXISTING = ON)
In this case the lookup is not needed, and your query will execute much faster.
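The effect of a covering index can be observed in a query plan. The sketch below is a rough illustration using Python's built-in sqlite3 module, not SQL Server: SQLite has no INCLUDE clause, so column b is appended as a trailing key column instead, and the plan wording ("COVERING INDEX") is SQLite's, not what a SQL Server execution plan shows. The table contents and the helper name plan are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1000)],
)


def plan(sql):
    """Return the planner's textual description of each step of the query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


QUERY = "SELECT a, b FROM table1 WHERE a < 10"

# Index on a only: b still has to be fetched from the table itself.
conn.execute("CREATE INDEX ix1 ON table1 (a)")
plan_narrow = plan(QUERY)

# Rebuild the index so that it also carries b; the table is no longer needed.
conn.execute("DROP INDEX ix1")
conn.execute("CREATE INDEX ix1 ON table1 (a, b)")
plan_covering = plan(QUERY)

print(plan_narrow)
print(plan_covering)
```

Only the second plan reports a covering index, which corresponds to the lookup disappearing once every selected column is available in the index.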
OTHER TIPS
It's the sub selects in your column selection that are causing the slow return. You should try using your sub selects in left joins, or use a derived table as I have defined below.
Using Left Joins to two instances of Third Table
SELECT
TempTable.Col1,
TempTable.Col2,
TempTable.Col3,
JoinedTable.Col1,
JoinedTable.Col2,
ThirdTable.Col1 AS ThirdTableColumn1,
ThirdTable2.Col1 AS ThirdTableColumn2
FROM #TempTable as TempTable
LEFT JOIN JoinedTable ON (TempTable.PKColumn1 = JoinedTable.PKColumn1 AND
TempTable.PKColumn2 = JoinedTable.PKColumn2)
LEFT JOIN ThirdTable ON ThirdTable.SomeColumn = JoinedTable.SomeColumn
LEFT JOIN ThirdTable ThirdTable2 ON ThirdTable2.SomeOtherColumn = JoinedTable.SomeColumn
WHERE
JoinedTable.WhereColumn IN (1, 3)
Using a Derived Table
SELECT
TempTable.Col1,
TempTable.Col2,
TempTable.Col3,
DerivedTable.Col1,
DerivedTable.Col2,
DerivedTable.ThirdTableColumn1,
DerivedTable.ThirdTableColumn2
FROM #TempTable as TempTable
LEFT JOIN (SELECT
JoinedTable.PKColumn1,
JoinedTable.PKColumn2,
JoinedTable.Col1,
JoinedTable.Col2,
JoinedTable.WhereColumn,
ThirdTable.Col1 AS ThirdTableColumn1,
ThirdTable2.Col1 AS ThirdTableColumn2
FROM JoinedTable
LEFT JOIN ThirdTable ON ThirdTable.SomeColumn = JoinedTable.SomeColumn
LEFT JOIN ThirdTable ThirdTable2 ON ThirdTable2.SomeOtherColumn = JoinedTable.SomeColumn)
DerivedTable ON (TempTable.PKColumn1 = DerivedTable.PKColumn1 AND
TempTable.PKColumn2 = DerivedTable.PKColumn2)
WHERE
DerivedTable.WhereColumn IN (1, 3)
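One thing worth checking before swapping the TOP 1 sub selects for plain joins: the question states that more than one ThirdTable row can match, and a join returns every match. The sketch below illustrates the difference on made-up rows, using Python's built-in sqlite3 module as a stand-in for SQL Server (LIMIT 1 plays the role of TOP 1, and the tables are reduced to the columns that matter here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JoinedTable (PKColumn1 INTEGER PRIMARY KEY, SomeColumn INTEGER);
    CREATE TABLE ThirdTable (Col1 INTEGER PRIMARY KEY, SomeColumn INTEGER);
    INSERT INTO JoinedTable VALUES (1, 10), (2, 20);
    -- Two ThirdTable rows share SomeColumn = 10, so the match is not unique.
    INSERT INTO ThirdTable VALUES (100, 10), (101, 10), (200, 20);
""")

# Sub select form: exactly one output row per JoinedTable row.
subselect_rows = conn.execute("""
    SELECT j.PKColumn1,
           (SELECT t.Col1 FROM ThirdTable AS t
            WHERE t.SomeColumn = j.SomeColumn LIMIT 1) AS ThirdTableColumn1
    FROM JoinedTable AS j
""").fetchall()

# Plain LEFT JOIN form: one output row per matching ThirdTable row.
join_rows = conn.execute("""
    SELECT j.PKColumn1, t.Col1 AS ThirdTableColumn1
    FROM JoinedTable AS j
    LEFT JOIN ThirdTable AS t ON t.SomeColumn = j.SomeColumn
""").fetchall()

print(len(subselect_rows), len(join_rows))
```

With this data the sub select form returns two rows and the join form returns three, so the join rewrites are only equivalent when each JoinedTable row matches at most one ThirdTable row.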
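Try a cross apply instead (note that CROSS APPLY drops rows for which the applied query finds nothing; use OUTER APPLY if those rows must be kept with a NULL, as the original sub selects do)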
SELECT
TempTable.Col1,
TempTable.Col2,
TempTable.Col3,
JoinedTable.Col1,
JoinedTable.Col2,
ThirdTableColumn1.col1,
ThirdTableColumn2.col1
FROM
#TempTable as TempTable
LEFT JOIN
JoinedTable
ON (TempTable.PKColumn1 = JoinedTable.PKColumn1 AND
TempTable.PKColumn2 = JoinedTable.PKColumn2)
CROSS APPLY
(
SELECT TOP 1
ThirdTable.Col1 -- Which is ThirdTable's Primary Key
FROM
ThirdTable
WHERE
ThirdTable.SomeColumn = JoinedTable.SomeColumn
) as ThirdTableColumn1
CROSS APPLY (
SELECT TOP 1
ThirdTable.Col1 -- Which is also ThirdTable's Primary Key
FROM
ThirdTable
WHERE
ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
) as ThirdTableColumn2
WHERE
JoinedTable.WhereColumn IN (1, 3)
You can also use CTEs with ROW_NUMBER, or an inline query using MIN
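The MIN variant could look roughly like the sketch below. It is an illustration only, run through Python's built-in sqlite3 module rather than SQL Server, with made-up rows and a hypothetical CTE name FirstMatch. Note one difference in meaning: MIN always picks the smallest ThirdTable.Col1 per group, whereas TOP 1 without an ORDER BY returns an arbitrary matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JoinedTable (PKColumn1 INTEGER PRIMARY KEY, SomeColumn INTEGER);
    CREATE TABLE ThirdTable (Col1 INTEGER PRIMARY KEY, SomeColumn INTEGER);
    INSERT INTO JoinedTable VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO ThirdTable VALUES (100, 10), (101, 10), (200, 20);
""")

rows = conn.execute("""
    WITH FirstMatch AS (
        -- Collapse ThirdTable to one row per SomeColumn, once, up front.
        SELECT SomeColumn, MIN(Col1) AS Col1
        FROM ThirdTable
        GROUP BY SomeColumn
    )
    SELECT j.PKColumn1, f.Col1 AS ThirdTableColumn1
    FROM JoinedTable AS j
    LEFT JOIN FirstMatch AS f ON f.SomeColumn = j.SomeColumn
    ORDER BY j.PKColumn1
""").fetchall()
print(rows)
```

Because the CTE yields at most one row per SomeColumn, the join cannot multiply rows, and JoinedTable rows without a match still come back with NULL.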
Move the sub selects out of the SELECT list and put them in the FROM/JOIN part of the query as a joined subselect. Moving them there means you do not have to SELECT TOP 1 over and over again for every row, which I believe is the reason for the slowness. If you want to check this, examine the execution plan.
The ThirdTable references (sub selects in your example) need the same index attention as any other part of a query.
Regardless of whether you use sub selects:
(
SELECT TOP 1
ThirdTable.Col1 -- Which is ThirdTable's Primary Key
FROM
ThirdTable
WHERE
ThirdTable.SomeColumn = JoinedTable.SomeColumn
) as ThirdTableColumn1,
(
SELECT TOP 1
ThirdTable.Col1 -- Which is also ThirdTable's Primary Key
FROM
ThirdTable
WHERE
ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
) as ThirdTableColumn2,
LEFT JOINS (as proposed by John Hartsock):
LEFT JOIN ThirdTable ON ThirdTable.SomeColumn = JoinedTable.SomeColumn
LEFT JOIN ThirdTable ThirdTable2 ON ThirdTable2.SomeOtherColumn = JoinedTable.SomeColumn
CROSS APPLY (as proposed by Conrad Frix):
CROSS APPLY
(
SELECT TOP 1
ThirdTable.Col1 -- Which is ThirdTable's Primary Key
FROM
ThirdTable
WHERE
ThirdTable.SomeColumn = JoinedTable.SomeColumn
) as ThirdTableColumn1
CROSS APPLY (
SELECT TOP 1
ThirdTable.Col1 -- Which is also ThirdTable's Primary Key
FROM
ThirdTable
WHERE
ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
) as ThirdTableColumn2
You need to ensure covering indexes are defined for ThirdTable.SomeColumn and ThirdTable.SomeOtherColumn and that the indexes are unique. This means you will need to further qualify the ThirdTable references to eliminate the selection of multiple rows and improve performance. The choice of sub selects, LEFT JOIN, or CROSS APPLY will not really matter until you improve the selectivity for ThirdTable.SomeColumn and ThirdTable.SomeOtherColumn by including more columns to ensure unique selectivity. Until then, I expect your performance will continue to suffer.
The covering index topic is nicely introduced by Maziar Taheri; while not repeating his work, I do emphasize the need to take to heart the use of covering indexes.
In short:
Improve the selectivity for the ThirdTable.SomeColumn and ThirdTable.SomeOtherColumn queries (or joins) by adding related in-table columns to ensure a unique row match. If this is not possible, then you will continue to suffer performance issues, as the engine is busy pulling in rows which are subsequently thrown away. This impacts your I/O, CPU, and, ultimately, the execution plan.