Question

It's a simple select from a temporary table, left joining an existing table on its primary key, with two sub selects using TOP 1 that refer to the joined table.

In code:

SELECT
    TempTable.Col1,
    TempTable.Col2,
    TempTable.Col3,
    JoinedTable.Col1,
    JoinedTable.Col2,
    (
        SELECT TOP 1
            ThirdTable.Col1 -- Which is ThirdTable's Primary Key
        FROM
            ThirdTable
        WHERE
            ThirdTable.SomeColumn = JoinedTable.SomeColumn
    ) as ThirdTableColumn1,
    (
        SELECT TOP 1
            ThirdTable.Col1 -- Which is also ThirdTable's Primary Key
        FROM
            ThirdTable
        WHERE
            ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
    ) as ThirdTableColumn2
FROM
    #TempTable as TempTable
LEFT JOIN
    JoinedTable
ON (TempTable.PKColumn1 = JoinedTable.PKColumn1 AND 
    TempTable.PKColumn2 = JoinedTable.PKColumn2)
WHERE
    JoinedTable.WhereColumn IN  (1, 3)

This is an exact replica of my query.

If I remove the two sub selects, it runs just fine and quickly. With the two sub selects, I get about 100 records per second, which is extremely slow for this query because it should return almost a million records.

I've checked that every table has a primary key; they all do. They all have indexes and statistics on their important columns, such as the ones in the WHERE clauses and the JOIN clause. The only table with neither a primary key nor an index is the temporary table, but that is not the problem: it is not involved in the slow sub selects and, as I mentioned, the query runs just fine without them.

Without the TOP 1, the sub selects return more than one row, which raises an error.

Help, anyone?

EDIT:

So the execution plan told me I was missing an Index. I've created it, and recreated some of the other indexes. After a while, the execution plan was using them, and the query now runs fast. The only problem is I'm not succeeding in doing this again on another server, for the same query. So my solution will be to HINT which index SQL Server will use.
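
A minimal sketch of what such a hint could look like, assuming the new index is named IX_ThirdTable_SomeColumn (a hypothetical name; the actual index name is not given here). The hint forces the index, so the query fails on a server where an index with that name does not exist:

```sql
-- Sketch only: IX_ThirdTable_SomeColumn is a hypothetical index name.
SELECT
    JoinedTable.Col1,
    JoinedTable.Col2,
    (
        SELECT TOP 1
            ThirdTable.Col1
        FROM
            -- Table hint: force the optimizer to use the named index.
            ThirdTable WITH (INDEX(IX_ThirdTable_SomeColumn))
        WHERE
            ThirdTable.SomeColumn = JoinedTable.SomeColumn
    ) AS ThirdTableColumn1
FROM
    JoinedTable
WHERE
    JoinedTable.WhereColumn IN (1, 3);
```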

Solution

I think that in a query returning a million records you should avoid things like OUTER JOINs where you can; I suggest using UNION ALL instead of LEFT JOIN. Since I also think CROSS APPLY is more efficient than a sub-query in the SELECT clause, I would start from the query written by Conrad Frix, which I think is correct.

Now, when I started to modify your query I noticed that you have a WHERE clause saying JoinedTable.WhereColumn IN (1, 3). For rows where the LEFT JOIN finds no match, that column is NULL, so the condition is not true and the row is filtered out. So why use LEFT JOIN when you are filtering out the NULL rows anyway? Just replace LEFT JOIN with INNER JOIN: the result is the same, and I expect it to be at least as fast.
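
The point about the WHERE clause can be sketched like this, using the table and column names from the question and leaving the ThirdTable lookups out for brevity. It is only an illustration, not a tested rewrite of the full query:

```sql
-- The WHERE filter on JoinedTable already discards unmatched rows,
-- so INNER JOIN returns the same rows as the original LEFT JOIN.
SELECT
    TempTable.Col1,
    TempTable.Col2,
    TempTable.Col3,
    JoinedTable.Col1,
    JoinedTable.Col2
FROM
    #TempTable AS TempTable
INNER JOIN
    JoinedTable
ON (TempTable.PKColumn1 = JoinedTable.PKColumn1 AND
    TempTable.PKColumn2 = JoinedTable.PKColumn2)
WHERE
    JoinedTable.WhereColumn IN (1, 3);
```

If the unmatched #TempTable rows are actually wanted, the filter would have to move from the WHERE clause into the ON clause of the LEFT JOIN instead.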

About indexes:

Please note that when you have an index on a table, say

table1(a int, b nvarchar)

and your index is:

CREATE NONCLUSTERED INDEX ix1 ON table1 (a)

and you want to do something like this:

SELECT a, b FROM table1
WHERE a < 10

the index does not include column b, so what happens?

If SQL Server uses your index, it has to search the index (an "Index Seek") and then go back to the base table to fetch column b (a "Lookup"). When many rows qualify, this can take much longer than simply scanning the whole table (a "Table Scan").

Based on the statistics SQL Server has, in such situations it might not use your index at all.

So first of all, check the execution plan to see whether the index is used at all.

Either way, alter your index to include all the columns you are selecting, like this:

CREATE NONCLUSTERED INDEX ix1 ON table1 (a) INCLUDE (b) WITH (DROP_EXISTING = ON)

In this case the lookup is no longer needed, and your query will usually execute much faster.
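
Besides reading the graphical execution plan, one way to check whether the index is being used is sketched below. It reuses the table1 example from above and assumes you have permission to read the server's index usage statistics (VIEW SERVER STATE); treat it as a starting point rather than a complete diagnostic:

```sql
-- Report logical reads per table for the statements that follow.
SET STATISTICS IO ON;

SELECT a, b
FROM table1
WHERE a < 10;

SET STATISTICS IO OFF;

-- How often each index on table1 has been used for seeks, scans and lookups
-- since the counters were last reset (for example by an instance restart).
SELECT
    i.name AS IndexName,
    s.user_seeks,
    s.user_scans,
    s.user_lookups
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON  s.object_id   = i.object_id
    AND s.index_id    = i.index_id
    AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('table1');
```

A high user_lookups count on the base table next to seeks on ix1 is the seek-plus-lookup pattern described above.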

OTHER TIPS

It's the sub selects in your column list that are causing the slow return. You should try replacing the sub selects with left joins, or use a derived table, as I have defined below.

Using Left Joins to two instances of Third Table

SELECT
  TempTable.Col1,
  TempTable.Col2,
  TempTable.Col3,
  JoinedTable.Col1,
  JoinedTable.Col2,
  ThirdTable.Col1 AS ThirdTableColumn1,
  ThirdTable2.Col1 AS ThirdTableColumn2
FROM #TempTable as TempTable
LEFT JOIN JoinedTable ON (TempTable.PKColumn1 = JoinedTable.PKColumn1 AND 
    TempTable.PKColumn2 = JoinedTable.PKColumn2)
-- Note: unlike the TOP 1 sub selects, each join returns every matching ThirdTable row
LEFT JOIN ThirdTable ON ThirdTable.SomeColumn = JoinedTable.SomeColumn
LEFT JOIN ThirdTable ThirdTable2 ON ThirdTable2.SomeOtherColumn = JoinedTable.SomeColumn
WHERE
    JoinedTable.WhereColumn IN  (1, 3)

Using a Derived Table

 SELECT 
      TempTable.Col1,
      TempTable.Col2,
      TempTable.Col3,
      DerivedTable.Col1,
      DerivedTable.Col2,
      DerivedTable.ThirdTableColumn1,
      DerivedTable.ThirdTableColumn2
 FROM #TempTable as TempTable
    LEFT JOIN (SELECT
                 JoinedTable.PKColumn1,
                 JoinedTable.PKColumn2,
                 JoinedTable.Col1,
                 JoinedTable.Col2,
                 JoinedTable.WhereColumn,
                 ThirdTable.Col1 AS ThirdTableColumn1,
                 ThirdTable2.Col1 AS ThirdTableColumn2
               FROM JoinedTable
               LEFT JOIN ThirdTable ON ThirdTable.SomeColumn = JoinedTable.SomeColumn
               LEFT JOIN ThirdTable ThirdTable2 ON ThirdTable2.SomeOtherColumn = JoinedTable.SomeColumn)
        DerivedTable ON (TempTable.PKColumn1 = DerivedTable.PKColumn1 AND 
        TempTable.PKColumn2 = DerivedTable.PKColumn2)
    WHERE
        DerivedTable.WhereColumn IN  (1, 3)

Try a cross apply instead

SELECT
    TempTable.Col1,
    TempTable.Col2,
    TempTable.Col3,
    JoinedTable.Col1,
    JoinedTable.Col2,
    ThirdTableColumn1.Col1,
    ThirdTableColumn2.Col1
FROM
    #TempTable as TempTable
LEFT JOIN
    JoinedTable
ON (TempTable.PKColumn1 = JoinedTable.PKColumn1 AND 
    TempTable.PKColumn2 = JoinedTable.PKColumn2)
-- Note: CROSS APPLY drops rows with no match; OUTER APPLY would keep them with NULL
CROSS APPLY
    (
        SELECT TOP 1
            ThirdTable.Col1 -- Which is ThirdTable's Primary Key
        FROM
            ThirdTable
        WHERE
            ThirdTable.SomeColumn = JoinedTable.SomeColumn
    ) as ThirdTableColumn1
CROSS APPLY
    (
        SELECT TOP 1
            ThirdTable.Col1 -- Which is also ThirdTable's Primary Key
        FROM
            ThirdTable
        WHERE
            ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
    ) as ThirdTableColumn2
WHERE
    JoinedTable.WhereColumn IN  (1, 3)

You can also use CTEs with ROW_NUMBER, or an inline query using MIN.
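
A rough sketch of both ideas in one query, using the names from the question. The CTE name FirstBySomeColumn and the RowNum column are made up for illustration. Note the assumption: the original TOP 1 has no ORDER BY and so returns an arbitrary match, while both variants here deliberately pick the lowest Col1:

```sql
-- Sketch only: keeps one ThirdTable row per SomeColumn value (the lowest Col1),
-- so the join below cannot multiply rows.
WITH FirstBySomeColumn AS
(
    SELECT
        ThirdTable.SomeColumn,
        ThirdTable.Col1,
        ROW_NUMBER() OVER (PARTITION BY ThirdTable.SomeColumn
                           ORDER BY ThirdTable.Col1) AS RowNum
    FROM
        ThirdTable
)
SELECT
    JoinedTable.Col1,
    JoinedTable.Col2,
    FirstBySomeColumn.Col1 AS ThirdTableColumn1,
    (
        -- Inline MIN variant for the second lookup.
        SELECT MIN(ThirdTable.Col1)
        FROM ThirdTable
        WHERE ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
    ) AS ThirdTableColumn2
FROM
    JoinedTable
LEFT JOIN
    FirstBySomeColumn
ON  FirstBySomeColumn.SomeColumn = JoinedTable.SomeColumn
AND FirstBySomeColumn.RowNum = 1
WHERE
    JoinedTable.WhereColumn IN (1, 3);
```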

Move the sub selects out of the SELECT list and put them in the JOIN section as a joined subselect. Doing the work in the JOIN and WHERE section means the engine should not have to run SELECT TOP 1 over and over again for every row, which I believe is the reason for the slowness. If you want to check this, examine the execution plan.

The ThirdTable references (the sub selects in your example) need the same index attention as any other part of a query.

Regardless of whether you use sub selects:

(
    SELECT TOP 1
        ThirdTable.Col1 -- Which is ThirdTable's Primary Key
    FROM
        ThirdTable
    WHERE
        ThirdTable.SomeColumn = JoinedTable.SomeColumn
) as ThirdTableColumn1,
(
    SELECT TOP 1
        ThirdTable.Col1 -- Which is also ThirdTable's Primary Key
    FROM
        ThirdTable
    WHERE
        ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
) as ThirdTableColumn2,

LEFT JOINS (as proposed by John Hartsock):

LEFT JOIN ThirdTable ON ThirdTable.SomeColumn = JoinedTable.SomeColumn
LEFT JOIN ThirdTable ThirdTable2 ON ThirdTable2.SomeOtherColumn = JoinedTable.SomeColumn

CROSS APPLY (as proposed by Conrad Frix):

CROSS APPLY
(
    SELECT TOP 1
        ThirdTable.Col1 -- Which is ThirdTable's Primary Key
    FROM
        ThirdTable
    WHERE
        ThirdTable.SomeColumn = JoinedTable.SomeColumn
) as ThirdTableColumn1
CROSS APPLY
(
    SELECT TOP 1
        ThirdTable.Col1 -- Which is also ThirdTable's Primary Key
    FROM
        ThirdTable
    WHERE
        ThirdTable.SomeOtherColumn = JoinedTable.SomeColumn
) as ThirdTableColumn2

You need to ensure covering indexes are defined for ThirdTable.SomeColumn and ThirdTable.SomeOtherColumn and the indexes are unique. This means you will need to further qualify the ThirdTable references to eliminate the selection of multiple rows and improve performance. The choice of sub selects, LEFT JOIN, or CROSS APPLY will not really matter until you improve the selectivity for ThirdTable.SomeColumn and ThirdTable.SomeOtherColumn by including more columns to ensure unique selectivity. Until then, I expect your performance will continue to suffer.

The covering index topic is nicely introduced by Maziar Taheri; while not repeating his work, I do emphasize the need to take to heart the use of covering indexes.

In short: improve the selectivity of the ThirdTable.SomeColumn and ThirdTable.SomeOtherColumn queries (or joins) by adding related in-table columns to ensure a unique row match. If this is not possible, you will continue to suffer performance issues, because the engine is busy pulling in rows that are subsequently thrown away. This impacts your I/O, your CPU and, ultimately, the execution plan.
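
As a starting point, covering indexes for the two lookups might look like the sketch below. The index names are hypothetical. The indexes are not declared UNIQUE here because the question states that each lookup can match more than one row; making them unique would require the extra qualifying columns mentioned above, which the question does not name:

```sql
-- Sketch only: the index names are hypothetical.
-- Col1 is ThirdTable's primary key; if that key is clustered, Col1 is already
-- carried in every nonclustered index, so the INCLUDE is harmless but makes
-- the intent explicit.
CREATE NONCLUSTERED INDEX IX_ThirdTable_SomeColumn
    ON ThirdTable (SomeColumn)
    INCLUDE (Col1);

CREATE NONCLUSTERED INDEX IX_ThirdTable_SomeOtherColumn
    ON ThirdTable (SomeOtherColumn)
    INCLUDE (Col1);
```

With these in place, each TOP 1 lookup can be answered from the index alone, without touching the base table.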

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange