Question

A coworker asked me to look at indexing on some tables because his query was running for a very long time: over an hour.

select count(1)
from databaseA.dbo.table1
inner join databaseA.dbo.table2 on (table1.key = table2.key)
inner join databaseB.dbo.table3 on (table1.key = table3.key)

Note the different databases. This was being run from databaseB.

Tables 1 and 2 were over 2 million records long. Table3 had a dozen records or so.

I looked at the query plan and the optimizer decided to do nested-loop index seeks into tables 1 and 2 with Table3 as the driving table!

My first assumption was that statistics were seriously messed up on tables 1 and 2, but before updating statistics I tried adding a join hint:

select count(1)
from databaseA.dbo.table1
inner HASH join databaseA.dbo.table2 on (table1.key = table2.key)
inner join databaseB.dbo.table3 on (table1.key = table3.key)

Results returned in 15 seconds.

Since I was short on time, I passed the results back to him but I'm worried that this might result in problems down the road.

Should I revisit the statistics issue and resolve the problem that way? Could the bad query plan have resulted from the join spanning separate databases?

Can anyone offer me some ideas based on your experience?

Solution

I would suspect the statistics first.

As you are no doubt aware, join hints should be avoided in 99% of cases and used only when you have proof that they are absolutely required.
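
A minimal sketch of how the statistics could be checked and refreshed, assuming the placeholder names from the question (databaseA, dbo.table1, dbo.table2). It first lists when each statistics object on the two large tables was last updated, then rebuilds them. FULLSCAN is only one sampling choice, picked here for illustration:

```sql
-- Run in the database that owns the large tables (databaseA in the question).
USE databaseA;

-- When was each statistics object on the two large tables last updated?
SELECT OBJECT_NAME(s.object_id)            AS table_name,
       s.name                              AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id IN (OBJECT_ID(N'dbo.table1'), OBJECT_ID(N'dbo.table2'));

-- Refresh the statistics. FULLSCAN reads every row, so expect it to
-- take a while on tables with millions of records.
UPDATE STATISTICS dbo.table1 WITH FULLSCAN;
UPDATE STATISTICS dbo.table2 WITH FULLSCAN;
```

After the update, rerun the original query without the hint and see whether the optimizer picks a different plan.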

OTHER TIPS

Check statistics and indexing on the tables first. Join hints can cause problems: if the data in the tables changes, the optimizer will be unable to choose a more efficient plan, since you have forced it to always use a hash join.

Wouldn't a nested loop be the most appropriate? Take the 12 records from Table 3, match to the 12 records in Table 1, match to 12 records in Table 2.

Otherwise, your hash join hint would enforce the join order as well - meaning you'd hash the 2 million-plus records from Table 1 and Table 2, then join to the 12 records in Table 3.

I'd look at the statistics for both plans - and I'd suspect the loop join is actually more efficient, but was blocked, or your hash join was taking advantage of cached data.

But - yeah - in general, join hints are a last resort.
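
One way to compare the two plans, sketched with the placeholder names from the question: turn on the I/O and timing statistics, run both versions from databaseB, and compare the logical reads and elapsed time reported in the Messages output. The brackets around [key] are an assumption added here because KEY is a reserved word in T-SQL; the real column name will differ.

```sql
-- Report per-table reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Version 1: let the optimizer choose the join strategy.
-- (In the question this version ran for over an hour.)
SELECT COUNT(1)
FROM databaseA.dbo.table1
INNER JOIN databaseA.dbo.table2 ON (table1.[key] = table2.[key])
INNER JOIN databaseB.dbo.table3 ON (table1.[key] = table3.[key]);

-- Version 2: force a hash join between table1 and table2.
-- A join hint also fixes the join order for the whole query.
SELECT COUNT(1)
FROM databaseA.dbo.table1
INNER HASH JOIN databaseA.dbo.table2 ON (table1.[key] = table2.[key])
INNER JOIN databaseB.dbo.table3 ON (table1.[key] = table3.[key]);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running each version more than once helps separate a genuinely better plan from one that merely benefited from data already in cache.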

A slow-running query involving linked servers might have to do with collation. See here for some background: http://blogs.msdn.com/psssql/archive/2008/02/14/how-it-works-linked-servers-and-collation-compatibility.aspx

The hash join hint forces the sort order, so that explains the performance gain.

Here's how to set the options:

EXEC master.dbo.sp_serveroption 
    @server=N'databaseA', 
    @optname=N'collation compatible', 
    @optvalue=N'true'

EXEC master.dbo.sp_serveroption 
    @server=N'databaseA', 
    @optname=N'use remote collation', 
    @optvalue=N'false'

-Edoode

Licensed under: CC-BY-SA with attribution