SQL Server LEFT JOIN fails to match rows without JOIN hint

https://stackoverflow.com/questions/9793667

25-05-2021
|

質問

I have what appears to be a corrupt index?

Here is what is happening. I have two table-functions which the first is a set of cases and the second is a set of aware dates. These two sets have a 1 (case) to 0 or 1 (aware date) relationship. Normally I query them like;

SELECT c.CaseID, a.AwareDate  
FROM Cases(@date) AS c  
LEFT JOIN AwareDates(@date) AS a ON c.CaseID = a.CaseID;

The trouble is that not all of the rows from AwareDates which match seem to be JOIN'd. If I add a join hint, they then do. say;

SELECT c.CaseID, a.AwareDate  
FROM Cases(@date) AS c  
LEFT MERGE JOIN AwareDates(@date) AS a ON c.CaseID = a.CaseID;

What I notice from the query plan is that adding the join hint adds a sort of the AwareDate data before the join which is not there otherwise. Also, the query planner flips the join to a RIGHT OUTER JOIN when there is no hint, and of course keeps the LEFT JOIN where the hint is present.

I've done the following with no errors detected;

DBCC UPDATEUSAGE (0) WITH INFO_MESSAGES, COUNT_ROWS;  
EXECUTE sp_updatestats 'resample';  
DBCC CHECKDB (0) WITH ALL_ERRORMSGS, EXTENDED_LOGICAL_CHECKS;

I'm stumped... any ideas?

Here are the UDF definitions

ALTER FUNCTION dbo.Cases( @day date ) RETURNS TABLE
WITH SCHEMABINDING
AS RETURN (
SELECT 
    CaseID -- other 42 columns ommitted
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY CaseID ORDER BY UpdateDate DESC, UpdateNumber DESC) AS RecordAge,
        CaseID, 
        Action
    FROM
        dbo.CaseAudit
    WHERE
        convert(date,UpdateDate) <= @day
    ) AS History
WHERE
    RecordAge = 1                    -- only the most current record version
    AND isnull(Action,'') != N'DEL'  -- only include cases that have not been deleted
)

ALTER FUNCTION dbo.AwareDates( @day date ) RETURNS TABLE
WITH SCHEMABINDING
AS RETURN (
WITH 
    History AS (
        SELECT row_number() OVER (PARTITION BY CaseID, ContactID ORDER BY UpdateDate DESC, UpdateNumber DESC) AS RecordAge,
            CaseID, InfoReceived, ReceiveDate, ResetClock, Action
        FROM dbo.ContactLogAudit WITH (NOLOCK)
        WHERE convert(date,UpdateDate) <= @day
        ),
    Notes AS (
        SELECT 
            CaseID,
            convert(date,ReceiveDate,112) AS ReceiveDate,
            ResetClock
        FROM History 
        WHERE RecordAge = 1                -- only the most current record version
        AND isnull(Action,'') != N'DEL'    -- only include notes that have not been deleted
        AND InfoReceived = N'Y'            -- only include notes that have Info Rec'd checked 
        AND len(ReceiveDate) = 8 AND isnumeric(ReceiveDate) = 1 AND isdate(ReceiveDate) = 1 -- only include those with a valid aware date
        ),
    Initials AS (
        SELECT CaseID, min(ReceiveDate) AS ReceiveDate
        FROM Notes 
        GROUP BY CaseID
        ),
    Resets AS (
        SELECT CaseID, max(ReceiveDate) AS ReceiveDate
        FROM Notes 
        WHERE ResetClock = N'Y'
        GROUP BY CaseID
        )
SELECT 
    i.CaseID                              AS CaseID,
    i.ReceiveDate                         AS InitialAwareDate, -- the oldest valid aware date value (must have AE Info Reveived checked and a received date)
    coalesce(r.ReceiveDate,i.ReceiveDate) AS AwareDate  -- either the newest valid aware date value with the Reset Clock checked, otherwise the initial aware date value
FROM Initials AS i
LEFT JOIN Resets AS r 
    ON i.CaseID = r.CaseID
);

I have further found that if I drop the "WITH (NOLOCK)" table hint, I get correct results. Also if add a join hint to the AwareDates UTF or even add a COLLATE Latin1_General_BIN on the LEFT JOIN relation between Initials and Resets.

Query plan row counts -- without join hint (broken)

Cases { Actual: 25,891, Estimate: 19,071.9 }
AwareDates { Actual: 24,693, Estimated: 1,463.09 }
- Initials { Actual: 24,693, Estimated: 1,463.09 }
- Rests { Actual: 985, Estimated: 33.2671 }
AwareDates matches 8,108 of the Cases rows in the join'd result-set

Query plan row counts -- with join hint (working)

Cases { Actual: 25,891, Estimate: 19,071.9 }
AwareDates { Actual: 24,673, Estimated: 1,837.67 }
- Initials { Actual: 24,673, Estimated: 1,837.67 }
- Rests { Actual: 982, Estimated: 42.6238 }
AwareDates matches 24,673 of the Cases rows in the join'd result-set

I have further whittled down the scope of the issue. I can;

SELECT * FROM AwareDate(@date);

and

SELECT * FROM AwareDate(@date) ORDER BY CaseID;

With different row counts.

解決

You don't specify the specific version of SQL (@@version), but this seems suspiciously like a bug that was fixed in Cumulative Update 6 for SQL 2008 R2 (apparently it also applies to SQL 2008).

KB 2433265
FIX: You may receive an incorrect result when you run a query that uses the ROW_NUMBER function together with a left outer join in SQL Server 2008

The example in the article specifies DISTINCT. The article, however, is worded ambiguously -- it's not clear whether you NEED a distinct or if DISTINCT is one of the triggers.

Your example doesn't have a distinct like the article, but it appears modified for the sake of asking the question(i.e. 42 columns missing). Is there a distinct? Also in the AwareDates udf by the time i get down to the Initials CTE you do a GROUP BY which could have the same effect as a DISTINCT.

UPDATE

@Dennis from your comment I still can't tell if you're using SQL 20080 or 2008 R2.

If you're running 2008, the KB article says "The fix for this issue was first released in Cumulative Update 11 for SQL Server 2008 Service Pack 1." So, post SP1.

On the other hand, if you're using SQL 2008 R2, you are correct that this was fixed in CU 6, which was part of SP1. But this bug appears to have resurfaced. Look at Cumulative update package 4 for SQL Server 2008 R2 Service Pack 1 -- released post SP1.

970198      FIX: You receive an incorrect result when you run a 
            query that uses the row_number function in SQL Server 2008 
            or in SQL Server 2008 R2

In the associated KB article MS dropped the reference to distinct:

Consider the following scenario. You run a query against a table that has a 
clustered index in Microsoft SQL Server 2008 or in Microsoft SQL Server 2008
R2. In the query, you use the row_number function. In this scenario, you 
receive an incorrect result when a parallel execution plan is used for the 
query. If you run the query many times, you may receive different results.

This seems to confirm my earlier reading of KB 2433265 -- the phrasing suggests distinct is just one of many conditions that can cause the behavior. It seems that a parallel execution plan is the culprit this time around.

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow