Question

I have a table to update. The value to update is an FK to a PK ID on a data table.

The data table has date ranges, and the table being updated has date-of-birth fields (month, day, year). My update statement could loop through all the records RBAR (row by agonizing row), but I was hoping for a more set-based solution. I've tried using a CASE expression and table joins in the FROM clause of the UPDATE statement, but something about how to approach this problem is eluding me. Here are the table schemas and my attempts at an update statement:

Table 1 Person:
TABLE [dbo].[TFI_PERSON](
    [PERSON_ID] [int] IDENTITY(3500,1) NOT NULL,
    [HOROSCOPE_SIGN_ID] [int] NULL,
    [DOB_DAY] [int] NOT NULL,
    [DOB_MONTH] [int] NOT NULL,
    [DOB_YEAR] [int] NOT NULL,

Table 2 Horoscope

TABLE [dbo].[TFI_HOROSCOPE_SIGN](
    [HOROSCOPE_SIGN_ID] [int] IDENTITY(1,1) NOT NULL,
    [HOROSCOPE_SIGN] [nvarchar](100) NOT NULL,
    [HOROSCOPE_BEGIN_DATE] [datetime] NOT NULL,
    [HOROSCOPE_END_DATE] [datetime] NOT NULL,

Attempt(s) 1 & 2

UPDATE P
SET P.HOROSCOPE_SIGN_ID = HS.[HOROSCOPE_SIGN_ID] 
                FROM dbo.TFI_PERSON AS P JOIN [dbo].[TFI_HOROSCOPE_SIGN] AS HS
                    ON P.[HOROSCOPE_SIGN_ID] = HS.[HOROSCOPE_SIGN_ID] 
WHERE 
 CAST(DOB_YEAR AS NVARCHAR)+ '-' + CAST(DOB_MONTH AS NVARCHAR) + '-' + CAST(DOB_DAY AS NVARCHAR) BETWEEN HS.[HOROSCOPE_BEGIN_DATE] AND HS.[HOROSCOPE_END_DATE] 

UPDATE dbo.TFI_PERSON
SET HOROSCOPE_SIGN_ID = (SELECT HOROSCOPE_SIGN_ID 
    FROM dbo.TFI_HOROSCOPE_SIGN 
    WHERE CAST(CAST(DOB_YEAR AS NVARCHAR)+ '/' + CAST(DOB_MONTH AS NVARCHAR) + '/' + CAST(DOB_DAY AS NVARCHAR) AS DATETIME) BETWEEN [HOROSCOPE_BEGIN_DATE] AND [HOROSCOPE_END_DATE] )

Thanks for the assist.


Solution 2

TL;DR: Your updates are not RBAR. Each one is a single statement, and SQL Server compiles it into the same kind of query plan it would produce for an explicit join, so it executes them the same way as any other set-based method.

I disagree that your update statements are doing the updates RBAR. RBAR means driving the work with some kind of explicit loop construct, such as a cursor or a loop over a temporary table (https://www.simple-talk.com/sql/t-sql-programming/rbar--row-by-agonizing-row/). The idea of set operations is to let SQL Server loop through the rows. For simple subqueries like these, the optimizer is smart enough to figure out that the appropriate execution is a join, whether you state it or not.

There is no way for SQL Server to magically update all rows as a "set". Every set-based solution produces a query plan in which SQL Server eventually has to process things row by row. Since individual rows are stored in different locations, something needs to iterate through them. In the RBAR methods, the client explicitly iterates through them. In the set-based methods, SQL Server implicitly iterates through them. As long as you don't have a loop such as WHILE in SQL or foreach in C#, you are not really doing things RBAR. Since neither of your update statements contains an explicit loop, you are not doing the updates RBAR. This is clear when you view the query plans of the updates side by side.
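For contrast, here is a rough sketch of what an explicitly looped (RBAR) version of the same update might look like. The cursor name and variables are made up for illustration, and the integer-to-date conversion is borrowed from the join-based answer further down; treat it as an illustration of the pattern to avoid, not as something taken from the question.

```sql
-- RBAR sketch: the caller drives the loop and issues one UPDATE per person.
-- person_cursor, @PersonId and @Dob are hypothetical names.
DECLARE @PersonId int, @Dob datetime;

DECLARE person_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT PERSON_ID,
           -- Build yyyymmdd as an integer, then convert it to a datetime.
           CAST(CAST(DOB_YEAR * 10000 + DOB_MONTH * 100 + DOB_DAY AS char(8)) AS datetime)
    FROM dbo.TFI_PERSON;

OPEN person_cursor;
FETCH NEXT FROM person_cursor INTO @PersonId, @Dob;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One statement per row; assumes at most one sign matches each date.
    UPDATE dbo.TFI_PERSON
    SET HOROSCOPE_SIGN_ID = (SELECT HS.HOROSCOPE_SIGN_ID
                             FROM dbo.TFI_HOROSCOPE_SIGN AS HS
                             WHERE @Dob BETWEEN HS.HOROSCOPE_BEGIN_DATE
                                            AND HS.HOROSCOPE_END_DATE)
    WHERE PERSON_ID = @PersonId;

    FETCH NEXT FROM person_cursor INTO @PersonId, @Dob;
END;

CLOSE person_cursor;
DEALLOCATE person_cursor;
```

Neither statement in the question looks like this: each is a single UPDATE, so the looping is left to SQL Server.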

I created a fiddle (http://sqlfiddle.com/#!3/6c6a6/6) with the OP's two queries, an explicit join query, and @GliM's to show that all four queries get executed as joins. These are all nested loop joins, but given the right indices and statistics, the optimizer will switch to the appropriate join type. I don't think there will be a significant difference in performance between any of them, especially since three of the four have the exact same query plan.
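If you would rather compare the plans on your own server than in the fiddle, one option is to ask for the estimated plan text. This is only a sketch: it assumes a tool that understands the GO batch separator (such as SSMS or sqlcmd), and while SHOWPLAN_TEXT is on the statements are compiled but not executed.

```sql
SET SHOWPLAN_TEXT ON;   -- must be the only statement in its batch
GO

-- Explicit join form.
UPDATE P
SET P.HOROSCOPE_SIGN_ID = HS.HOROSCOPE_SIGN_ID
FROM dbo.TFI_PERSON AS P
JOIN dbo.TFI_HOROSCOPE_SIGN AS HS
    ON CAST(CAST(P.DOB_YEAR * 10000 + P.DOB_MONTH * 100 + P.DOB_DAY AS char(8)) AS datetime)
       BETWEEN HS.HOROSCOPE_BEGIN_DATE AND HS.HOROSCOPE_END_DATE;

-- Correlated subquery form.
UPDATE dbo.TFI_PERSON
SET HOROSCOPE_SIGN_ID = (SELECT HS.HOROSCOPE_SIGN_ID
                         FROM dbo.TFI_HOROSCOPE_SIGN AS HS
                         WHERE CAST(CAST(DOB_YEAR * 10000 + DOB_MONTH * 100 + DOB_DAY AS char(8)) AS datetime)
                               BETWEEN HS.HOROSCOPE_BEGIN_DATE AND HS.HOROSCOPE_END_DATE);
GO

SET SHOWPLAN_TEXT OFF;
GO
```

Both plans should show a join operator between the two tables rather than any per-row statement execution.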

There is an interesting difference here. The second query is executed as a left outer join, whereas the others are inner joins. In the second query, all rows will be updated. For rows with no matching horoscope sign, the id will be set to NULL. In the other queries, rows with no matching horoscope sign will keep whatever id they had before. You can change the queries with explicit joins to use a left join as well.
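A minimal sketch of the left join variant, assuming the same integer-to-date conversion used in the join-based answer further down:

```sql
-- LEFT JOIN keeps every person row; HS.HOROSCOPE_SIGN_ID is NULL when no
-- date range matches, so unmatched people are reset to NULL instead of
-- keeping their previous sign.
UPDATE P
SET P.HOROSCOPE_SIGN_ID = HS.HOROSCOPE_SIGN_ID
FROM dbo.TFI_PERSON AS P
LEFT JOIN dbo.TFI_HOROSCOPE_SIGN AS HS
    ON CAST(CAST(P.DOB_YEAR * 10000 + P.DOB_MONTH * 100 + P.DOB_DAY AS char(8)) AS datetime)
       BETWEEN HS.HOROSCOPE_BEGIN_DATE AND HS.HOROSCOPE_END_DATE;
```

Which behaviour you want depends on whether a stale sign id or a NULL is the better outcome for people whose date of birth falls outside every range.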

EDIT:

I wasn't very explicit before. I've added a lot of clarifications to show that the OP's queries are not RBAR since all single statement queries will eventually be executed with a similar execution plan.

OTHER TIPS

What I had in mind is this:

UPDATE P
SET HOROSCOPE_SIGN_ID = HS.[HOROSCOPE_SIGN_ID] 
FROM dbo.TFI_PERSON AS P 
JOIN [dbo].[TFI_HOROSCOPE_SIGN] AS HS
ON CAST(CAST((P.DOB_DAY + P.DOB_MONTH * 100 + P.DOB_YEAR * 10000) AS char(8)) AS datetime) 
   BETWEEN HS.[HOROSCOPE_BEGIN_DATE] AND HS.[HOROSCOPE_END_DATE]

Of course, this assumes that the TFI_HOROSCOPE_SIGN table has date ranges covering all possible dates of birth of people in TFI_Person.
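One way to test that assumption before running the update is to count how many sign ranges each date of birth falls into. This is a sketch only; the MATCHING_SIGNS alias is made up, and it assumes every day/month/year combination in TFI_PERSON forms a valid date (otherwise the cast fails).

```sql
-- Lists people whose date of birth matches no range (count = 0)
-- or more than one range (count > 1, e.g. overlapping boundaries).
SELECT P.PERSON_ID,
       COUNT(HS.HOROSCOPE_SIGN_ID) AS MATCHING_SIGNS
FROM dbo.TFI_PERSON AS P
LEFT JOIN dbo.TFI_HOROSCOPE_SIGN AS HS
    ON CAST(CAST(P.DOB_YEAR * 10000 + P.DOB_MONTH * 100 + P.DOB_DAY AS char(8)) AS datetime)
       BETWEEN HS.HOROSCOPE_BEGIN_DATE AND HS.HOROSCOPE_END_DATE
GROUP BY P.PERSON_ID
HAVING COUNT(HS.HOROSCOPE_SIGN_ID) <> 1;
```

If this returns no rows, every person matches exactly one sign and the join-based update will touch each row exactly once.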

Licensed under: CC-BY-SA with attribution