Duplicate rows with a few changes in a table with a composite primary key, but only if target does not exist
-
08-10-2020 - |
문제
Consider the following table:
CREATE TABLE [xxx].[MyTable](
[period] [date] NOT NULL,
[genusId] [int] NOT NULL,
[subjectId] [int] NOT NULL,
[waitingStageId] [tinyint] NOT NULL,
[value] [tinyint] NULL,
[savedOn] [datetime2](2) NULL,
[savedBy] [nvarchar](50) NULL,
CONSTRAINT [PK_ExpectedActivity] PRIMARY KEY CLUSTERED
(
[period] ASC,
[genusId] ASC,
[subjectId] ASC,
[waitingStageId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-05-01' AS Date), 71, 92, 1, 1, CAST(N'2017-05-29T13:45:20.6300000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-05-01' AS Date), 71, 96, 1, 1, CAST(N'2017-05-29T13:45:21.2400000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-05-01' AS Date), 71, 128, 0, 2, CAST(N'2017-05-29T13:45:21.6500000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-05-01' AS Date), 71, 128, 1, 3, CAST(N'2017-05-29T13:45:21.4400000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 73, 35, 0, 1, CAST(N'2017-05-26T20:05:40.2800000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 73, 94, 0, 3, CAST(N'2017-05-26T20:05:40.6700000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 73, 94, 1, 1, CAST(N'2017-05-26T20:05:40.4800000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 74, 23, 0, 6, CAST(N'2017-05-26T20:03:26.5400000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 74, 47, 0, 6, CAST(N'2017-05-26T20:03:26.7700000' AS DateTime2), NULL)
GO
INSERT [xxx].[MyTable] ([period], [genusId], [subjectId], [waitingStageId], [value], [savedOn], [savedBy]) VALUES (CAST(N'2017-06-01' AS Date), 74, 67, 0, 3, CAST(N'2017-05-26T20:03:27.1800000' AS DateTime2), NULL)
GO
I am trying to create a stored procedure that will allow us to duplicate values from the previous month into the current month -- BUT ONLY if there isn't a value for the current month already.
DROP TABLE IF EXISTS #PreviousMonth
DECLARE @schemaName AS varchar(5) = 'xxx' -- To be used later as a parameter in stored procedure
DECLARE @lastMonth AS DATE = CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE())-1, 0) AS DATE) --First day of previous month
DECLARE @currentMonth AS DATE = CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) AS DATE) --First day of current month
-- Add rows for existing month into a temporary table
SELECT @currentMonth AS period, genusId, subjectId, waitingStageId, value, GETDATE() AS savedOn, '<Automated by Stored Procedure>' AS savedBy INTO #PreviousMonth FROM [xxx].[ExpectedActivity]
WHERE period in(@lastMonth)
SELECT * INTO xxx.MyTable FROM #PreviousMonth
-- Here is where I got stuck. I tried WHERE caluses and NOT EXISTS but my skills are limited.
This is as far as I managed to get unfortunately. My skill is limited where it gets to do an INSERT with a condition with the NOT EXIST condition.
How can I make sure that I only insert rows in the xxx.MyTable where rows with the same primary key (period AND genusId AND subjectId AND waitingStageId) does not exist already?
해결책
Taking what your query says literally, you don't have to worry about what exists in xxx.myTable
- because your statement will create it, therefore it won't have anything in it (or, if it already exists, your statement will fail). Presumably, that's not really what you were going for.
If we assume that xxx.myTable
does exist, and may already have some rows whose values match your defaults, then this should work.
INSERT INTO xxx.myTable (period, genusID, subjectID, waitingStageID, value, savedOn, savedBy)
SELECT period, genusID, subjectID, waitingStageID, value, savedOn, savedBy
FROM #PreviousMonth pm
WHERE NOT EXISTS (SELECT 1
FROM xxx.myTable
WHERE period = pm.period
AND genusID = pm.genusID
AND subjectID = pm.subjectID
AND waitingStageID = pm.waitingStageID
)
;
In the NOT EXISTS
clause, you're checking if the current row from the temp table matches the primary key values for any rows already in xxx.myTable
. If you find a match for the current row, then you won't SELECT
it (and thus, won't INSERT
it).
You can also write this as what's called an "anti-join":
INSERT INTO xxx.myTable (period, genusID, subjectID, waitingStageID, value, savedOn, savedBy)
SELECT period, genusID, subjectID, waitingStageID, value, savedOn, savedBy
FROM #PreviousMonth pm
LEFT OUTER JOIN xxx.myTable mt ON ( pm.period = mt.period
AND pm.genusID = mt.genusID
AND pm.subjectID = mt.subjectID
AND pm.waitingStageID = mt.waitingStageID
)
WHERE mt.period IS NULL
;
Here, we're matching all the rows in the temp table with any matching records in xxx.myTable
. via a LEFT JOIN
. Because of the left join, if there's no matching row in xxx.myTable
, we'll still return columns from that table, but all the values will be NULL. So, if we find a row where period
is NULL in xxx.myTable
, that should mean that there isn't a matching xxx.myTable
row for our temp table row. So, only those rows (temp table rows with no matching xxx.myTable
row) will be SELECT
ed, and thus will be INSERT
ed.
In most cases, I wouldn't expect dramatically different performance between the two versions; test in your specific case, to see if one seems better than the other.
UPDATE: If you want to use the previous month's data anywhere the key fields don't exists, or where they do exist but the value = 0, you've got a couple of options available.
You could delete all rows in xxx.myTable
where the value is 0, then do the insert as above.
I would be inclined to UPDATE
the existing rows, and then perform the INSERT
as above. The UPDATE
statement would be:
UPDATE mt
SET value = pm.value
FROM xxx.myTable mt
INNER JOIN #PreviousMonth pm ON ( mt.period = pm.period
AND mt.genusID = pm.genusID
AND mt.subjectID = pm.subjectID
AND mt.waitingStageID = pm.waitingStageID
)
WHERE mt.value = 0
AND pm.value <> 0
;