Identifying field changes between tables and consecutive rows
07-10-2020
Question
I need to build an audit report to identify field changes between two tables, as well as between consecutive rows in one of the tables. The first table (data) holds the current data:
id data_id field1 field2 data_dttm
1 100 data3 data3 2017-05-03 00:00:00.000
CREATE TABLE [dbo].[data](
[id] [bigint] NULL,
[data_id] [nchar](10) NULL,
[field1] [nchar](10) NULL,
[field2] [nchar](10) NULL,
[adt_dttm] [datetime] NULL
)
Insert INTO data
Values (1, 100, 'data3', 'data3', '2017-05-03 00:00:00')
The second table (data_hst) holds data prior to it changing (trigger type operation).
data_hst_id data_id field1 field2 chng_fld_txt data_dttm
1 100 data1 data2 field1|field2 2017-05-01 00:00:00.000
2 100 data2 data3 field1 2017-05-02 00:00:00.000
CREATE TABLE [dbo].[data_hst](
[data_hst_id] [bigint] NULL,
[data_id] [bigint] NULL,
[field1] [nvarchar](200) NULL,
[field2] [nvarchar](200) NULL,
[chng_fld_txt] [nvarchar](200) NULL,
[data_dttm] [datetime] NULL
)
Insert INTO data_hst
Values (1, 100, 'data1', 'data2', 'field1|field2', '2017-05-01 00:00:00'),
(2, 100, 'data2', 'data3', 'field1', '2017-05-02 00:00:00')
The 'chng_fld_txt' field holds the pipe-delimited list of fields that were modified. I need to identify what has changed between the data table row and the most recent data_hst table row, as well as the changes between consecutive rows in the data_hst table. The goal is an audit report that tracks each change, showing the old and new values as they occurred.
The result should look like this:
table_name db_field old_value new_value data_dttm
data field1 data1 data2 5/1/2017
data field2 data2 data3 5/1/2017
data field1 data2 data3 5/2/2017
I have a convoluted bit of dynamic sql with a cursor that works for the first condition, but not both. Hoping there's a cleaner way to satisfy both conditions.
IF OBJECT_ID('tempdb..#changed') IS NOT NULL DROP TABLE #changed
CREATE TABLE #changed(
[tbl_hst_id] [bigint] NULL,
[change_field] [nvarchar](200) NULL
) ON [PRIMARY]
DECLARE @db VARCHAR(200)
SET @db = 'data'
DECLARE @change_date DATETIME
DECLARE @change_date_varchar NVARCHAR(200)
SET @change_date = GETDATE()
SET @change_date_varchar = LEFT(CONVERT(VARCHAR, @change_date, 120), 10)
DECLARE @changed_table nvarchar(max)
SELECT @changed_table =
'SELECT TOP 2 t1.' + @db + '_hst_id as tbl_hst_id, t1.change_field
FROM (
SELECT A.' + @db + '_hst_id
,Split.a.value(''.'', ''VARCHAR(100)'') AS change_field
FROM (
SELECT ' + @db + '_hst_id
,CAST(''<M>'' + REPLACE(upsrt_chng_fld_txt, '','', ''</M><M>'') + ''</M>'' AS XML) AS String
,adt_dttm
FROM ' + @db + '_hst
) AS A
CROSS APPLY String.nodes(''/M'') AS Split(a)
WHERE adt_dttm >= DATEADD(D,-4,''' + @change_date_varchar + ''')
) T1
WHERE T1.change_field NOT IN (
''upsrt_dttm''
,''upsrt_trnsctn_nmbr''
)'
exec ('insert #changed ' + @changed_table)
DECLARE @delta_field VARCHAR(max)
DECLARE @db_sql_all NVARCHAR(max) = ''
DECLARE @getDeltaField CURSOR SET @getDeltaField = CURSOR
FOR
SELECT change_field
FROM #changed
OPEN @getDeltaField
FETCH NEXT
FROM @getDeltaField
INTO @delta_field
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @db_sql NVARCHAR(MAX)
SELECT @db_sql = 'select ''' + @db + ''' as table_name
, ''1'' as id
,''' + @delta_field + ''' as db_field
,CAST(hst.' + @delta_field + ' AS NVARCHAR(200)) as old_value
,CAST(c.' + @delta_field + ' AS NVARCHAR(200)) as new_value
--,c.upsrt_usr_id as change_by
--,c.upsrt_dttm as change_time
from ' + @db + ' c
inner join ' + @db + '_hst hst
on c.' + @db + '_id = hst.' + @db + '_id
join #changed ch on hst.' + @db + '_hst_id = ch.tbl_hst_id
UNION '
SET @db_sql_all = @db_sql_all + @db_sql
FETCH NEXT
FROM @getDeltaField
INTO @delta_field
END
CLOSE @getDeltaField
DEALLOCATE @getDeltaField
IF len(@db_sql_all) > 0
BEGIN
SET @db_sql_all = SUBSTRING(@db_sql_all, 1, len(@db_sql_all) - 6)
END
PRINT(@db_sql_all);
EXEC (@db_sql_all);
Solution
If you've got the old row, and the new row, you can ignore the information about what changed, and just look at the data.
IMPORTANT: this solution requires SQL Server 2012 or later, as that's when the LEAD function was added.
USE tempdb;
CREATE TABLE [dbo].[data](
[id] [bigint] NULL,
[data_id] [bigint] NULL,
[field1] [nchar](10) NULL,
[field2] [nchar](10) NULL,
[adt_dttm] [datetime] NULL
);
Insert INTO data
Values (1, 100, 'data3', 'data3', '2017-05-03 00:00:00')
;
CREATE TABLE [dbo].[data_hst](
[data_hst_id] [bigint] NULL,
[data_id] [bigint] NULL,
[field1] [nvarchar](200) NULL,
[field2] [nvarchar](200) NULL,
[chng_fld_txt] [nvarchar](200) NULL,
[data_dttm] [datetime] NULL
);
Insert INTO data_hst
Values (1, 100, 'data1', 'data2', 'field1|field2', '2017-05-01 00:00:00'),
(2, 100, 'data2', 'data3', 'field1', '2017-05-02 00:00:00')
;
WITH hist_and_curr AS
(SELECT data_id
,field1
,field2
,data_dttm
FROM data_hst
UNION ALL
SELECT data_id
,field1
,field2
,adt_dttm
FROM data
)
,find_changes as
(SELECT data_id
,'field1' as [db_field]
,field1 as old_value
,LEAD(field1) OVER (PARTITION BY data_id ORDER BY data_dttm) as new_value
,data_dttm
FROM hist_and_curr
UNION ALL
SELECT data_id
,'field2' as [db_field]
,field2 as old_value
,LEAD(field2) OVER (PARTITION BY data_id ORDER BY data_dttm) as new_value
,data_dttm
FROM hist_and_curr
)
SELECT 'data' AS [table_name]
,[db_field]
,[old_value]
,[new_value]
,[data_dttm]
FROM find_changes
WHERE old_value <> new_value
ORDER BY [table_name], [data_dttm], [db_field]
;
DROP TABLE [data_hst];
DROP TABLE [data];
So. The hist_and_curr CTE is a simple UNION ALL of the data and data_hst rows, as they seem to have the same layout, except for chng_fld_txt, which we're going to ignore.
In the find_changes CTE, we UNION ALL separate SELECTs for each column. We use the LEAD function to access the column's value from the next row for this data_id (with the rows sorted by the data_dttm column), to get old_value and new_value.
Then, we keep the rows where old_value and new_value are different.
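To make the LEAD step concrete, here is a minimal standalone sketch that shows the intermediate rows for field1 only, before the final filter is applied. The inline VALUES rows are an assumption that mirrors the sample data from the question, so the batch runs without the two tables.

```sql
-- Standalone illustration: the inline rows stand in for hist_and_curr
-- (two history rows plus the current row for data_id 100).
WITH hist_and_curr AS
(SELECT data_id, field1, data_dttm
   FROM (VALUES (100, N'data1', CAST('20170501' AS datetime)),
                (100, N'data2', CAST('20170502' AS datetime)),
                (100, N'data3', CAST('20170503' AS datetime))
        ) AS v (data_id, field1, data_dttm)
)
SELECT data_id
      ,field1 AS old_value
       -- value of field1 on the next row (by date) for the same data_id
      ,LEAD(field1) OVER (PARTITION BY data_id ORDER BY data_dttm) AS new_value
      ,data_dttm
  FROM hist_and_curr
 ORDER BY data_dttm;
```

The latest row for each data_id has no following row, so LEAD returns NULL for new_value there. The comparison old_value <> new_value is not true when either side is NULL, which is why that row drops out of the final result. The same rule means a genuine change to or from NULL would also be dropped, so nullable columns may need an explicit NULL check.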
ASSUMPTIONS:
- We're dealing with few enough columns for hard-coding this to make sense; if not, you could again use dynamic SQL to build the statements you want. If you're trying to combine information for different tables, you might store intermediate results in a temp table, instead of trying to combine SELECTs for potentially dozens of tables and hundreds of columns into a single CTE.
- All the columns are compatible data types. If this isn't true, then you'd need to convert the fields in find_changes, or they couldn't be combined into a single virtual table.
- As noted above, you're using SQL Server 2012 or later. I'm going to post this even though I don't know your SQL Server version yet, since the CTE to combine history and current data may let you proceed with your own ideas, and since others who are interested may be using an appropriate version of SQL Server.
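If the tracked columns do not share a data type, one way to handle the second assumption is to cast every value to a common string type while unpivoting. The following is only a sketch: the amount column and the inline rows are hypothetical and not part of the original tables, and CROSS APPLY (VALUES ...) is used here as an alternative to writing one SELECT per column.

```sql
-- Hypothetical data: field1 is a string column, amount is an integer column.
WITH hist_and_curr AS
(SELECT data_id, field1, amount, data_dttm
   FROM (VALUES (100, N'data1', 10, CAST('20170501' AS datetime)),
                (100, N'data2', 10, CAST('20170502' AS datetime)),
                (100, N'data2', 25, CAST('20170503' AS datetime))
        ) AS v (data_id, field1, amount, data_dttm)
)
,unpivoted AS
(-- one output row per (source row, column), with every value cast to NVARCHAR(200)
 SELECT h.data_id, c.db_field, c.val, h.data_dttm
   FROM hist_and_curr AS h
  CROSS APPLY (VALUES ('field1', CAST(h.field1 AS nvarchar(200))),
                      ('amount', CAST(h.amount AS nvarchar(200)))
              ) AS c (db_field, val)
)
,find_changes AS
(SELECT data_id
       ,db_field
       ,val AS old_value
        -- partition by column name as well, so each column is compared with itself
       ,LEAD(val) OVER (PARTITION BY data_id, db_field ORDER BY data_dttm) AS new_value
       ,data_dttm
   FROM unpivoted
)
SELECT db_field, old_value, new_value, data_dttm
  FROM find_changes
 WHERE old_value <> new_value
 ORDER BY data_dttm, db_field;
```

With these hypothetical rows, the query should report field1 changing from data1 to data2 on 2017-05-01 and amount changing from 10 to 25 on 2017-05-02. Adding another tracked column means adding one more line to the VALUES list, which may also be an easier target for generated dynamic SQL than a chain of UNION ALL branches.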
I did glance over your code, but since it involves references to column data we didn't have, I decided to focus on the problem as stated, which was relatively clear. Hopefully, this approach will give you something you can use.