Identifying field changes between tables and consecutive rows

https://dba.stackexchange.com/questions/173562

07-10-2020

Question

I need to build an audit report to identify field changes between two tables, as well as between consecutive rows in one of the tables. The first table (data) holds the current data:

id  data_id field1  field2  adt_dttm
1   100     data3   data3   2017-05-03 00:00:00.000

CREATE TABLE [dbo].[data](
[id] [bigint] NULL,
[data_id] [nchar](10) NULL,
[field1] [nchar](10) NULL,
[field2] [nchar](10) NULL,
[adt_dttm] [datetime] NULL
)

Insert INTO data 
Values (1, 100, 'data3', 'data3', '2017-05-03 00:00:00')

The second table (data_hst) holds each row as it was before a change (it is populated by a trigger-style operation).

data_hst_id data_id field1  field2  chng_fld_txt    data_dttm
1           100     data1   data2   field1|field2   2017-05-01 00:00:00.000
2           100     data2   data3   field1          2017-05-02 00:00:00.000

CREATE TABLE [dbo].[data_hst](
[data_hst_id] [bigint] NULL,
[data_id] [bigint] NULL,
[field1] [nvarchar](200) NULL,
[field2] [nvarchar](200) NULL,
[chng_fld_txt] [nvarchar](200) NULL,
[data_dttm] [datetime] NULL
)

Insert INTO data_hst
Values (1, 100, 'data1', 'data2', 'field1|field2', '2017-05-01 00:00:00'),
       (2, 100, 'data2', 'data3', 'field1', '2017-05-02 00:00:00')

The 'chng_fld_txt' field holds the pipe-delimited list of fields that were modified. I need to identify what has changed between the data table row and the most recent data_hst table row, as well as the changes between consecutive rows in the data_hst table. The goal is an audit report that tracks each change as it occurred and shows the old and new value.

The result should look like this:

table_name  db_field    old_value   new_value   data_dttm
data        field1      data1       data2       5/1/2017
data        field2      data2       data3       5/1/2017
data        field1      data2       data3       5/2/2017

I have a convoluted bit of dynamic SQL with a cursor that works for the first condition, but not both. I'm hoping there's a cleaner way to satisfy both conditions.

    IF OBJECT_ID('tempdb..#changed') IS NOT NULL
        DROP TABLE #changed

    CREATE TABLE #changed(
        [tbl_hst_id] [bigint] NULL,
        [change_field] [nvarchar](200) NULL
    ) ON [PRIMARY]

    DECLARE @db VARCHAR(200)
    SET @db = 'data'
    DECLARE @change_date DATETIME
    DECLARE @change_date_varchar NVARCHAR(200)
    SET @change_date = GETDATE()
    SET @change_date_varchar = LEFT(CONVERT(VARCHAR, @change_date, 120), 10)

    DECLARE @changed_table nvarchar(max)
    SELECT @changed_table = 
    'SELECT TOP 2 t1.' + @db + '_hst_id as tbl_hst_id, t1.change_field
    FROM (
        SELECT A.' + @db + '_hst_id
            ,Split.a.value(''.'', ''VARCHAR(100)'') AS change_field
        FROM (
            SELECT ' + @db + '_hst_id
                ,CAST(''<M>'' + REPLACE(upsrt_chng_fld_txt, '','', ''</M><M>'') + ''</M>'' AS XML) AS String
                ,adt_dttm
            FROM ' + @db + '_hst
            ) AS A
        CROSS APPLY String.nodes(''/M'') AS Split(a)
        WHERE adt_dttm >= DATEADD(D,-4,''' + @change_date_varchar + ''')
        ) T1
    WHERE T1.change_field NOT IN (
            ''upsrt_dttm''
            ,''upsrt_trnsctn_nmbr''
            )'

    exec ('insert #changed ' + @changed_table)

    DECLARE @delta_field VARCHAR(max)
    DECLARE @db_sql_all NVARCHAR(max) = ''

    DECLARE @getDeltaField CURSOR SET @getDeltaField = CURSOR
    FOR
    SELECT change_field
    FROM #changed

    OPEN @getDeltaField

    FETCH NEXT
    FROM @getDeltaField
    INTO @delta_field

    WHILE @@FETCH_STATUS = 0
    BEGIN

        DECLARE @db_sql NVARCHAR(MAX)

        SELECT @db_sql = 'select ''' + @db + ''' as table_name
            , ''1'' as id
            ,''' + @delta_field + ''' as db_field
            ,CAST(hst.' + @delta_field + ' AS NVARCHAR(200)) as old_value
            ,CAST(c.' + @delta_field + ' AS NVARCHAR(200)) as new_value
            --,c.upsrt_usr_id as change_by
            --,c.upsrt_dttm as change_time
            from ' + @db + ' c
            inner join ' + @db + '_hst hst
            on c.' + @db + '_id = hst.' + @db + '_id
            join #changed ch on hst.' + @db + '_hst_id = ch.tbl_hst_id
            UNION '

        SET @db_sql_all = @db_sql_all + @db_sql

        FETCH NEXT
        FROM @getDeltaField
        INTO @delta_field
    END

    CLOSE @getDeltaField

    DEALLOCATE @getDeltaField
    IF len(@db_sql_all) > 0
    BEGIN
        SET @db_sql_all = SUBSTRING(@db_sql_all, 1, len(@db_sql_all) - 6)
    END

    PRINT(@db_sql_all);
    EXEC (@db_sql_all);

Solution

If you've got the old row, and the new row, you can ignore the information about what changed, and just look at the data.

IMPORTANT: this solution requires SQL Server 2012, as that's when the LEAD function was added.
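
On an older version, the same "pair each row with the next one" idea can be approximated with ROW_NUMBER and a self-join. This is a minimal sketch only, assuming SQL Server 2008 or later (for the multi-row VALUES constructor). It uses inline rows with the question's field1 values in place of the real tables, and the CTE name numbered and the aliases cur and nxt are illustrative.

```sql
-- Inline stand-in for the combined history + current rows (field1 only).
WITH hist_and_curr (data_id, field1, data_dttm) AS
     (SELECT v.data_id, v.field1, CAST(v.data_dttm AS datetime)
        FROM (VALUES (100, N'data1', '20170501')
                    ,(100, N'data2', '20170502')
                    ,(100, N'data3', '20170503')
             ) AS v (data_id, field1, data_dttm)
     )
    ,numbered AS
     (SELECT data_id
            ,field1
            ,data_dttm
            -- Number the rows of each data_id in chronological order.
            ,ROW_NUMBER() OVER (PARTITION BY data_id ORDER BY data_dttm) AS rn
        FROM hist_and_curr
     )
SELECT cur.data_id
      ,cur.field1 AS old_value
      ,nxt.field1 AS new_value
      ,cur.data_dttm
  FROM numbered AS cur
  JOIN numbered AS nxt              -- nxt is the row immediately after cur
    ON nxt.data_id = cur.data_id
   AND nxt.rn = cur.rn + 1
 WHERE cur.field1 <> nxt.field1
 ORDER BY cur.data_dttm;
```

The inner join drops the last row of each data_id, which has no successor, so no extra filter is needed for it.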

USE tempdb;

CREATE TABLE [dbo].[data](
[id] [bigint] NULL,
[data_id] [bigint] NULL,
[field1] [nchar](10) NULL,
[field2] [nchar](10) NULL,
[adt_dttm] [datetime] NULL
);

Insert INTO data 
Values (1, 100, 'data3', 'data3', '2017-05-03 00:00:00')
;

CREATE TABLE [dbo].[data_hst](
[data_hst_id] [bigint] NULL,
[data_id] [bigint] NULL,
[field1] [nvarchar](200) NULL,
[field2] [nvarchar](200) NULL,
[chng_fld_txt] [nvarchar](200) NULL,
[data_dttm] [datetime] NULL
);

Insert INTO data_hst
Values (1, 100, 'data1', 'data2', 'field1|field2', '2017-05-01 00:00:00'),
       (2, 100, 'data2', 'data3', 'field1', '2017-05-02 00:00:00')
;



WITH hist_and_curr AS
     (SELECT data_id
            ,field1
            ,field2
            ,data_dttm
        FROM data_hst
      UNION ALL
      SELECT data_id
            ,field1
            ,field2
            ,adt_dttm
        FROM data
     )
    ,find_changes as
     (SELECT data_id
            ,'field1' as [db_field]
            ,field1 as old_value
            ,LEAD(field1) OVER (PARTITION BY data_id ORDER BY data_dttm) as new_value
            ,data_dttm
        FROM hist_and_curr
      UNION ALL
      SELECT data_id
            ,'field2' as [db_field]
            ,field2 as old_value
            ,LEAD(field2) OVER (PARTITION BY data_id ORDER BY data_dttm) as new_value
            ,data_dttm
        FROM hist_and_curr
     )
SELECT 'data' AS [table_name]
      ,[db_field]
      ,[old_value]
      ,[new_value]
      ,[data_dttm]
  FROM find_changes
 WHERE old_value <> new_value
 ORDER BY [table_name], [data_dttm], [db_field]
;


DROP TABLE [data_hst];
DROP TABLE [data];

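So. The hist_and_curr CTE is a simple UNION ALL of the data_hst and data rows, as they seem to have the same layout, except for chng_fld_txt, which we're going to ignore. The combined date column is called data_dttm, because a UNION takes its column names from the first SELECT.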

In the find_changes CTE, we UNION ALL separate SELECTs for each column. We use the LEAD function to access the column's value from the next row for this data_id (with the rows sorted by the data_dttm column), to get old_value and new_value.
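
To see what LEAD does in isolation, here is a small sketch that runs on its own. The inline rows are a stand-in for hist_and_curr, using the field1 values from the question.

```sql
-- Inline stand-in for hist_and_curr (field1 only), so no tables are needed.
WITH hist_and_curr (data_id, field1, data_dttm) AS
     (SELECT v.data_id, v.field1, CAST(v.data_dttm AS datetime)
        FROM (VALUES (100, N'data1', '20170501')
                    ,(100, N'data2', '20170502')
                    ,(100, N'data3', '20170503')
             ) AS v (data_id, field1, data_dttm)
     )
SELECT data_id
      ,field1 AS old_value
      -- LEAD returns field1 from the next row in data_dttm order,
      -- or NULL when there is no next row for this data_id.
      ,LEAD(field1) OVER (PARTITION BY data_id ORDER BY data_dttm) AS new_value
      ,data_dttm
  FROM hist_and_curr
 ORDER BY data_dttm;
```

The first two rows pair data1 with data2 and data2 with data3. The last row (the current value, data3) gets NULL as new_value because nothing follows it.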

Then, we keep the rows where old_value and new_value are different.
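
One thing to be aware of: old_value <> new_value evaluates to UNKNOWN when either side is NULL. That conveniently removes the last row of each data_id (where LEAD returns NULL), but it also hides a genuine change to or from NULL. If your columns can hold NULLs, a NULL-safe variant might look like the sketch below. The rn and cnt columns are illustrative helpers, and the inline rows are made up to show a cleared field.

```sql
WITH hist_and_curr (data_id, field1, data_dttm) AS
     (SELECT v.data_id, v.field1, CAST(v.data_dttm AS datetime)
        FROM (VALUES (100, N'data1', '20170501')
                    ,(100, NULL,     '20170502')   -- field1 was cleared here
                    ,(100, N'data3', '20170503')
             ) AS v (data_id, field1, data_dttm)
     )
    ,find_changes AS
     (SELECT data_id
            ,field1 AS old_value
            ,LEAD(field1) OVER (PARTITION BY data_id ORDER BY data_dttm) AS new_value
            ,data_dttm
            -- rn = cnt only on the last row of each data_id
            ,ROW_NUMBER() OVER (PARTITION BY data_id ORDER BY data_dttm) AS rn
            ,COUNT(*) OVER (PARTITION BY data_id) AS cnt
        FROM hist_and_curr
     )
SELECT data_id, old_value, new_value, data_dttm
  FROM find_changes
 WHERE rn < cnt   -- skip the last row, which has nothing to compare against
   -- EXCEPT treats two NULLs as equal, so this is a NULL-safe "is different" test
   AND EXISTS (SELECT old_value EXCEPT SELECT new_value)
 ORDER BY data_dttm;
```

With these rows, both the change from data1 to NULL and the change from NULL to data3 are reported. The plain <> filter would return neither.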

ASSUMPTIONS:

We're dealing with few enough columns for hard-coding this to make sense; if not, you could again use dynamic SQL to build the statements you want. If you're trying to combine information for different tables, you might store intermediate results in a temp table, instead of trying to combine SELECTs for potentially dozens of tables and hundreds of columns into a single CTE.
All the columns are compatible data types. If this isn't true, then you'd need to convert the fields in find_changes, or they couldn't be combined into a single virtual table.
You're using SQL Server 2012 or later, as noted above. I'm posting this even though I don't know your SQL Server version yet, since the CTE that combines history and current data may let you proceed with your own ideas, and since others who are interested may be using an appropriate version of SQL Server.
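
On the first two assumptions: if the per-column SELECTs get unwieldy, or the columns have different data types, one alternative is to compute LEAD for every column in a single pass and then unpivot with CROSS APPLY (VALUES ...), casting each value to a common type. This is a sketch only. The amount column is hypothetical (it is not in your tables) and is there just to show a non-text column. The names paired and c are illustrative too.

```sql
WITH hist_and_curr (data_id, field1, amount, data_dttm) AS
     (SELECT v.data_id, v.field1, v.amount, CAST(v.data_dttm AS datetime)
        FROM (VALUES (100, N'data1', 10, '20170501')
                    ,(100, N'data2', 10, '20170502')
                    ,(100, N'data2', 25, '20170503')
             ) AS v (data_id, field1, amount, data_dttm)
     )
    ,paired AS
     (SELECT data_id
            ,data_dttm
            ,field1
            ,amount
            ,LEAD(field1) OVER (PARTITION BY data_id ORDER BY data_dttm) AS next_field1
            ,LEAD(amount) OVER (PARTITION BY data_id ORDER BY data_dttm) AS next_amount
        FROM hist_and_curr
     )
SELECT p.data_id
      ,c.db_field
      ,c.old_value
      ,c.new_value
      ,p.data_dttm
  FROM paired AS p
 -- One output row per column; the CASTs give every column a common type.
 CROSS APPLY (VALUES (N'field1', CAST(p.field1 AS nvarchar(200)), CAST(p.next_field1 AS nvarchar(200)))
                    ,(N'amount', CAST(p.amount AS nvarchar(200)), CAST(p.next_amount AS nvarchar(200)))
             ) AS c (db_field, old_value, new_value)
 WHERE c.old_value <> c.new_value
 ORDER BY p.data_dttm, c.db_field;
```

Each additional column then costs one LEAD line and one VALUES row, which is also an easier shape to generate with dynamic SQL if you go that way.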

I did glance over your code, but since it involves references to column data we didn't have, I decided to focus on the problem as stated, which was relatively clear. Hopefully, this approach will give you something you can use.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange