Question

I am working on a project where I need to synchronize data from our system to an external system. I want to periodically send only the changed items (rows) returned by a custom query. The query looks like this (but with many more columns):

SELECT T1.field1,
    T1.field2,
    T1.field3,
    CASE WHEN T1.field4 = 'some-value' THEN 1 ELSE 0 END AS field4_flag,
    T2.field1 AS t2_field1,
    T3.field1 AS t3_field1,
    T4.field1 AS t4_field1
FROM T1
INNER JOIN T2 ON T2.pk = T1.fk
INNER JOIN T3 ON T3.pk = T2.fk
INNER JOIN T4 ON T4.pk = T2.fk

I want to avoid comparing every field one by one between synchronizations. I came up with the idea of generating a hash for every row of my query and comparing it with the hash from the previous synchronization, so that only the changed rows are returned. I am aware of the CHECKSUM function, but it is very collision-prone and might sometimes miss changes. However, I like that I can just fill a temp table and use CHECKSUM(*), which makes maintenance easier (a new field does not have to be added both to the query and to the CHECKSUM call):

SELECT T1.field1,
    T1.field2,
    T1.field3,
    CASE WHEN T1.field4 = 'some-value' THEN 1 ELSE 0 END AS field4_flag,
    T2.field1 AS t2_field1,
    T3.field1 AS t3_field1,
    T4.field1 AS t4_field1
INTO #tmp  -- SELECT ... INTO needs a unique name for every column, hence the aliases
FROM T1
INNER JOIN T2 ON T2.pk = T1.fk
INNER JOIN T3 ON T3.pk = T2.fk
INNER JOIN T4 ON T4.pk = T2.fk;

-- get all columns from the query, plus a hash of the row
SELECT *, CHECKSUM(*)
FROM #tmp;

I am aware of the HASHBYTES function (which supports SHA-1 and MD5, both less prone to collisions), but it only accepts a single varchar, nvarchar or varbinary value, not a list of columns or * the way CHECKSUM does. Having to cast/convert every column from the query is a pain and invites errors (forgetting to include a new field, for instance).
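
For illustration, the manual approach described above might look like the sketch below. It only covers the three T1 columns, and the '|' separator and the row_hash alias are arbitrary choices, not part of the original query. SHA2_256 is used here in place of sha1 or md5. CONCAT (SQL Server 2012 and later) converts each argument to a string implicitly, which saves the explicit casts but still requires listing every column by hand.

```sql
-- Sketch only: hash an explicit column list with HASHBYTES.
-- Every new column must be added both to the SELECT list and to CONCAT,
-- which is the maintenance burden described above.
SELECT T1.field1,
       T1.field2,
       T1.field3,
       HASHBYTES('SHA2_256',
                 CONCAT(T1.field1, '|', T1.field2, '|', T1.field3)) AS row_hash
FROM T1;
```

Note that CONCAT turns NULL into an empty string, so with this sketch a NULL and an empty string in the same column hash to the same value.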

I also looked at the Change Data Capture and Change Tracking features of SQL Server, but they both seem complicated and overkill for what I am doing.

So my question: is there another method to generate a hash from a query or a temp table that meets my criteria?

If not, is there another way to achieve this kind of work (syncing the differences from a query)?


Solution

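I found a way to do exactly what I wanted, thanks to the FOR XML clause. The correlated subquery (SELECT T.* FOR XML RAW) serializes every column of the current row into a single XML string, which HASHBYTES can then hash: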

SELECT T1.field1,
    T1.field2,
    T1.field3,
    CASE WHEN T1.field4 = 'some-value' THEN 1 ELSE 0 END AS field4_flag,
    T2.field1 AS t2_field1,
    T3.field1 AS t3_field1,
    T4.field1 AS t4_field1
INTO #tmp  -- SELECT ... INTO needs a unique name for every column, hence the aliases
FROM T1
INNER JOIN T2 ON T2.pk = T1.fk
INNER JOIN T3 ON T3.pk = T2.fk
INNER JOIN T4 ON T4.pk = T2.fk;

-- get all columns from the query, plus a hash of the row (converted to a hex string)
SELECT T.*, CONVERT(VARCHAR(100), HASHBYTES('sha1', (SELECT T.* FOR XML RAW)), 2) AS sHash
FROM #tmp AS T;
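
To actually pick out the changed rows, the hashes have to be compared with those from the previous synchronization. A minimal sketch of that step follows. It rests on assumptions that are not in the original: that field1 uniquely identifies a row of the query and is an INT, and that the previous hashes are kept in a hypothetical table named dbo.SyncHash.

```sql
-- Hypothetical table holding the hash sent for each row at the previous sync.
-- Assumption: field1 is an INT that uniquely identifies a row of the query.
CREATE TABLE dbo.SyncHash (
    field1 INT         NOT NULL PRIMARY KEY,
    sHash  VARCHAR(40) NOT NULL   -- SHA-1 digest as 40 hex characters
);

-- Hash every row of the current query result.
SELECT T.field1,
       CONVERT(VARCHAR(40), HASHBYTES('sha1', (SELECT T.* FOR XML RAW)), 2) AS sHash
INTO #current
FROM #tmp AS T;

-- Rows that are new or whose hash differs from the previous sync.
SELECT C.field1, C.sHash
FROM #current AS C
LEFT JOIN dbo.SyncHash AS P ON P.field1 = C.field1
WHERE P.field1 IS NULL
   OR P.sHash <> C.sHash;

-- Once the changed rows have been sent, remember the new hashes.
MERGE dbo.SyncHash AS P
USING #current AS C ON P.field1 = C.field1
WHEN MATCHED AND P.sHash <> C.sHash THEN
    UPDATE SET sHash = C.sHash
WHEN NOT MATCHED BY TARGET THEN
    INSERT (field1, sHash) VALUES (C.field1, C.sHash);
```

This sketch does not detect rows that have disappeared from the query; those would need a separate check for keys present in dbo.SyncHash but missing from #current.
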
Licensed under: CC-BY-SA with attribution