Question

We're doing an ETL process. When all is said and done there are a bunch of tables that should be identical. What is the quickest way to verify that those tables (on two different servers) are in fact identical. I'm talking both schema and data.

Can I do a hash on the table it's self like I would be able to on an individual file or filegroup - to compare one to the other. We have Red-Gate data compare but since the tables in question contain millions of rows each I'd like something a little more performant.

One approach that intrigues me is this creative use of the union statement. But, I'd like to explore the hash idea a little further if possible.

POST ANSWER UPDATE

For any future vistors... here is the exact approach I ended up taking. It worked so well we're doing it on every table in each database. Thanks to answers below for pointing me in the right direction.

CREATE PROCEDURE [dbo].[usp_DatabaseValidation]
    @TableName varchar(50)

AS
BEGIN

    SET NOCOUNT ON;

    -- parameter = if no table name was passed do them all, otherwise just check the one

    -- create a temp table that lists all tables in target database

    CREATE TABLE #ChkSumTargetTables ([fullname] varchar(250), [name] varchar(50), chksum int);
    INSERT INTO #ChkSumTargetTables ([fullname], [name], [chksum])
        SELECT DISTINCT
            '[MyDatabase].[' + S.name + '].['
            + T.name + ']' AS [fullname],
            T.name AS [name],
            0 AS [chksum]
        FROM MyDatabase.sys.tables T
            INNER JOIN MyDatabase.sys.schemas S ON T.schema_id = S.schema_id
        WHERE 
            T.name like IsNull(@TableName,'%');

    -- create a temp table that lists all tables in source database

    CREATE TABLE #ChkSumSourceTables ([fullname] varchar(250), [name] varchar(50), chksum int)
    INSERT INTO #ChkSumSourceTables ([fullname], [name], [chksum])
        SELECT DISTINCT
            '[MyLinkedServer].[MyDatabase].[' + S.name + '].['
            + T.name + ']' AS [fullname],
            T.name AS [name],
            0 AS [chksum]
        FROM [MyLinkedServer].[MyDatabase].sys.tables T
            INNER JOIN [MyLinkedServer].[MyDatabase].sys.schemas S ON 
            T.schema_id = S.schema_id
        WHERE
            T.name like IsNull(@TableName,'%');;

    -- build a dynamic sql statement to populate temp tables with the checksums of each table

    DECLARE @TargetStmt VARCHAR(MAX)
    SELECT  @TargetStmt = COALESCE(@TargetStmt + ';', '')
            + 'UPDATE #ChkSumTargetTables SET [chksum] = (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM '
            + T.FullName + ') WHERE [name] = ''' + T.Name + ''''
    FROM    #ChkSumTargetTables T

    SELECT  @TargetStmt

    DECLARE @SourceStmt VARCHAR(MAX)
    SELECT  @SourceStmt = COALESCE(@SourceStmt + ';', '')
            + 'UPDATE #ChkSumSourceTables SET [chksum] = (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM '
            + S.FullName + ') WHERE [name] = ''' + S.Name + ''''
    FROM    #ChkSumSourceTables S

    -- execute dynamic statements - populate temp tables with checksums

    EXEC (@TargetStmt);
    EXEC (@SourceStmt);

    --compare the two databases to find any checksums that are different

    SELECT  TT.FullName AS [TABLES WHOSE CHECKSUM DOES NOT MATCH]
    FROM #ChkSumTargetTables TT
    LEFT JOIN #ChkSumSourceTables ST ON TT.Name = ST.Name
    WHERE IsNull(ST.chksum,0) <> IsNull(TT.chksum,0)

    --drop the temp tables from the tempdb

    DROP TABLE #ChkSumTargetTables;
    DROP TABLE #ChkSumSourceTables;

END

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top