Question

I have an existing table (VendorMaster) with 18 columns and around 200,000 rows in it. Now I have built a new table (NewVendorMaster) with different logic to get the same data under the same schema.

Schema of VendorMaster and NewVendorMaster
V1       V2    V3     V4     V5      V6       V7      V8.....   V16
data1                data2          data3   data4   data5      dataN
type3  type16        data3 data4
type3        type17        data14           data18             type20
data4        type17        data14           type45             type20

Now I want to compare those two tables. As there are 18 columns and 200,000 records, should I go for a column-wise comparison, or concatenate all the data into one column and then compare for a faster result?

A join on these two tables seems to take hours to compare.

Also, would creating an index on both tables help make the comparison faster?


Solution

I would advise against concatenating the fields before comparing; instead, I suggest creating a checksum value and working from there. I've done something similar and approached it like this: first I added an extra field to the table that holds the BINARY_CHECKSUM() of the key fields (you can use a computed column for this!) and put an index on just that crc field, but INCLUDE() all the actual key fields. The generated values are not going to be unique, but they will be close enough to work. Mind that it might take a while for the index to be created, and depending on the size it could take up quite a bit of space too.
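To make this concrete, here is a minimal T-SQL sketch; it assumes V1 through V4 are the key fields (the names come from the schema in the question), so adjust the column lists to your real keys:

    -- Computed checksum column on each table (PERSISTED so it is stored and indexable).
    ALTER TABLE VendorMaster
        ADD crc AS BINARY_CHECKSUM(V1, V2, V3, V4) PERSISTED;

    ALTER TABLE NewVendorMaster
        ADD crc AS BINARY_CHECKSUM(V1, V2, V3, V4) PERSISTED;

    -- Index on just the crc field, with the actual key fields INCLUDE()d
    -- so the follow-up comparison can be answered from the index.
    CREATE INDEX IX_VendorMaster_crc
        ON VendorMaster (crc) INCLUDE (V1, V2, V3, V4);

    CREATE INDEX IX_NewVendorMaster_crc
        ON NewVendorMaster (crc) INCLUDE (V1, V2, V3, V4);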

When JOINing both tables, join on the crc plus all the key fields (**). SQL will be quite good at matching just the integer to find the right row(s), and comparing the other fields can then be done from the included columns.

(**: don't rely on just the crc value; BINARY_CHECKSUM() is fast and easy to use, but it will probably have collisions!)
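As a sketch with the same assumed key columns as above, the comparison join could look like this; the crc does the fast matching and the key fields rule out collisions:

    -- Matching rows (crc first, then the key fields to guard against collisions).
    SELECT vm.V1, vm.V2, vm.V3, vm.V4
    FROM   VendorMaster vm
    JOIN   NewVendorMaster nvm
           ON  nvm.crc = vm.crc
           AND nvm.V1  = vm.V1
           AND nvm.V2  = vm.V2
           AND nvm.V3  = vm.V3
           AND nvm.V4  = vm.V4;

    -- Rows present in VendorMaster but missing from NewVendorMaster.
    SELECT vm.*
    FROM   VendorMaster vm
    LEFT JOIN NewVendorMaster nvm
           ON  nvm.crc = vm.crc
           AND nvm.V1  = vm.V1
           AND nvm.V2  = vm.V2
           AND nvm.V3  = vm.V3
           AND nvm.V4  = vm.V4
    WHERE  nvm.crc IS NULL;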

Other tips

This is not a complete answer, but it may help save some time

I was doing a very similar thing - normalising a 2.5 million row, one-dimensional database into half a dozen tables.

What I ended up doing was using C to write short programs to load data in, step through it, and then insert new records into the new database tables, because the clever SQL took all night and never actually generated a result.

Using C I could write debug output to see where things were happening; better still, when things went wrong I could see what was going wrong; and best of all, when the job was half done, it was half done - aborting the program halfway through at least left some good new data in the new table.

So my main point is this: don't try to be clever and construct that one magic SQL query that does all the work. Break it into steps and program it. It runs massively faster.
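Purely as an illustration of the stepwise idea (sketched in T-SQL here rather than C), something like the following processes one slice at a time; the id surrogate key, the batch size, and the Mismatches table are all hypothetical names for this sketch:

    DECLARE @start INT = 0, @batch INT = 10000, @max INT;
    SELECT @max = MAX(id) FROM VendorMaster;

    WHILE @start <= @max
    BEGIN
        -- Compare one slice; anything already written to Mismatches survives
        -- even if the run is aborted halfway through.
        INSERT INTO Mismatches (id)
        SELECT vm.id
        FROM   VendorMaster vm
        LEFT JOIN NewVendorMaster nvm ON nvm.id = vm.id
        WHERE  vm.id BETWEEN @start AND @start + @batch - 1
          AND  nvm.id IS NULL;

        -- Progress output, much like the debug output from the C programs.
        RAISERROR('Finished batch starting at %d', 0, 1, @start) WITH NOWAIT;

        SET @start += @batch;
    END;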

And you know far better how the data is arranged than SQL ever will.

(Adding indexes is always a good idea on tables that don't change much, and they really speed up searches. You can always remove them later on.)

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow