Question

We want to change the collation of our production database (including its columns) from SQL_Scandinavian_Pref_CP850_CI_AS to Finnish_Swedish_CI_AS. We have developed a script to do that, but running it against a database larger than 100 GB will take considerable time, and we can't afford a long downtime. So we decided to reduce the downtime with the following strategy:

  1. We will set up Transactional Replication and initialize the subscriber from a database backup.
  2. The publisher database will stay live with the application, and its transactions will be delivered to the subscriber database through Transactional Replication.
  3. We will execute the collation change script on the subscriber side, which is allowed when SQL Server is the same for the publisher and subscriber databases. We recently found this in SQL Server 2019.
  4. Now the pain point: data in varchar and char columns is not replicated correctly when it contains special characters such as 'åÅäÄöÖ'. On the subscriber side we get weird characters like '†„Ž”™'.

Can you please suggest how we can resolve this bug, or an alternative architecture that minimizes production downtime while changing the database collation (including columns)?

Furthermore, my collation change script performs the following tasks on the subscriber database to change its collation:

  1. Drop Foreign Key Constraints
  2. Drop indexes including Primary Key
  3. Drop check and default constraints
  4. Drop user statistics
  5. Drop views, computed columns, and stored procedures to resolve object-bound errors
  6. After the above steps, the tables are ready for the collation change, so the script changes the collation of the columns in every table, one by one.
  7. Recreate the above-listed constraints after step 6 completes successfully.
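
Step 6 can be sketched as a small generator of `ALTER TABLE ... ALTER COLUMN ... COLLATE` statements. This is only an illustration, not the script from the question: the `Column` record, the `dbo.Customer` table, and its columns are hypothetical, and a real script would read the column metadata from the database catalog. Each statement restates the declared type and nullability so that only the collation changes.

```python
from typing import NamedTuple

TARGET_COLLATION = "Finnish_Swedish_CI_AS"  # target collation named in the question


class Column(NamedTuple):
    # Hypothetical metadata record; a real script would load this from the catalog.
    schema: str
    table: str
    name: str
    data_type: str  # declared type, e.g. "varchar(50)"
    nullable: bool


def quote(identifier: str) -> str:
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def alter_statement(col: Column, collation: str = TARGET_COLLATION) -> str:
    """Build one ALTER COLUMN statement that keeps the type and nullability."""
    nullability = "NULL" if col.nullable else "NOT NULL"
    return (
        f"ALTER TABLE {quote(col.schema)}.{quote(col.table)} "
        f"ALTER COLUMN {quote(col.name)} {col.data_type} "
        f"COLLATE {collation} {nullability};"
    )


# Hypothetical example columns, for illustration only.
columns = [
    Column("dbo", "Customer", "LastName", "varchar(50)", False),
    Column("dbo", "Customer", "Notes", "varchar(max)", True),
]

for col in columns:
    print(alter_statement(col))
```

The generated statements would only succeed once the dependent objects from steps 1 to 5 have been dropped.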

Solution

I just tried the following setup, and it worked with Transactional Replication:

  1. Server-level collation is Finnish_Swedish_CI_AS.
  2. Database-level collation is the same on both sides: SQL_Scandinavian_Pref_CP850_CI_AS.
  3. Column-level collation is SQL_Scandinavian_Pref_CP850_CI_AS in the publisher database and Finnish_Swedish_CI_AS in the subscriber database.

The above setup delivered correct data from the publisher database to the subscriber database. It is the column-level collation change that takes time, and how long depends on how much data the tables contain. The database-level collation change can be handled during the downtime by just dropping the dependencies first and recreating them afterwards. So it was the database-level collation that caused the data issue when it differed between the two sides, not the column-level collation.
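
The garbled characters reported in the question are consistent with bytes stored under code page 850 (used by SQL_Scandinavian_Pref_CP850_CI_AS) being read as code page 1252 (used by Finnish_Swedish_CI_AS) without conversion. The short Python sketch below reproduces that symptom. It is only an illustration of the byte-level mismatch and assumes nothing about how replication moves the data internally.

```python
# Characters from the question.
original = "åÅäÄöÖ"

# Bytes a varchar value would hold under code page 850.
cp850_bytes = original.encode("cp850")

# Reading the same bytes as code page 1252 without converting them.
# Byte 0x8F ('Å' in CP850) has no mapping in Python's cp1252 codec,
# so errors="ignore" drops it, leaving five characters instead of six.
misread = cp850_bytes.decode("cp1252", errors="ignore")

# A proper conversion decodes with the source code page first,
# then re-encodes for the target code page.
converted = cp850_bytes.decode("cp850").encode("cp1252")

print(cp850_bytes.hex(" "))        # 86 8f 84 8e 94 99
print(misread)                     # †„Ž”™
print(converted.decode("cp1252"))  # åÅäÄöÖ
```

The misread string matches the characters seen on the subscriber, which fits the finding that keeping the database-level collation identical on both sides avoids the problem.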

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange