In which order does SQL Server replication write the data?
12-03-2021
Question
I use unidirectional replication of a database. In the database I have a column with the data type "timestamp" (not datetime).
When the replication is initiated, it writes table after table to the subscriber.
My question is: Is the data within a table written to the subscriber in the same order as it was created in the published database?
In other words, is the timestamp in a subscriber table also usable as a sort order, as it was on the publisher, e.g. for the last inserted and updated records? Unfortunately, I do not have a column like EditDateTime.
Solution
I did a test, and it seems that you are at the mercy of how the BCP file was generated for the snapshot, i.e., in what order SQL Server happened to read the rows when generating the snapshot file (BCP file).
It is the same story as "Can we guarantee an ordering without ORDER BY?" The answer (of course) being "no".
For the test I created a clustered index on the table as DESC, just to increase the likelihood that the data would be read in a "different" order when producing the snapshot.
Then I ran a few queries, and my fears were confirmed. The sequence of the ts column on the subscriber did indeed depend on the order in which the rows were read when generating the BCP file.
So I guess this is one of those "Do you feel lucky?" situations :-)
I created both the pub and sub on the same SQL Server, using two databases: a and b. Below is a rough script of what I was using, in case you want to play with it.
--In database a
USE a
CREATE TABLE t(c1 int identity, c2 timestamp, c3 varchar(100) default 'hej')
CREATE CLUSTERED INDEX x ON t(c1 DESC)
SET NOCOUNT ON
GO
INSERT INTO t DEFAULT VALUES
GO 1000
SELECT * FROM t
USE b
--Just to get different ts numbers
CREATE TABLE t2(c1 int identity, c2 timestamp, c3 varchar(100) default 'hej')
GO
INSERT INTO t2 DEFAULT VALUES
GO 1000
SELECT * FROM t2
--Create the publication, and then do some comparison
SELECT TOP(10) * FROM a..t ORDER BY c2 --desc
SELECT TOP(10) * FROM b..t ORDER BY c2 --desc
--Result from a..t (publisher), ordered by c2:
1 0x0000000000001796 hej
2 0x0000000000001797 hej
3 0x0000000000001798 hej
4 0x0000000000001799 hej
5 0x000000000000179A hej
6 0x000000000000179B hej
7 0x000000000000179C hej
8 0x000000000000179D hej
9 0x000000000000179E hej
10 0x000000000000179F hej
--Result from b..t (subscriber), ordered by c2:
131 0x0000000000011281 hej
130 0x0000000000011282 hej
129 0x0000000000011283 hej
128 0x0000000000011284 hej
127 0x0000000000011285 hej
126 0x0000000000011286 hej
125 0x0000000000011287 hej
124 0x0000000000011288 hej
123 0x0000000000011289 hej
122 0x000000000001128A hej
SELECT TOP(10) * FROM a..t ORDER BY c1 --desc
SELECT TOP(10) * FROM b..t ORDER BY c1 --desc
--Result from a..t (publisher), ordered by c1:
1 0x0000000000001796 hej
2 0x0000000000001797 hej
3 0x0000000000001798 hej
4 0x0000000000001799 hej
5 0x000000000000179A hej
6 0x000000000000179B hej
7 0x000000000000179C hej
8 0x000000000000179D hej
9 0x000000000000179E hej
10 0x000000000000179F hej
--Result from b..t (subscriber), ordered by c1:
1 0x0000000000011ADB hej
2 0x0000000000011ADA hej
3 0x0000000000011AD9 hej
4 0x0000000000011AD8 hej
5 0x0000000000011AD7 hej
6 0x0000000000011AD6 hej
7 0x0000000000011AD5 hej
8 0x0000000000011AD4 hej
9 0x0000000000011AD3 hej
10 0x0000000000011AD2 hej
Having said that, there is an option for the pub to convert timestamp to binary(8). As far as I can tell, that should keep the values, though you'd of course lose the timestamp attribute and functionality on the sub.
After changing this attribute, the values did indeed come out the same for both my a and b tables.
So the short story seems to be: convert your timestamps to a passive binary(8) in the replication properties and you'll keep the values (and with them the ordering as well).
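One way to double-check the result is to compare the two tables row by row. The following is a minimal sketch, not part of the original test: it assumes the a..t and b..t tables from the script above, that c1 identifies the same row on both sides, and that c2 on the subscriber is now binary(8). A count of 0 would mean the values were carried over unchanged.

```sql
--Count rows whose c2 value differs between publisher (a) and subscriber (b)
--Assumption: c1 holds the same identity value for a given row on both sides
SELECT COUNT(*) AS mismatched_rows
FROM a..t AS p
INNER JOIN b..t AS s ON s.c1 = p.c1
WHERE CAST(p.c2 AS binary(8)) <> s.c2

--If the values were preserved, sorting the subscriber by c2 follows the publisher's order
SELECT TOP(10) * FROM b..t ORDER BY c2
```

The explicit ORDER BY is still required on the subscriber; the preserved values only make c2 a meaningful sort key, they do not change how rows are returned by default.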
OTHER TIPS
I'm not sure if you mean Transactional Replication or Snapshot Replication in your current situation, but in either case the ordering of the data should be preserved. You can read more on how Transactional Replication works in the linked Brent Ozar article.
Transactional Replication uses the Transaction Log to ensure the Atomicity and Consistency of the ACID principles in SQL Server. Snapshot Replication is a literal copy (snapshot) of your data at a point in time, and it is preserved exactly that way when distributed to the Subscriber database.
That being said, TIMESTAMP (a synonym for rowversion) values are generated by the server in each database, and therefore will not match between your Publisher and Subscriber database(s).
For Transactional Replication, the values should be in the same sequential order. For Snapshot Replication, I don't know whether that can be guaranteed, and you'd have to test. My guess is that there's still a consistent row ordering by the underlying rowid of the table, if not by the primary key, and the order should therefore also be consistent (despite the TIMESTAMP field getting new values).
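The per-database nature of these values can be seen with @@DBTS, which returns the last-used timestamp (rowversion) value of the current database. This is a small sketch that assumes two databases named a and b, as in the script from the first answer; the two results will normally differ because each database keeps its own counter.

```sql
--Last-used timestamp value in the publisher database (assumed name: a)
USE a
SELECT @@DBTS AS last_ts_in_a
--Last-used timestamp value in the subscriber database (assumed name: b)
USE b
SELECT @@DBTS AS last_ts_in_b
```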