SQL Server Bi-Directional Transactional Replication - Is it a good use-case?

https://stackoverflow.com/questions/13503081

01-12-2021

Question

We're having trouble scaling out with SQL Server, for two main reasons: 1) poorly designed data structures, and 2) the heavy lifting and business/processing logic is all done in T-SQL. This was verified by a Microsoft SQL guy from Redmond we hired to perform an analysis on our server. We're literally solving issues by continually increasing the command timeout, which is ridiculous and not a good long-term solution. We have since put together the following strategy and set of phases:

Phase 1: Throw hardware/software at the issue to stop the bleeding.

This includes a few different things, like a caching server, but what I'd like to ask everyone here about is specifically related to implementing bi-directional transactional replication on a new SQL Server instance. We have two use-cases for wanting to implement this:

1. We were thinking of running the long-running (and table/row-locking) SELECTs on this new SQL "processing box", throwing the results into a caching layer, and having the UI read them from the cache. These SELECTs generate reports and also return results on the web.
2. Most of the business logic is in SQL. We have some LONG-running queries (SELECTs, INSERTs, UPDATEs, and DELETEs) which perform processing logic. The end result is really just a handful of INSERTs, UPDATEs, and DELETEs after the processing is complete (lots of cursors). The thought would be to balance the load between these two servers.

I have some questions:

1. Are these good use-cases for bi-directional transactional replication?
2. I need to ensure that this solution is going to "just work" without us having to worry about conflicts. Where would conflicts arise within this solution? I have read a few articles about resetting the increment on your identity seed in order to prevent collisions, which makes sense, but how does it handle UPDATEs/DELETEs or other places where conflicts might occur?
3. What other issues might we run into and need to watch out for?
4. Is there a better solution to this problem?

Phase 2: Rewrite the logic in .NET, where it belongs, and optimize the SQL stored procedures to perform only set-based operations, as they should.

This will obviously take a while, which is why we wanted to see if there were some preliminary steps we could take to stop the pain our users are experiencing.

Thanks.

Solution

IMHO, bidirectional replication is very, very far from 'it will just work'. Preventing update conflicts requires exquisite planning, ensuring that all that 'processing' is carefully orchestrated so that the two servers never work on overlapping data. Master-master replication is one of the most difficult solutions to pull off.
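To make the identity-seed point from the question concrete, here is a minimal sketch of what staggered identity values do and do not cover. The table `dbo.Orders` and its columns are hypothetical, invented only for illustration; the interleaved odd/even scheme is one common way to split ranges, not necessarily the one the articles you read describe.

```sql
-- Hypothetical table; names are illustrative only.

-- On server A: identity generates odd values (1, 3, 5, ...).
CREATE TABLE dbo.Orders (
    OrderId INT IDENTITY(1, 2) NOT FOR REPLICATION PRIMARY KEY,
    Status  VARCHAR(20) NOT NULL
);

-- On server B: same table, identity generates even values (2, 4, 6, ...).
CREATE TABLE dbo.Orders (
    OrderId INT IDENTITY(2, 2) NOT FOR REPLICATION PRIMARY KEY,
    Status  VARCHAR(20) NOT NULL
);

-- INSERTs on A and B can no longer collide on OrderId.
-- That is all the staggered seeds buy you. They do nothing for this:

-- Server A, before B's change has been delivered:
UPDATE dbo.Orders SET Status = 'Shipped'   WHERE OrderId = 1;

-- Server B, before A's change has been delivered:
UPDATE dbo.Orders SET Status = 'Cancelled' WHERE OrderId = 1;

-- Each server now receives the other's UPDATE for the same row.
-- Likewise, a DELETE on one side racing an UPDATE on the other
-- leaves the replicated UPDATE with no row to apply to.
```

In other words, the seed trick only partitions newly inserted keys. Any row that both servers are allowed to UPDATE or DELETE is still a potential conflict, and avoiding that is what the 'careful orchestration' above refers to.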

Consider this: you envision a solution that provides a cheap 2x scale-out with nearly no code modification. Such a solution would be quite useful, and one would expect to see it deployed everywhere. Yet it is nowhere to be seen.

I recommend you search for the many blogs and articles describing gotchas and warnings about (the much more popular) MySQL master-master deployments (e.g. If You Must Deploy Multi-Master Replication, Read This First), and judge for yourself whether the trouble is worth it.

I don't have all the details you do, but I would focus on the application. If you want to just throw money at the problem short term, I would make sure that the cheap scale-up options (SSD/Fusion drives, more RAM) are exhausted before considering scale-out. Also, if locking is the main issue, investigate snapshot isolation level/read committed snapshot first.
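As a rough sketch of what trying the row-versioning options looks like, assuming a database named `YourDatabase` (a placeholder) and a hypothetical reporting query against `dbo.Orders`:

```sql
-- 'YourDatabase' and dbo.Orders are placeholders, not names from the question.

-- Make the default READ COMMITTED level use row versions instead of shared
-- locks. Switching this on needs exclusive access to the database;
-- ROLLBACK IMMEDIATE rolls back open transactions, so schedule it carefully.
ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Additionally allow sessions to opt in to full SNAPSHOT isolation.
ALTER DATABASE YourDatabase
    SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Verify the settings.
SELECT name,
       is_read_committed_snapshot_on,
       snapshot_isolation_state_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- A long-running report can then read a consistent view of the data
-- without blocking writers or being blocked by them.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT Status, COUNT(*) AS OrderCount
    FROM dbo.Orders
    GROUP BY Status;
COMMIT TRANSACTION;
```

Row versions are kept in tempdb, so expect extra load there; test on a copy of the workload before turning this on in production.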

Licensed under: CC-BY-SA with attribution
