Question

The company I work for has a set of databases (the largest just below 1 TB) on different servers - two in the US, two in Europe.

We run full peer-to-peer replication per database between the four nodes - so they can all take transactions (insert/update/delete) and all hold the data the other nodes have gathered, within a variable latency (the worst connection averages about 30-40 seconds).

The largest database carries data from early 2008 to today. All of this data is further replicated to Reporting nodes which hold ALL data.

I need to remove data up to 2013 at the Transactional nodes to reclaim drive space there; historic data will then only be available at the Reporting nodes.

What is the best way to do this? The data is relatively manageable in that it is well partitioned (monthly by partition, then yearly into separate files/filegroups). However, partitions cannot be dropped while they are involved in replication, and from reading up on partition switching, that is not allowed either (Switching Partitions Prerequisites - Point 18).

As this is a full production environment, I am trying to avoid anything that will affect replication - including resynchronisation (a lot of data to resync, over large distances).

Does anyone have any good suggestions for how to perform this task?

Solution

So, no answers from here, but after a certain amount of discussion and thought, I came up with a plan some months ago.

I will try to keep this answer concise for this forum (you may not agree that I have!) to assist anyone who needs to perform a similar task in the future. Feel free to ask questions if I miss anything out - though the method is straightforward.

So, the primary concern is to remove the data without significant impact on Production traffic at the nodes we are replicating to/from. The easiest way to do this is to segregate a node that you want to work on, removing the data from that node whilst leaving all others unaffected (including the Reporting nodes).

The best way to do this (remember you can't drop partitions, and any/most operations are replicated, creating a large amount of traffic and a large number of row changes) is to create a new SP and set up a publication around this one SP, so that it is available at all nodes. The important bit is to set replication to replicate the Execution of the SP - NOT the result (i.e. replicate the EXEC sp_delete call, NOT the row-level changes: delete WHERE ID = 1, delete WHERE ID = 2, and so on). This is set via: right-click your new publication (before setting up the other nodes in the topology) > Properties > Articles > click the sp_delete you have set up > Article Properties button > Set Properties of Highlighted Stored Procedure Article > Replicate line = EXECUTION of the stored procedure. Then complete your P2P topology.
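
For reference, the crux of those UI steps - the "EXECUTION of the stored procedure" setting - corresponds to the article type in the scripted form. A hedged sketch only (the publication and procedure names are hypothetical, and the remaining publication options should be copied from an existing publication in your own topology):

    -- Publish the SP so that its EXECUTION is replicated, not the row
    -- changes it makes. 'proc exec' matches the "EXECUTION of the
    -- stored procedure" option in Article Properties
    -- ('serializable proc exec' is the serialized-transaction variant).
    EXEC sp_addarticle
        @publication  = N'P2P_PurgeProc',    -- hypothetical publication
        @article      = N'usp_PurgeHistory', -- hypothetical SP name
        @source_object = N'usp_PurgeHistory',
        @type         = N'proc exec';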

BUT MHSQLDBA, you might be saying, that's just going to delete the rows separately at every node via the SP. This is why you set the SP to only perform the deletions when it is running at the right server:

    IF @@SERVERNAME = N'The current server you want to affect'
        -- followed by your delete statements

Thus, when this EXEC call is picked up at the server(s) where you don't want the deletes performed, it will be ignored, as @@SERVERNAME will not equal the server you have selected.
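
To make that concrete, here is a minimal sketch of such a procedure - every procedure, table, column and server name is hypothetical, so substitute your own:

    CREATE PROCEDURE dbo.usp_PurgeHistory    -- hypothetical name
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Every node receives and executes this call via the replicated
        -- EXEC, but only the node named here performs the deletes.
        IF @@SERVERNAME = N'US-TRAN-01'      -- change per target node
        BEGIN
            -- Hypothetical table/column: purge everything before 2013.
            DELETE FROM dbo.TransactionHistory
            WHERE TransactionDate < '20130101';
        END
    END;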

You may think: why not just create an SP at only the server you are interested in and run that? Because if you do, replication will break the changes down into how they affect the article (table) rows and replicate those actual changes. You have to publish the SP so that you can specify that the EXEC of the SP is replicated rather than the resultant changes.

This is the suggested order of events in my opinion/experience:

  1. Create an SP with delete code that specifies it will only run the deletes if @@SERVERNAME = your desired server
  2. Set up a new publication that replicates this one SP, with Replicate = EXECUTION of the stored procedure within Article Properties
  3. Run the SP at your desired server (a usage sketch follows this list) and be happy that you haven't brought down the whole estate with thousands of replicated DELETE commands
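
As a usage illustration (using the hypothetical procedure name from the sketch above), the final step is just the call itself, issued at the node you are purging:

    -- Run at the target node; the EXEC itself is replicated to the
    -- other nodes, where the @@SERVERNAME check turns it into a no-op.
    EXEC dbo.usp_PurgeHistory;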

Points of Note:

  1. This is still a laborious task. By using this method you have diminished the effect on all servers apart from the one you are working on. You have not diminished the workload for yourself - in fact you have made it worse, as you will have to run this same SP at each node (with the IF line changed to the server you are targeting), effectively multiplying the work by the number of servers you have to affect. It is massively safer though, as you will have minimal effect on all other nodes (I'm presuming you've failed traffic away from the node you're working on, of course!)
  2. By using this method, you have created inconsistency between your nodes - you really need to be sure that the data you're removing IS NOT going to change before you finish performing the same operation on all nodes that require work. If a row you have deleted at one node is changed elsewhere in the estate, you will end up with consistency errors.
  3. You are likely to put normal replication behind its expected SLAs by the amount of time it takes to perform the deletes at the node you're working on (I highly recommend you read up on batching the deletes - a sketch of the pattern follows this list). Be aware that once the operation completes, you won't have the node back in action until normal replication has caught back up after your delete operation's locks have released. If you are replicating over high-latency lines, I seriously suggest you check out using PULL agents instead of PUSH - it makes a humongous difference.
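
On the batching point, here is a minimal sketch of the usual pattern (hypothetical table and column again) that would sit inside the @@SERVERNAME guard, keeping each transaction small so locks release between batches:

    DECLARE @rows int = 1;

    WHILE @rows > 0
    BEGIN
        -- Delete in modest chunks; tune the batch size to your system.
        DELETE TOP (50000) FROM dbo.TransactionHistory
        WHERE TransactionDate < '20130101';

        SET @rows = @@ROWCOUNT;

        -- Optional pause so log backups and replication can keep pace.
        WAITFOR DELAY '00:00:05';
    END;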

There is probably a better way of moving the data away in the SP than using DELETE - perhaps moving it to another table that is not involved in replication and then dropping that 'new' table, or the reverse: if the data you want to keep is smaller than the amount to delete, move the data you want to keep to a new table, drop the old one, then rename the new table. There is plenty of advice out there on those approaches. I work in an environment where it was easier to fight for deletion than to promote a concept that some personnel won't understand, so I'm describing the painful but basic way.

DISCLAIMER: All of the above is dangerous. If done hastily without appropriate forethought, you can seriously mess up a replication topology, your company's data and probably your employment. Please take the above method and develop your own battleplan - create a test environment to prove the concept, test, test and retest, and do not take this task lightly. With enough consideration you will achieve your task - but it's not worth doing on a Friday afternoon after a couple of lunchtime beers. Do it right, do it once (for real - test as much as you can), do it properly.

I hope this helps someone else. I'm adding this line as it is what I would have searched for if I wanted this answer:

Delete large amount of data from a Peer-to-Peer replication topology

Licensed under: CC-BY-SA with attribution