Good practices for patching SQL nodes in High Availability solution [duplicate]

https://dba.stackexchange.com/questions/270938

04-03-2021
|

题

Our database structure utilizes two roles in the Window Failover Cluster Manager.

One is for FCI - switching between the active/passive node in the Primary Data center.
One is for SQL Availability Group setup to copy data from the primary data center to the DR data center. This also use for manual failover in the case of disaster scenario occurs.

Our IT team wants to perform OS patching on these servers regularly on a scheduled down time, the application will be offline when this happens. This helps us out a bit because we don't have to worry about failover to a node while patching another, but instead we can assume we can bring the database offline completely during patching period.

My question is what is the best way to do this since IT team probably does not know much about FCI or Database. I don't want our databases to failover unexpectedly because during this process one of the active SQL node may needs reboot. I'm thinking of giving them the following instruction.

RDP into the node active Cluster Group, you can find this by running the powershell command: Get-ClusterGroup and find the OwnerNodeValue of the "Cluster Group".

Open up Failover Cluster Manager Expands to Roles, there are two roles one for AG and one for FCI.
Right click on each and select "Stop Role"
Perform patch, restart ect..
Once done, RDP back in and "Start Role" on the two roles to resume.

Would this work and is it considered good practice? I am hoping to force the failover service to stop so that the SQL nodes don't fail-over to another node during a server reboot.

解决方案

Approaching this by thinking about each SQL instance separately.

Patching SQLDR\Inst_B

This is a secondary replica, so it can be patched/rebooted independently.

Switch SQLDR\Inst_B to be an ASYNC replica (if it's not already)
Patch SQLDB\Inst_B and reboot (windows patches, SQL patches, etc.)
Verify SQLDR comes back up and the AG is healthy.
Switch back to SYNC if required.

Notes: While Inst_B is offline, the T-Logs on Inst_A will be kept around to replay when Inst_B returns. I wouldn't recommend patching Inst_B while doing index maintenance or large data loads on Inst_A. If Inst_B is down for a long period of time (think days) then you'll probably want to break the AG replica.

Patching SQLFCIPrimary\Inst_A

Patch passive node and reboot.
Failover to Passive node. This will cause your (hopefully brief, e.g. < 1 min) outage.
Confirm AG is healthy
Patch new passive node and reboot.
Failback if required.

Notes: Since you are using MANUAL failover, there is no risk that you'll inadvertently failover to Inst_B.

This is the simplest way to accomplish regular patching, and just needs one or two FCI failovers. There is no reason to stop WSFC roles.

Whether you do Inst_A or Inst_B first doesn't matter, but I'm risk adverse, so I'd start with Inst_B.

许可以下： CC-BY-SA 和归因

不隶属于 dba.stackexchange