Can you switch recovery mode to simple from full in an Always ON cluster setup?

https://dba.stackexchange.com/questions/285887

16-03-2021
|

Question

I need help understanding my options. Given:

SQL server cluster setup that is maintained by a DBA.
System is NOT in production yet (not handed over to the customer).
Initial high volume push of data that has NO breaks for days
I can stop and start the push of data and change the database at any point. Basically I can stop all input.

I recommend to our DBA that we switch to simple mode because we are seeing within the activity monitor a pile up of queries in a wait state which eventually causes errors for the application servers.

I've never had this problem before and in a process of elimination I see we are in FULL recovery mode instead of SIMPLE.

I asked for us to switch to SIMPLE from FULL and this is the reply I get. Is there another option perhaps that the DBA isn't aware of?

How can I help?

"We cannot switch to simple using AlwaysOn in SQL."

Can I turn off AlwaysOn for the initial loading phase or am I out of luck?

--- Addition ----

Solution

The DBA is correct - if the database is part of an Availability Group (AG), it has to be in the FULL recovery model. This is because of the way AGs work - they ship transaction log blocks from the primary server to the secondary server(s). So all the detailed log records provided by the FULL recovery model are needed.

You could remove the database from the AG during your initial data load.

ALTER AVAILABILITY GROUP [YourAG]
REMOVE DATABASE [YourDatabaseName];

However, the (potentially major) downside to this is that the secondary will need to be "re-initialized" once the data load is done. That means either using "automatic seeding" to replicate the entire primary database to the secondary server(s) (Enable automatic seeding on an existing availability group), or taking a backup of the primary, restoring it on each of the secondaries, and then re-adding the database to the AG (Prepare a secondary database for an Always On availability group).

Depending on how big the database is after the data load, this might be "no big deal," or it could be a real hassle. But yeah, that's the other option available to the DBA basically.

By the way, regarding this:

we are seeing within the activity monitor a pile up of queries in a wait state

You didn't mention the actual wait, but unless it's HADR_SYNC_COMMIT or WRITELOG, removing from the AG / switching to SIMPLE recovery might not help.

Even if these are your waits, be aware that you are treating a symptom. If the app ever needs to dump in data like this once it's live, you probably won't be able to monkey with the AG or recovery models.

Maybe your app could reduce the amount of log data generated by grouping some of its transactions into larger batches.

Or maybe you need to investigate the network speed between the primary and secondary servers (keeping in mind how to properly measure AG throughput).

Those are just some possible remedies. All of this is just to say that you're better off, in the long run, figuring out the root cause of this "wait state leading to application errors" business now rather than later.

The waits you've showed are lock waits, indicative of a classic blocking chain.

Notice the "Blocked By" column is 131 for all of the rows in the screenshot. That's a "Session ID" (first column of Activity Monitor). Keep following that chain until you find the lead blocker ("Blocked By" will be blank), and look in the "Wait Type" column to see what it is waiting on. That's where you should focus your efforts to unravel the root cause.

Regarding this update:

Apparently the default for doing a backup of the logging directory was set to a large value. When that kicked off to clear out the logging (incremental log thing??), it makes the whole system stutter and caused the issue. That's my current theory. Now he has changed it to 5 minutes and so far, all we see is a stutter and recovery on our side. I have my engineer gather more real data (the actual block if he can catch it).

It sounds like increased resource usage when transaction log backups occur is interfering with your workload. One solution to this would be to run transaction logs more frequently, as the DBA did. You could even run the backups every minute if they finish fast enough.

All that said, the original solution of switching to SIMPLE recovery is a viable option if it's feasible to re-initialize the secondary after the initial load is done.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange