Question

We are running a SQL Server 2014 Availability Group with 3 replicas: the primary, one synchronous secondary replica (SQL2 in this case) and one asynchronous secondary replica. We also configured read-only routing to the synchronous secondary replica.

Last night SQL2 rebooted after an automatic Windows Update installation. The server came back online, the SQL Server service started (delayed start) and the database went into recovery. After a while the Event Viewer showed that the database integrity check had succeeded and the database was ready for use.

The database showed the synchronized state in SQL Server Management Studio. The AG state was healthy, but no queries were getting results from the database.

The queries were blocked by the wait type: HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING.

Sometimes the wait type changed to LCK_M_S, and the blocking session was a process performing a DB STARTUP command. I know this has to do with the Fast Recovery option that comes with SQL Server Enterprise, but I don’t understand why a simple SELECT was blocked forever.

The main question is: how can SQL Server show that the AG database is healthy when it actually isn’t? Do you recognize this problem?

To fix this, we removed the secondary from the AG and joined the database back again to the AG and now everything is working again.

Solution

It sounds like you experienced the behavior that's described in this post on the PFE blog:

AlwaysOn Availability Groups unable to query against readable secondary replica database: Wait Type HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING

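Essentially, there happened to be a long-running transaction on the primary when the secondary became readable. Queries on a readable secondary run under row versioning, so they are blocked on the secondary until every transaction that was open on the primary at that moment has finished and been redone on the secondary, at which point the snapshot-related row versions are available. I imagine that removing and re-adding the database simply coincided with the long-running transaction finally completing.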
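To check for that condition, you can look for old open transactions on the primary. The following is a minimal sketch rather than a definitive diagnostic: it assumes you run it on the primary replica with VIEW SERVER STATE permission, and the column aliases are my own.

```sql
-- Run on the PRIMARY replica.
-- Lists active read/write transactions, oldest first, with the owning
-- session where one exists. A transaction that has been open for hours
-- is a candidate for what the readable secondary is waiting on.
SELECT
    st.session_id,
    tat.transaction_id,
    tat.name AS transaction_name,
    tat.transaction_begin_time,
    DATEDIFF(MINUTE, tat.transaction_begin_time, GETDATE()) AS open_minutes
FROM sys.dm_tran_active_transactions AS tat
LEFT JOIN sys.dm_tran_session_transactions AS st
    ON st.transaction_id = tat.transaction_id
WHERE tat.transaction_type = 1   -- 1 = read/write transaction
ORDER BY tat.transaction_begin_time;
```

Running `DBCC OPENTRAN` in the affected database on the primary is a quicker alternative if you only care about the oldest active transaction in that one database.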

So this behavior, as described in that blog post, is by design.

However, if there was not a long-running transaction, then this could be a bug. There is a comment on that blog that indicates others have had this problem:

I am facing the issue after Secondary SQL Server reboot after OS patching. Do not see any open transactions in primary prior to the reboot. There are two databases in the AG having this issue. And we have been waiting for more than 15 hours but still readable replica is not able to process any select query for those databases.

If you are able to reproduce the behavior, it would be good to report it on the feedback site and/or engage Microsoft support if you have a support agreement.
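If it happens again, it may help to capture what the secondary reports before you remove and re-add the database. The queries below are a sketch of the kind of evidence you could collect, assuming they are run on the affected secondary replica; the filter values are taken from the wait types and the command described in the question.

```sql
-- Run on the affected SECONDARY replica.

-- 1. Requests stuck on the waits described in the question, plus the
--    DB STARTUP task that was reported as the blocker.
SELECT
    r.session_id,
    r.status,
    r.command,
    r.wait_type,
    r.wait_time AS wait_time_ms,
    r.blocking_session_id,
    DB_NAME(r.database_id) AS database_name
FROM sys.dm_exec_requests AS r
WHERE r.wait_type IN (N'HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING',
                      N'LCK_M_S')
   OR r.command = N'DB STARTUP';

-- 2. What the AG itself reports for the local database replicas.
--    These columns can show SYNCHRONIZED / HEALTHY while the requests
--    above are still waiting.
SELECT
    DB_NAME(drs.database_id) AS database_name,
    drs.synchronization_state_desc,
    drs.synchronization_health_desc,
    drs.database_state_desc,
    drs.redo_queue_size,
    drs.last_redone_time
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1;
```

Saving both result sets together with the open-transaction list from the primary gives support something concrete to work with.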

Licensed under: CC-BY-SA with attribution