Question

We have WSFC with 4 nodes + a file share witness.

Only nodes 1-3 have SQL Server installed, with Availability Groups (AGs); node 4 is shut down gracefully.

Node 1 hosts the primary AG replica.

We recently had an unplanned outage. The only thing we know for certain is that nodes 1 and 2 stayed together with power and network, while nodes 3-4 and the witness were cut off from them (but were probably still up).

Since power was restored, WSFC has been showing results that I don't understand:

  • Current host server is Node 3;
  • In Cluster Core Resources, the IP addresses that belong to nodes 1 and 2 are shown as down;
  • In the Nodes section, nodes 1-3 are shown as up, and node 4 as down (as expected).

BUT: the primary AG replica is still on node 1, as it was before, and it accepts read/write connections as usual.

This makes me doubt whether I correctly understand how WSFC and AGs work together.

  1. Should not the current WSFC host also be the primary AG server?
  2. How does the listener work? In our case, we have multi-subnet networking, and the listener's DNS name always resolves to 3 IP addresses among which one belongs to the current primary AG server, and works for connections. Does WSFC determine the listener's addresses?
  3. Is there anything else (besides quorum) for which AGs depend on WSFC?

Thanks.


Solution

Should not the current WSFC host also be the primary AG server?

No. The hosting of the core cluster resources is independent of the AG primary. Generally speaking, it matters very little which node is the current WSFC host when the cluster exists to support SQL Server Availability Groups.

How does the listener work? In our case, we have multi-subnet networking, and the listener's DNS name always resolves to 3 IP addresses among which one belongs to the current primary AG server, and works for connections. Does WSFC determine the listener's addresses?

This part mostly happens on the client that's connecting to SQL Server. The listener itself is a WSFC resource (a network name plus its IP addresses), and in a multi-subnet setup only the IP address in the current primary's subnet is online. The client gets that same list of IPs from DNS and tries them until one succeeds. The current primary server will respond and either accept the connection or redirect the client to another replica (depending on connection settings and other configuration, such as read-only routing).
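To make the client-side behaviour concrete, here is a minimal Python sketch of "resolve the listener name, then try each IP until one answers". It is only an illustration, not what the SQL Server drivers actually run: the function names (`resolve_listener`, `tcp_probe`, `first_reachable`) and the listener name in the usage note are made up for this example, and port 1433 is assumed to be the listener port.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_listener(name: str, port: int = 1433) -> List[str]:
    """Return the distinct IP addresses that DNS hands back for the listener name."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    ips: List[str] = []
    for info in infos:
        ip = info[4][0]  # sockaddr tuple starts with the address
        if ip not in ips:
            ips.append(ip)
    return ips


def tcp_probe(ip: str, port: int, timeout: float) -> bool:
    """True if a TCP connection to (ip, port) can be opened within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    ips: Iterable[str],
    port: int = 1433,
    timeout: float = 1.0,
    probe: Callable[[str, int, float], bool] = tcp_probe,
) -> Optional[str]:
    """Try the addresses one after another and return the first that accepts a connection."""
    for ip in ips:
        if probe(ip, port, timeout):
            return ip
    return None
```

Something like `first_reachable(resolve_listener("ag-listener.example.local"))` (hypothetical name) would then return the one address that is currently online, i.e. the one in the primary's subnet. Trying the addresses sequentially means waiting out a timeout on each offline IP, which is why multi-subnet setups normally set `MultiSubnetFailover=True` in the connection string so the driver attempts the addresses in parallel.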

Is there anything else (besides quorum) for which AGs depend on WSFC?

There's a neat breakdown on this in the docs as well: Relationship of SQL Server AlwaysOn Components to WSFC

In general though, the concept of quorum (which I'll take to be a part of the larger health monitoring / failover concept) is the most relevant one for WSFC and AGs.
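As a rough illustration of the quorum idea, the sketch below checks whether one side of a network split holds a strict majority of the votes. It assumes a plain static majority with one vote per node plus one for the file share witness; the voter names are made up, and it deliberately ignores dynamic quorum and dynamic witness, which adjust vote counts on real clusters. It therefore does not try to reproduce the exact vote counts during the outage in the question.

```python
from typing import Iterable, Set

# Hypothetical voters: four nodes plus the file share witness, one vote each.
VOTERS: Set[str] = {"node1", "node2", "node3", "node4", "witness"}


def partition_has_quorum(partition: Iterable[str], voters: Set[str] = VOTERS) -> bool:
    """True if the partition holds a strict majority of all configured votes."""
    present = len(set(partition) & voters)
    return present > len(voters) // 2


# Two of five votes is not a majority, so this side would lose quorum.
print(partition_has_quorum({"node1", "node2"}))             # False
# Three of five votes is a majority, so this side would keep the cluster running.
print(partition_has_quorum({"node3", "node4", "witness"}))  # True
```

The relevance for AGs is that a node on the side without quorum stops its cluster service, and the AG databases on it do not stay online as a primary even though the SQL Server instance itself is healthy.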

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange