Question

I am attempting to create a distributed AG on SQL Server 2016 SP1, but I keep getting the following errors. I have tried this both in VirtualBox (on my local desktop) and in a real VMware virtual environment, and I get the same errors each time.

The AGs create successfully on both WSFCs. But when I run "ALTER AVAILABILITY GROUP [Dist AG Name] JOIN" on the primary replica of the 'remote' AG, the command completes successfully, and then this error immediately appears in the log:

A connection timeout has occurred while attempting to establish a connection to availability replica 'AG1' with id [FCAA4083-D6B8-1BAC-9E8C-6AECE34513E6]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.

On the primary replica of the 'local' AG, I start getting these errors in the log repeatedly:

Length specified in network packet payload did not match number of bytes read; the connection has been closed. Please contact the vendor of the client library. [CLIENT: 192.168.56.104]

These are the bytes that show up in the Event Viewer event:

0000: AC 45 00 00 14 00 00 00   ¬E...... 
0008: 14 00 00 00 41 00 47 00   ....A.G. 
0010: 31 00 53 00 45 00 52 00   1.S.E.R.
0018: 56 00 45 00 52 00 31 00   V.E.R.1.
0020: 5C 00 41 00 47 00 31 00   \.A.G.1.
0028: 49 00 4E 00 53 00 54 00   I.N.S.T.
0030: 31 00 00 00 00 00 00 00   1.......

In both environments, I am only dealing with a single subnet and a single domain. I've also tried removing all special characters from VM hostnames, instance names, AG names, and listener names during troubleshooting.

Has anyone run into this before? I don't see anything in the SP2 release notes that suggests it's a bug, but I'm willing to try an SP2 upgrade if anyone thinks it's worthwhile. (Unfortunately, I won't be able to upgrade my prod environment for some time, which is why I haven't tried it yet.)

I can also post the full endpoint/AG/listener creation scripts if anyone is willing to try to reproduce this or has troubleshooting tips.

Thanks!


Solution

Well, I finally found the problem, and it was my own dumb mistake. When you create the distributed availability group, the LISTENER_URL clauses must specify the listener DNS names but the database mirroring endpoint ports. I was specifying the listener DNS names together with the listener ports.
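To see which port belongs where, a query like the following can be run on the primary replica of each AG. This is a sketch using the standard catalog views: it lists the database mirroring endpoint port next to each listener's DNS name and client port. Only the endpoint port belongs in LISTENER_URL.

```sql
-- Sketch: run on the primary replica of each AG.
-- Port of the database mirroring endpoint: this is the port LISTENER_URL must use.
SELECT name AS endpoint_name, port AS endpoint_port, state_desc
FROM sys.tcp_endpoints
WHERE type_desc = 'DATABASE_MIRRORING';

-- DNS name and client port of each AG listener:
-- use dns_name in LISTENER_URL, but NOT listener_port.
SELECT ag.name AS ag_name, l.dns_name, l.port AS listener_port
FROM sys.availability_group_listeners AS l
JOIN sys.availability_groups AS ag ON ag.group_id = l.group_id;
```

For the endpoint and listener definitions in this answer, that would show port 5022 for Hadr_endpoint and port 1435 for AG1LISTENER, so the URL to use is tcp://AG1LISTENER:5022.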

For example, here is my endpoint definition (note port 5022):

-- Database mirroring endpoint: this port (5022) is what LISTENER_URL needs
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = (192.168.159.102))
    FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE,
        ENCRYPTION = REQUIRED ALGORITHM AES);
GO

Here is my listener definition (note port 1435):

-- AG listener: port 1435 is for client connections, NOT for LISTENER_URL
ALTER AVAILABILITY GROUP [AG1]
ADD LISTENER N'AG1LISTENER' (
    WITH IP ((N'192.168.232.203', N'255.255.255.0')),
    PORT = 1435
);
GO

And here is the CORRECT way to define the distributed AG (note the listener DNS name but the endpoint port):

CREATE AVAILABILITY GROUP [DistAG]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
    'AG1' WITH
    (
    -- Listener DNS name + mirroring endpoint port (5022), not listener port 1435
    LISTENER_URL = 'tcp://AG1LISTENER:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
    ),
    'AG2' WITH
    (
    -- Same rule for the second AG: its listener name + its endpoint port
    LISTENER_URL = 'tcp://AG2LISTENER:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
    );
GO
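The JOIN from the question, run on the primary replica of the second AG, follows the same rule. A minimal sketch, assuming the DistAG/AG1/AG2 names and the 5022 endpoint port used above (adjust if AG2's endpoint uses a different port), followed by a query to check which URL each member AG actually ended up with:

```sql
-- Sketch: run on the primary replica of AG2 (the 'remote' AG).
-- URLs must match the ones used in CREATE AVAILABILITY GROUP on AG1.
ALTER AVAILABILITY GROUP [DistAG]
JOIN
AVAILABILITY GROUP ON
    'AG1' WITH
    (
    LISTENER_URL = 'tcp://AG1LISTENER:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
    ),
    'AG2' WITH
    (
    LISTENER_URL = 'tcp://AG2LISTENER:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
    );
GO

-- For a distributed AG, each "replica" row is a member AG and
-- endpoint_url holds its LISTENER_URL; a listener port here is the mistake.
SELECT ag.name AS distributed_ag, ar.replica_server_name AS member_ag, ar.endpoint_url
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar ON ar.group_id = ag.group_id
WHERE ag.is_distributed = 1;
```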

This is explained very clearly in the huge purple note in this article, which I somehow completely overlooked before: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-distributed-availability-groups?view=sql-server-2017

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange