Question

I'm trying to write a bash script to monitor simple master-slave replication with MySQL 5.6.33 and send email and log messages when anything goes wrong.

I want to test scenarios where replication fails and see if the script catches them all.

I tried to simulate network issues by blocking the slave (10.0.0.3) from accessing 3306 on master (10.0.0.2), but allowing the web server (10.0.0.1) to continue running the site.

On Master (10.0.0.2) I removed the ufw rule that allowed slave to connect and reloaded the firewall:

ubuntu@db:~$ sudo ufw delete 6
ubuntu@db:~$ sudo ufw reload
ubuntu@db:~$ sudo ufw status numbered
Status: active
     To                         Action      From
     --                         ------      ----
[ 1] 22                      ALLOW IN    Anywhere
[ 2] 22/tcp                  ALLOW OUT   Anywhere (out)
[ 3] 22/udp                  ALLOW OUT   Anywhere (out)
[ 4] 22                      ALLOW IN    10.0.0.0/28
[ 5] 3306                    ALLOW IN    10.0.0.1
[ 6] 22/tcp (v6)             ALLOW OUT   Anywhere (v6) (out)
[ 7] 22/udp (v6)             ALLOW OUT   Anywhere (v6) (out)

And on Slave (10.0.0.3) it shows that I cannot connect via telnet now, whereas previously I could:

ubuntu@ip-10-0-0-3:~$ telnet 10.0.0.2 3306
Trying 10.0.0.2...
^C
ubuntu@ip-10-0-0-3:~$ 

But replication still works.

Here's a sample of the grepped output of SHOW SLAVE STATUS in a loop with 1 second delay:

20170309_154122
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
20170309_154123
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
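
For reference, the check that such a monitoring script runs on this output could be sketched roughly as below. This is only an illustration under assumptions: check_slave_status is a hypothetical helper name, and the 60-second default lag limit is made up, not taken from the setup above. It reads the text of SHOW SLAVE STATUS \G on stdin, prints one line per problem and returns non-zero if it found any.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original post): reads the text of
# `SHOW SLAVE STATUS\G` on stdin, prints one line per problem found and
# returns non-zero if there is any. $1 is an assumed lag limit in seconds.
check_slave_status() {
    awk -v max_lag="${1:-60}" '
        {
            sub(/^[ \t]+/, "")       # drop the leading padding
            sub(/[ \t\r]+$/, "")     # drop trailing whitespace
            i = index($0, ": ")      # split on the first ": " only
            if (i == 0) next         # timestamps and empty fields are skipped
            val[substr($0, 1, i - 1)] = substr($0, i + 2)
        }
        END {
            bad = 0
            if (val["Slave_IO_Running"] != "Yes") {
                print "Slave_IO_Running is \"" val["Slave_IO_Running"] "\""
                bad = 1
            }
            if (val["Slave_SQL_Running"] != "Yes") {
                print "Slave_SQL_Running is \"" val["Slave_SQL_Running"] "\""
                bad = 1
            }
            lag = val["Seconds_Behind_Master"]
            if (lag !~ /^[0-9]+$/) {
                print "Seconds_Behind_Master is not a number: \"" lag "\""
                bad = 1
            } else if (lag + 0 > max_lag + 0) {
                print "Seconds_Behind_Master is " lag " (limit " max_lag ")"
                bad = 1
            }
            exit bad
        }
    '
}
```

It could be fed with something like mysql -e 'SHOW SLAVE STATUS\G' | check_slave_status 30 (assuming client credentials are configured), with the email and log actions triggered by a non-zero exit status.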

I also checked with

master> show master status;

and

slave> show slave status \G

and the log positions match continuously.


EDIT: Ufw's default is to deny incoming requests:

ubuntu@db:~$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)

What mistake am I making?

I first tried blocking master's IP in the slave's firewall. That did not stop replication.
I figured that was obviously wrong because slaves request and read from the master, and that I would have to block slave's attempts to read, in the master's firewall.
But this too seems to work fine.

Given that I cannot telnet from slave to master, how is this working? I tested this by enabling and disabling the firewall and telnet. Is telnet an unreliable tool for testing replication connectivity?

Does mysql employ some "push" functionality from the master's side, because it knows the slave's details?

Any help is greatly appreciated.

EDIT2: I think I found the answer to the mystery but it exposes a security issue.

I ran netstat -plantu on both master and slave and here are the relevant outputs:

Master:

ubuntu@db:~$ sudo netstat -plantu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
...
tcp6       0      0 :::3306                 :::*                    LISTEN      1319/mysqld     
...
tcp6       0      0 10.0.0.2:3306           10.0.0.3:57128          ESTABLISHED 1319/mysqld 

Slave:

ubuntu@ip-10-0-0-3:~$ sudo netstat -plantu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
...
tcp        0      0 10.0.0.3:57128          10.0.0.2:3306           ESTABLISHED 3576/mysqld     
...
tcp6       0      0 :::3306                 :::*                    LISTEN      3576/mysqld

This would mean that replication happens over "tcp6" even though plain telnet does not work.

So now, how do I block IPv6 connections if not through ufw?


Solution

The excellent support techs at AWS clarified the issue.

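As far as I understood, the tcp6 entries were a red herring. The master's netstat output shows tcp6 because mysqld uses AF_INET6 sockets (the :::3306 listener), and such a socket also accepts IPv4 connections; internally the peer is then represented as an IPv4-mapped address of the form ::ffff:1.2.3.4, 1.2.3.4 being the corresponding IPv4 address. The addresses on both ends here are plain IPv4 (10.0.0.2 and 10.0.0.3), so the replication traffic is IPv4, not IPv6.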

Here, replication continued even after I removed the slave's allow rule from the master's ufw firewall, because that change only blocks new connections and does nothing to disrupt ESTABLISHED ones.
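
One way to confirm this is to look on the master for a connection from the slave that survived the firewall change. A minimal sketch, assuming netstat output in the column layout shown in the question; established_from is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: filters netstat-style output on stdin for ESTABLISHED
# connections from peer address $1 to local port $2.
# Prints the matching lines; returns 0 if at least one was found, else 1.
established_from() {
    awk -v peer="$1" -v port="$2" '
        $6 == "ESTABLISHED" {
            lport = $4; sub(/^.*:/, "", lport)       # local port
            raddr = $5; sub(/:[0-9]+$/, "", raddr)   # remote address
            if (lport == port && raddr == peer) { print; found = 1 }
        }
        END { exit !found }
    '
}
```

On the master this could be run as sudo netstat -plantu | established_from 10.0.0.3 3306. As long as it still prints a line, the slave is replicating over the old connection, whatever the current ufw rules say.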

So to simulate network connectivity issues, you have to somehow kill the established connections. That is not as easy as, say, kill for processes. One option is tcpkill or a similar script; note that tcpkill comes with the dsniff package, which also contains other network scanning / testing and MITM tools and should be used very carefully. The other option is an iptables rule on the master that drops the slave's packets, like so:

# iptables -I INPUT 1 -s SLAVE_IP -j DROP

and then later, when done with testing, delete the rule (this assumes it is still rule 1 in the INPUT chain):

# iptables -D INPUT 1
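
The two commands above could also be wrapped so that the DROP rule is removed again even if the test is interrupted. This is a sketch under assumptions, not a tested tool: block_peer_for and the IPTABLES variable are hypothetical names, and the script has to run as root on the master. It deletes the rule by its specification (iptables -D INPUT -s ... -j DROP) rather than by position, so it does not depend on the rule still being number 1.

```shell
#!/bin/sh
# Hypothetical wrapper: drop all packets from peer $1 for $2 seconds,
# then remove the rule again.
# IPTABLES may be overridden, e.g. IPTABLES=echo to preview the commands.
IPTABLES="${IPTABLES:-iptables}"

# The body is a subshell, so the traps and variables stay local to one call.
block_peer_for() (
    peer=$1
    seconds=$2
    $IPTABLES -I INPUT 1 -s "$peer" -j DROP || exit 1
    # Remove the rule by specification when the subshell exits.
    trap '$IPTABLES -D INPUT -s "$peer" -j DROP' EXIT
    # Turn Ctrl-C or a TERM signal into a normal exit so the EXIT trap runs.
    trap 'exit 1' INT TERM
    sleep "$seconds"
)
```

For example, block_peer_for SLAVE_IP 120 would cut the slave off for two minutes, long enough to watch what the monitoring script reports in the meantime.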

Alternatively, you could stop and start both master and slave after deleting the ufw rule that allowed replication, provided you can afford to stop the master (obviously, do not test this in production).

So this has nothing to do with IPv6, except that MySQL listens on IPv6 too, as the :::3306 lines above show. But since I haven't configured IPv6 on my servers or network, it is not being used.

Licensed under: CC-BY-SA with attribution