Question

Our replication recently stopped working with Error 1236 ("Could not find first log file name in binary log index file"). I asked a question about it but was unable to resolve the problem. Since then, I've executed a RESET MASTER on the master and a RESET SLAVE on the slave, and created a new full dump, but I'm still getting the same error.
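For context, this is roughly the re-pointing procedure I ran on the slave after the RESET MASTER and fresh dump. This is a sketch only: the host, user, password, and coordinates are placeholders, and the MASTER_LOG_FILE/MASTER_LOG_POS values should come from the dump's "CHANGE MASTER TO" header (or SHOW MASTER STATUS on the master):

```sql
-- Sketch: all values here are placeholders for this setup.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='<master-host>',
  MASTER_USER='<repl-user>',
  MASTER_PASSWORD='<password>',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;
```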

I thought I'd take a step back and try taking a different approach, attacking the specifics of the error message itself.

When I execute SHOW SLAVE STATUS on the slave, it reports that the master cannot "find first log file name":

Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

The "first log file name" is /var/lib/mysql/mysql-bin.000001:

root@master [905 13:30:56 /var/lib/mysql]# cat mysql-bin.index
/var/lib/mysql/mysql-bin.000001
/var/lib/mysql/mysql-bin.000002
/var/lib/mysql/mysql-bin.000003
/var/lib/mysql/mysql-bin.000004
/var/lib/mysql/mysql-bin.000005
/var/lib/mysql/mysql-bin.000006
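One sanity check I could script at this point: confirm that every file named in the index actually exists and is readable by the server. A minimal sketch (the helper name and index path are my own; on this setup the index is /var/lib/mysql/mysql-bin.index):

```shell
# Hypothetical helper: report, for each file listed in a binlog index,
# whether it exists and is readable on disk.
check_binlog_index() {
  local index="$1" logfile
  while IFS= read -r logfile; do
    if [ -r "$logfile" ]; then
      printf 'OK      %s\n' "$logfile"
    else
      printf 'MISSING %s\n' "$logfile"
    fi
  done < "$index"
}

# Usage (on the master):
#   check_binlog_index /var/lib/mysql/mysql-bin.index
```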

MySQL itself seems to be aware that mysql-bin.000001 is the first binary log:

MariaDB [(none)]> SHOW BINARY LOGS;
+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000001 |      10421 |
| mysql-bin.000002 | 1073919628 |
| mysql-bin.000003 | 1074488806 |
| mysql-bin.000004 | 1073744707 |
| mysql-bin.000005 | 1074366770 |
| mysql-bin.000006 | 1069984818 |
+------------------+------------+
6 rows in set (0.00 sec)

mysqlbinlog shows that this file seems to be accessible and valid:

root@master [911 13:48:04 /var/lib/mysql]# mysqlbinlog /var/lib/mysql/mysql-bin.000001
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#160526 12:24:14 server id 5  end_log_pos 248   Start: binlog v 4, server v 10.0.21-MariaDB-log created 160526 12:24:14 at startup

MySQL itself also seems to have access to this first log file:

MariaDB [(none)]> SHOW BINLOG EVENTS in 'mysql-bin.000001' from 0 limit 4;
+------------------+-----+-------------------+-----------+-------------+------------------------------------------------+
| Log_name         | Pos | Event_type        | Server_id | End_log_pos | Info                                           |
+------------------+-----+-------------------+-----------+-------------+------------------------------------------------+
| mysql-bin.000001 |   4 | Format_desc       |         5 |         248 | Server ver: 10.0.21-MariaDB-log, Binlog ver: 4 |
| mysql-bin.000001 | 248 | Gtid_list         |         5 |         273 | []                                             |
| mysql-bin.000001 | 273 | Binlog_checkpoint |         5 |         312 | mysql-bin.000001                               |
| mysql-bin.000001 | 312 | Gtid              |         5 |         350 | BEGIN GTID 0-5-1                               |
+------------------+-----+-------------------+-----------+-------------+------------------------------------------------+
4 rows in set (0.00 sec)

Question

To recap, MySQL is complaining that it "Could not find first log file name in binary log index file". However, as shown above, it appears that MySQL does know what the "first log file name" is, and it can, indeed, access it.

What else should I check to ensure that MySQL can actually "find [the] first log file name"?


Solution

As it turns out, the problem was that we were connecting to an old (wrong) master. The previous master was domain.com, but many months ago, we migrated to db.domain.com. It seems that we were erroneously connecting to domain.com.

We're using autossh to set up SSH tunnels for our replication. To confirm that we were connecting to the old (wrong) master, I ran telnet 127.0.0.1 3305 on the slave; the MySQL version reported in the connection banner matched that of the old master. A full reboot of the slave seems to have fixed the problem. I suspect the root cause was a DNS quirk on our network: when the autossh connection was established, db.domain.com apparently resolved to domain.com, which explains why we were connecting to the wrong master. (These DNS issues on our network are certainly something for us to look into.)
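A quicker identity check than reading the telnet banner is to connect through the tunnel and ask the server who it is, then compare the answer against the intended master. Treat this as a sketch; the port is specific to our tunnel setup:

```sql
-- Run against the tunnel endpoint (e.g. mysql -h 127.0.0.1 -P 3305).
-- If these values don't match the intended master, the tunnel is
-- pointing at the wrong server.
SELECT @@hostname, @@server_id, @@version;
```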

Note: Major thanks to @Andrew for his answer on the original question: https://dba.stackexchange.com/a/140259/55530

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange