MySQL Error 1062 after rebooting a node in multi-master replication

https://dba.stackexchange.com/questions/278892

09-03-2021
Question

I set up two MySQL Community Server (8.0.20) instances with master-to-master replication. Today one of the nodes went down, and when it came back up, replication failed. I tried to skip a single record with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;, but the errors keep coming.

This is the status log of the particular node that failed:

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: database_1
                  Master_User: replicador
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000122
          Read_Master_Log_Pos: 217195469
               Relay_Log_File: 33afd376b907-relay-bin.000105
                Relay_Log_Pos: 2829
        Relay_Master_Log_File: mysql-bin.000122
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table gc57125800.ate_logs; Duplicate entry '49328847' for key 'ate_logs.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000122, end_log_pos 188792316
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 188791943
              Relay_Log_Space: 28407134
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows event on table gc57125800.ate_logs; Duplicate entry '49328847' for key 'ate_logs.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000122, end_log_pos 188792316
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: bef45e1b-99d6-11ea-a355-3e2547e4f083
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 201029 17:59:51
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set (0.00 sec)

I'm not sure what I should do. Should I dump the other master and set up replication all over again? Do I need to do this every time a server fails?

Solution

You should use pt-table-checksum to see the differences between the source and the replica. If there are many differences, importing the data again from the source could be the fastest solution. Otherwise, you could restart MySQL with --slave-skip-errors=1062, fix the differences with pt-table-sync, and then restart MySQL normally (without that option).
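The checksum-then-sync flow described above might look roughly like the following sketch. The host database_2, the user checksum_user, and the helper names run, compare_tables, and show_fixes are assumptions made for illustration; database_1 and the schema gc57125800 are taken from the status output in the question. Credentials are left out and should be supplied however you normally do (for example with --ask-pass). Depending on the binlog format, pt-table-checksum may also need --no-check-binlog-format.

```shell
#!/bin/sh
# Sketch only: DSNs below are placeholders, adjust them to your environment.
SOURCE_DSN="h=database_1,u=checksum_user"    # source named in the status output
REPLICA_DSN="h=database_2,u=checksum_user"   # hypothetical name for the failed node
CHECKSUM_TABLE="percona.checksums"           # default table used by pt-table-checksum

# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: checksum the tables on the source; results reach the replica via replication.
compare_tables() {
    run pt-table-checksum --replicate="$CHECKSUM_TABLE" --databases=gc57125800 "$SOURCE_DSN"
}

# Step 2: print (not execute) the statements that would bring the replica in line.
show_fixes() {
    run pt-table-sync --print --replicate="$CHECKSUM_TABLE" --sync-to-master "$REPLICA_DSN"
}
```

Calling compare_tables and then show_fixes with the default settings only prints the two commands; setting DRY_RUN=0 runs them. In a master-to-master setup it is worth reviewing the --print output carefully before replacing it with --execute, because with --sync-to-master the changes are applied on the source and reach the replica through replication.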

To understand how these tools work and how they are used, take a look at this article: MySQL replication primer with pt-table-checksum and pt-table-sync.

You may also want to take a look at another tool called twindb-table-compare, which is complementary to pt-table-checksum. It reads the diffs generated by pt-table-checksum on the source and on the replica, and prints them in a format that is easier to understand, similar to the output of the Linux diff utility.

OTHER TIPS

First, stop traffic from the restarted node and enable slave-skip-errors=1062 on that same node. Once it is back in sync, use pt-table-checksum, which will report the differences between the tables.

Then fix whatever differences it reports.
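One way to decide when the node is "in sync" before running the checksum is to inspect the same status output shown in the question. The sketch below is only an illustration: replica_caught_up is a hypothetical helper, and treating "both threads running and Seconds_Behind_Master equal to 0" as caught up is an assumption, not a guarantee that the data matches.

```shell
#!/bin/sh
# Reads `SHOW SLAVE STATUS\G` output on stdin.
# Succeeds (exit 0) only if both replication threads are running and the lag is 0.
replica_caught_up() {
    awk -F': ' '
        { gsub(/^[ \t]+/, "", $1) }                 # drop the left padding of the field name
        $1 == "Slave_IO_Running"      { io  = $2 }
        $1 == "Slave_SQL_Running"     { sql = $2 }
        $1 == "Seconds_Behind_Master" { lag = $2 }
        END { exit !(io == "Yes" && sql == "Yes" && lag == "0") }
    '
}
```

It could be used as `mysql -e 'SHOW SLAVE STATUS\G' | replica_caught_up && echo "caught up"`. For the output in the question it fails, because Slave_SQL_Running is No and Seconds_Behind_Master is NULL.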

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange