MySQL Error 1062 after rebooting a node in multi-master replication
09-03-2021
Question
I set up two MySQL Community Server (8.0.20) instances with master-to-master replication. Today one of the nodes went down, and when it came back up, replication failed. I tried to just skip one record with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; but the errors keep coming in.
This is the status log of the particular node that failed:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: database_1
Master_User: replicador
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000122
Read_Master_Log_Pos: 217195469
Relay_Log_File: 33afd376b907-relay-bin.000105
Relay_Log_Pos: 2829
Relay_Master_Log_File: mysql-bin.000122
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table gc57125800.ate_logs; Duplicate entry '49328847' for key 'ate_logs.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000122, end_log_pos 188792316
Skip_Counter: 0
Exec_Master_Log_Pos: 188791943
Relay_Log_Space: 28407134
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows event on table gc57125800.ate_logs; Duplicate entry '49328847' for key 'ate_logs.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000122, end_log_pos 188792316
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: bef45e1b-99d6-11ea-a355-3e2547e4f083
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 201029 17:59:51
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
I'm not sure what I should do. Should I dump the other master and set everything up again? Do I need to do this every time a server fails?
Solution
You should use pt-table-checksum to see the differences between the source and the replica. If there are many differences, importing the data again from the source could be the fastest solution. Otherwise, you could restart MySQL with --slave-skip-errors=1062, fix the errors with pt-table-sync, and then restart MySQL normally.
To understand how these tools work and how they are used, take a look at this article: MySQL replication primer with pt-table-checksum and pt-table-sync.
You may also want to take a look at another tool called twindb-table-compare, which is complementary to pt-table-checksum. It reads the diffs generated by pt-table-checksum on the source and on the replica, and prints them in a format that is easier to understand, similar to the output of the Linux utility diff.
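To make the idea behind checksum-based comparison more concrete, here is a minimal sketch in Python. It is not how pt-table-checksum is implemented; it only illustrates the general approach of splitting a table into primary-key ranges taken from the source and comparing a checksum of each range on both sides. Two in-memory SQLite databases stand in for the source and the replica, and the function names, the two-column ate_logs schema, and the sample rows are all assumptions made for the illustration.

```python
import sqlite3
import zlib


def key_ranges(conn, table, key, chunk_size):
    """Split the table into (low, high) key ranges of at most chunk_size rows."""
    keys = [r[0] for r in conn.execute(f"SELECT {key} FROM {table} ORDER BY {key}")]
    return [
        (keys[i], keys[min(i + chunk_size, len(keys)) - 1])
        for i in range(0, len(keys), chunk_size)
    ]


def range_checksum(conn, table, key, low, high):
    """Return (row count, CRC32) for all rows whose key lies in [low, high]."""
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}",
        (low, high),
    ).fetchall()
    return len(rows), zlib.crc32(repr(rows).encode("utf-8"))


def differing_ranges(source, replica, table, key, chunk_size=2):
    """List the source-defined key ranges whose contents differ on the replica.

    Limitation of this sketch: rows that exist only on the replica and fall
    outside every source range are not detected.
    """
    diffs = []
    for low, high in key_ranges(source, table, key, chunk_size):
        if range_checksum(source, table, key, low, high) != range_checksum(
            replica, table, key, low, high
        ):
            diffs.append((low, high))
    return diffs


def make_db(rows):
    """Build an in-memory stand-in for one node (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ate_logs (id INTEGER PRIMARY KEY, msg TEXT)")
    conn.executemany("INSERT INTO ate_logs VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "a"), (2, "b"), (3, "c"), (4, "d")])
replica = make_db([(1, "a"), (2, "b"), (3, "c"), (4, "DIFFERENT")])
print(differing_ranges(source, replica, "ate_logs", "id"))  # prints [(3, 4)]
```

Only the range that contains the diverging row is reported, which is why this kind of comparison scales better than diffing a large table row by row: identical ranges are ruled out with one checksum each.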
OTHER TIPS
First, stop traffic to the restarted node and enable slave-skip-errors=1062 on that node. Once it is back in sync, use pt-table-checksum, which will report the differences between tables.
Then take action according to what it finds.
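Part of deciding what action to take is knowing exactly which row collided, so that it can be compared on both nodes before anything is skipped. As a minimal sketch (not part of MySQL or any Percona tool; the function name and the regular expression are assumptions based only on the error text quoted in the question), the Last_SQL_Error message can be parsed like this:

```python
import re

# Pattern written against the message format shown in the question;
# other MySQL versions or error types may word it differently.
ERROR_PATTERN = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[\w.]+); "
    r"Duplicate entry '(?P<value>[^']*)' for key '(?P<key>[^']*)'"
    r".*?master log (?P<log_file>[^,\s]+), end_log_pos (?P<end_log_pos>\d+)"
)


def parse_duplicate_key_error(message):
    """Return the fields of a 1062 replication error as a dict, or None."""
    match = ERROR_PATTERN.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["end_log_pos"] = int(info["end_log_pos"])
    return info


last_sql_error = (
    "Could not execute Write_rows event on table gc57125800.ate_logs; "
    "Duplicate entry '49328847' for key 'ate_logs.PRIMARY', Error_code: 1062; "
    "handler error HA_ERR_FOUND_DUPP_KEY; the event's master log "
    "mysql-bin.000122, end_log_pos 188792316"
)

info = parse_duplicate_key_error(last_sql_error)
print(info["table"], info["value"], info["log_file"], info["end_log_pos"])
```

With the table name and key value in hand, the row can be selected on both nodes to see whether the two copies are identical (in which case skipping the event is harmless) or genuinely different.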