Question

Problem description:

INSTALL PLUGIN group_replication SONAME 'group_replication.so';

results in

ERROR 2013 (HY000): Lost connection to MySQL server during query

Specs:

MySQL version: 8.0.13 MySQL Community Server
OS: Ubuntu 16.04.3 LTS (Xenial Xerus)
Plugin source: Included with MySQL 8 CE
MySQL configuration:

/etc/mysql/my.cnf
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mysql.conf.d/

[mysqld]

# Enable logging
general_log = on
general_log_file=/var/log/mysql/mysql.log
log_error=/var/log/mysql/mysql_error.log

# General replication settings
gtid_mode = ON
enforce_gtid_consistency = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE
log_slave_updates = ON
log_bin = binlog
relay_log = relaylog
binlog_format = ROW
transaction_write_set_extraction = XXHASH64
loose-group_replication_bootstrap_group = OFF
loose-group_replication_start_on_boot = ON
loose-group_replication_ssl_mode = REQUIRED
loose-group_replication_recovery_use_ssl = 1

# Shared replication group configuration
loose-group_replication_group_name = "9dc4e512-e745-11e8-961c-020000db4596"
loose-group_replication_ip_whitelist = "myserver.domainname.com"
loose-group_replication_group_seeds = "myserver.domainname.com:33061"

# Single or Multi-primary mode? Uncomment these two lines
# for multi-primary mode, where any host can accept writes
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON

# Custom thresholds
net_read_timeout=3600
net_write_timeout=3600
connect_timeout=300
max_connections=451
max_connect_errors=400

# Host specific replication configuration
server_id = 2
bind-address = "0.0.0.0"
report_host = "myserver.domainname.com"
loose-group_replication_local_address = "myserver.domainname.com:33061"

What I know about it/have tried

  1. Increased net_read_timeout and net_write_timeout to 3600, as shown in the configuration above
  2. Increased server RAM to 4GB
  3. Rebuilt the entire server and MySQL installation from scratch - issue persists
  4. The setup used to work and was only recently discovered to be broken; automatic updates are in place.
  5. Have read the documentation on "MySQL server has gone away" and on crashes
  6. It shouldn't be due to a timeout or network issues, as there is evidence of a crash, as follows:

Crash report:

2019-01-14T16:11:55.583514Z 8 [ERROR] [MY-013173] [Repl] Plugin group_replication reported: 'The plugin encountered a critical error and will abort: Fatal error during execution of Group Replication group joining process'
16:11:55 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=3
max_threads=451
thread_count=3
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 186365 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fc9e8000b50
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fca38148db0 thread_stack 0x46000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x3d) [0x1c6677d]
/usr/sbin/mysqld(handle_fatal_signal+0x4c1) [0xe28501]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7fca53fcc390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7fca524be428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7fca524c002a]
/usr/lib/mysql/plugin/group_replication.so(abort_plugin_process(char const*)+0x1cf) [0x7fca00ff3a1f]
/usr/lib/mysql/plugin/group_replication.so(initialize_plugin_and_join(enum_plugin_con_isolation, Delayed_initialization_thread*)+0x399) [0x7fca00fefdc9]
/usr/lib/mysql/plugin/group_replication.so(plugin_group_replication_start(char**)+0x1ac9) [0x7fca00ff2259]
/usr/lib/mysql/plugin/group_replication.so(plugin_group_replication_init(void*)+0xd00) [0x7fca00ff3120]
/usr/sbin/mysqld() [0xd26050]
/usr/sbin/mysqld() [0xd32809]
/usr/sbin/mysqld(Sql_cmd_install_plugin::execute(THD*)+0x1a) [0xd32bfa]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x85c) [0xd0594c]
/usr/sbin/mysqld(mysql_parse(THD*, Parser_state*, bool)+0x3dc) [0xd0ac1c]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2f2f) [0xd0df8f]
/usr/sbin/mysqld(do_command(THD*)+0x1a8) [0xd0ec88]
/usr/sbin/mysqld() [0xe17dd8]
/usr/sbin/mysqld() [0x1d2165f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fca53fc26ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fca5259041d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc9e8009488): is an invalid pointer
Connection ID (thread ID): 8
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2019-01-14T16:11:56.137462Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.13) starting as process 23085

I'm a bit stuck as to what to do, so any suggestions would be appreciated.


Solution

Group Replication has an option, group_replication_exit_state_action, which determines what the server does when it leaves the group involuntarily:

https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html

The default is ABORT_SERVER (i.e. shut the server down), but this can be changed to READ_ONLY to prevent the server from shutting down.
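For example, a minimal sketch of how it could be set, assuming the plugin installs far enough to expose the variable; the my.cnf form mirrors the loose- prefix already used in the question's configuration:

# in my.cnf, so the setting applies when the plugin starts on boot
loose-group_replication_exit_state_action = READ_ONLY

-- or at runtime, once the plugin is installed
SET GLOBAL group_replication_exit_state_action = 'READ_ONLY';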

OTHER TIPS

This is a MySQL bug and, like any crash/backtrace, it should be reported at bugs.mysql.com. I couldn't find an existing bug report for it.

If you don't report it, it's not likely to get fixed, at least not as soon as you'd hope.

If you want to use Group Replication, I really recommend setting it up via MySQL Shell. Also, now that it's possible to use SET PERSIST, I would recommend avoiding the use of 'loose-' prefixed settings in my.cnf.
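As a rough sketch of what that could look like, using variable names from the question's configuration (this assumes the plugin is installed so the group_replication_* variables exist):

-- SET PERSIST writes to mysqld-auto.cnf and survives restarts,
-- so the corresponding loose- lines can be dropped from my.cnf
SET PERSIST group_replication_group_seeds = 'myserver.domainname.com:33061';
SET PERSIST group_replication_exit_state_action = 'READ_ONLY';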

Like @iGGT mentioned, this is likely due to the default value of the "group_replication_exit_state_action" variable. That variable allows you to configure what happens when a server involuntarily leaves the group. The default is ABORT_SERVER which, as you saw, aborts the process. Changing it to READ_ONLY should fix that behaviour, though you'll probably want to investigate what's causing the member to leave the group.
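One place to start that investigation is the Group Replication status table in performance_schema, alongside the error log already configured in the question:

-- shows each member's host, port and state (ONLINE, RECOVERING, ERROR, ...)
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members;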
