Galera MariaDB not joining with IST

https://dba.stackexchange.com/questions/133920

01-10-2020

Question

Until now, I have been running a single MariaDB 10.1 node as a Galera cluster. Now I want to join a second node.

systemctl start mysql

initiates a State Snapshot Transfer (SST), which takes about 10 hours for 200 GB.

The configurations of the donor and the joiner are nearly identical:

[client]
port        = 3306
socket      = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket      = /var/run/mysqld/mysqld.sock
nice        = 0
[mysqld]
user        = mysql
pid-file    = /var/run/mysqld/mysqld.pid
socket      = /var/run/mysqld/mysqld.sock
port        = 3306
basedir     = /usr
datadir     = /var/lib/mysql
tmpdir      = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages = en_US
skip-external-locking
secure_auth = off
skip-name-resolve
bind-address        = 0.0.0.0
max_connections     = 100
connect_timeout     = 5
wait_timeout        = 600
max_allowed_packet  = 16M
thread_cache_size       = 128
sort_buffer_size    = 4M
bulk_insert_buffer_size = 16M
tmp_table_size      = 32M
max_heap_table_size = 32M
myisam_recover_options = BACKUP
key_buffer_size     = 128M
table_open_cache    = 400
myisam_sort_buffer_size = 512M
concurrent_insert   = 2
read_buffer_size    = 2M
read_rnd_buffer_size    = 1M
query_cache_limit       = 1G
query_cache_size        = 200M
log_warnings        = 2
slow_query_log=1
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 3
log_slow_verbosity  = query_plan
server-id       = 103
log_bin         = /var/log/mysql/mariadb-bin
log_bin_index       = /var/log/mysql/mariadb-bin.index
expire_logs_days    = 3
max_binlog_size         = 100M
binlog_format       = ROW
default_storage_engine  = InnoDB
relay_log       = /var/log/mysql/relay-bin
relay_log_index = /var/log/mysql/relay-bin.index
relay_log_info_file = /var/log/mysql/relay-bin.info
log_slave_updates
default_storage_engine  = InnoDB
innodb_buffer_pool_size = 110G
innodb_log_buffer_size  = 8M
innodb_file_per_table   = 1
innodb_file_format  = Barracuda
innodb_open_files   = 400
innodb_io_capacity  = 400
innodb_flush_method = O_DIRECT
innodb_autoinc_lock_mode=2
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="dbcluster1"
wsrep_cluster_address="gcomm://"
wsrep_node_name="db1"
wsrep_node_address="db1"
wsrep_replicate_myisam=on
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:pa55w0rt
[sst]
streamfmt=tar
transferfmt=nc
rlimit=3000k
bind-address=0.0.0.0
[mysqldump]
quick
quote-names
max_allowed_packet  = 16M
[mysql]
[isamchk]
key_buffer      = 16M

On the second node, db1 is replaced by db2, the server-id is different, and

wsrep_cluster_address="gcomm://db1,db2"

After the SST, the server shuts down cleanly and the position is stored in /var/lib/mysql/xtrabackup_binlog_pos_innodb. However, when I start the server again with

systemctl start mysql

it does not perform an Incremental State Transfer (IST); instead it wipes the entire data directory /var/lib/mysql and starts another SST.
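The position Galera itself uses when a node rejoins is recorded in grastate.dat in the data directory, as a cluster uuid and a seqno. A seqno of -1 (or an all-zero uuid) means the file holds no usable position, for example after an unclean shutdown, and the node then has to recover it some other way (e.g. mysqld --wsrep-recover) before an IST is possible. A minimal sketch for inspecting that file; the function names are hypothetical and the default path assumes datadir = /var/lib/mysql as in the config above:

```python
from pathlib import Path


def parse_grastate(text: str) -> dict[str, str]:
    """Parse 'key: value' lines from grastate.dat, skipping comments and blanks."""
    state: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that are not 'key: value'
            state[key.strip()] = value.strip()
    return state


def read_grastate(datadir: str = "/var/lib/mysql") -> dict[str, str]:
    """Read and parse grastate.dat from the given data directory."""
    return parse_grastate((Path(datadir) / "grastate.dat").read_text())


def has_valid_position(state: dict[str, str]) -> bool:
    """True if the file records a non-zero cluster uuid and a seqno >= 0."""
    try:
        seqno = int(state.get("seqno", "-1"))
    except ValueError:
        return False
    # An all-zero uuid strips down to an empty string.
    return seqno >= 0 and state.get("uuid", "").strip("0-") != ""
```

On the joiner, read_grastate() after the clean shutdown shows whether the SST actually left a usable uuid/seqno behind before you restart the service.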

What do I have to do to join the cluster correctly (with incremental state transfer)?
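Even with a valid saved position, a donor can only serve an IST if its write-set cache (gcache) still holds every write-set the joiner missed; otherwise Galera falls back to a full SST. A deliberately simplified Python model of that decision (the function and parameter names are hypothetical, and real Galera checks more than this):

```python
def choose_transfer(joiner_uuid: str, joiner_seqno: int,
                    cluster_uuid: str, cluster_seqno: int,
                    gcache_oldest: int) -> str:
    """Simplified IST-vs-SST choice.

    gcache_oldest is the oldest seqno the donor's gcache still holds;
    cluster_seqno is the cluster's latest committed seqno.
    """
    if joiner_uuid != cluster_uuid or joiner_seqno < 0:
        # Unknown or foreign history: only a full snapshot can help.
        return "SST"
    if joiner_seqno >= cluster_seqno:
        return "none"  # already caught up, nothing to transfer
    # IST needs write-sets joiner_seqno + 1 .. cluster_seqno, all still cached.
    if joiner_seqno + 1 >= gcache_oldest:
        return "IST"
    return "SST"
```

Applied to this question: while a many-hour SST runs, the donor keeps committing write-sets, and if the gcache is too small the oldest ones the joiner needs are evicted before it restarts, so the check lands on SST again.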

Solution

My own hint about the write-set cache size (aka the Galera cache, gcache) nailed it. (I should have asked earlier; it seems some supernatural voice told me the answer.)

However, I do not quite understand why 15 GB was not enough to cover about 45 KB/s of writes during the 18-hour transfer of the next attempt; apparently it was only slightly too small. (In my question I wrote about 10 hours, which turned out not to be true.)
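The sizing question can be checked with a back-of-the-envelope calculation: the gcache must hold at least the write-sets produced during the whole transfer window. A rough sketch, assuming 1 KB = 1024 bytes, that the 45 KB/s figure covers all replicated writes, and a hypothetical safety factor; the cache size itself is set through the Galera provider option gcache.size:

```python
import math


def gcache_bytes_needed(write_bytes_per_s: float, window_hours: float,
                        safety_factor: float = 1.0) -> int:
    """Write volume produced during the transfer window, scaled by a margin."""
    return math.ceil(write_bytes_per_s * window_hours * 3600 * safety_factor)


def provider_option(size_bytes: int) -> str:
    """Render a my.cnf line setting gcache.size, rounded up to whole GiB."""
    return f'wsrep_provider_options="gcache.size={math.ceil(size_bytes / 2**30)}G"'


if __name__ == "__main__":
    raw = gcache_bytes_needed(45 * 1024, 18)
    print(f"{raw / 2**30:.2f} GiB")  # raw write volume over 18 hours
    # Hypothetical 2x margin on top of the raw estimate.
    print(provider_option(gcache_bytes_needed(45 * 1024, 18, safety_factor=2.0)))
```

By this naive estimate about 2.8 GiB of writes would fit comfortably in 15 GB, which matches the puzzlement above; the estimate ignores per-write-set overhead and write bursts, so the real requirement can be higher.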

Switching the SST method to rsync finally fixed it, because the transfer then took only 1 hour.
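One plausible contributor to the slow xtrabackup-v2 transfer is the rlimit=3000k line in the [sst] section of the config. Assuming the SST script hands this value to pv as a donor-side rate limit, and that pv reads 3000k as 3000 KiB/s (both assumptions to verify against the scripts shipped with your MariaDB version), the cap alone sets a minimum transfer time for 200 GB:

```python
def min_transfer_hours(data_bytes: float, rate_limit_bytes_per_s: float) -> float:
    """Lower bound on transfer time when the stream (assumed about as large
    as the data) is capped at the given rate."""
    return data_bytes / rate_limit_bytes_per_s / 3600


# Assumption: rlimit=3000k is applied as a pv rate limit of 3000 KiB/s.
RLIMIT_BYTES_PER_S = 3000 * 1024
# Assumption: "200 GB" means decimal gigabytes.
DATA_BYTES = 200 * 10**9

if __name__ == "__main__":
    print(f"{min_transfer_hours(DATA_BYTES, RLIMIT_BYTES_PER_S):.1f} h")  # 18.1 h
```

Under those assumptions the cap works out to roughly 18 hours, close to the transfer time reported above, so the rate limit is a plausible reason the xtrabackup-v2 SST took so much longer than the rsync one.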

Licensed under: CC-BY-SA with attribution
