Question

I have a db.t2.small read replica, which crashes every time I run mysqldump.

The issues started after I made some table changes; my main suspect is the switch to InnoDB.


This replica is used as a backup (in case the main server were to fail), and it allows me to run mysqldump every night so I can create an off-site backup without affecting the main server.

AWS now goes through the recovery process whenever I run mysqldump:

01:53:13 - Recovery of the DB instance has started. Recovery time will vary with the amount of data to be recovered.
01:57:11 - DB instance restarted
01:57:36 - Recovery of the DB instance is complete.

Annoyingly, the error logs are pretty much empty, so I have no idea what's causing it to fail.

The largest tables are:

788 MB = report_data
410 MB = report_data_log
418 MB = email_log_boy

The report_data table used to contain 36 TEXT fields (the result of more fields being added over time) and used MyISAM.

I changed this to a simpler structure with one TEXT field per row (report_id, field_name, field_value, created_date, author_id), and it's now set up to use InnoDB.

The overall table size has decreased a bit.

The report_data_log table has basically the same structure, but with a date recording when each value was replaced (on editing, the old record is copied from the report_data table, so I have a history of what's changed over the last 3 months).
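Roughly, the new structure looks like this (the column types and key below are approximate, not the exact DDL):

CREATE TABLE report_data (
  report_id    INT UNSIGNED NOT NULL,
  field_name   VARCHAR(255) NOT NULL,
  field_value  TEXT,
  created_date DATETIME NOT NULL,
  author_id    INT UNSIGNED NOT NULL,
  PRIMARY KEY (report_id, field_name)
) ENGINE=InnoDB;

report_data_log uses the same columns, plus the date for when each value was replaced.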

I export each table individually with:

/usr/bin/mysqldump --defaults-extra-file=/backup/database.cnf --skip-extended-insert --skip-dump-date db report_data_log

I have tried adding --single-transaction, but that hasn't changed anything.
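For reference, that attempt was just the same command with the extra flag:

/usr/bin/mysqldump --defaults-extra-file=/backup/database.cnf --single-transaction --skip-extended-insert --skip-dump-date db report_data_log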

The config file just contains the username, password, host, ssl-ca, and sets ssl-verify-server-cert to true.

The /etc/mysql/conf.d/mysqldump.cnf file contains:

[mysqldump]
quick
quote-names
max_allowed_packet = 1024M

The server configuration hasn't been changed recently; it includes:

max_allowed_packet: 1073741824
net_read_timeout: 3600
net_write_timeout: 3600
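These can be confirmed on the replica itself with something like:

SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('max_allowed_packet', 'net_read_timeout', 'net_write_timeout');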

Oddly, it's able to export the 788MB report_data table, but crashes after exporting ~200MB of the 410MB report_data_log table.

Using db.t2.medium would probably fix this issue, but it doubles the cost, which seems a little silly considering the main server is fine and both are idle most of the time (CPU never appears to exceed 25%, averaging 1.5%, and Freeable Memory never runs low).

I'm tempted to switch back to MyISAM, but I was wondering whether there are any better AWS RDS logs to find out what happened, or if there is something different about InnoDB that I'm missing?


As an interesting aside, the "db.t2.small" MariaDB instance provides 2GB of RAM.

The parameter group sets "innodb_buffer_pool_size" to "{DBInstanceClassMemory*3/4}".
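For a 2GB instance, that expression should work out to roughly:

2147483648 * 3 / 4 = 1610612736 bytes (1.5GB)

(assuming DBInstanceClassMemory reports something close to the full 2GB).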

But when I run:

SHOW GLOBAL VARIABLES LIKE "innodb_buffer_pool_size";

It returns 2147483648... so it's being set to 2GB, not 1.5GB.
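The chunk size and instance count behind this value can be checked the same way:

SHOW GLOBAL VARIABLES LIKE "innodb_buffer_pool_chunk_size";
SHOW GLOBAL VARIABLES LIKE "innodb_buffer_pool_instances";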


Solution

This is a confirmed bug with Amazon RDS.


Reply from Amazon on the 6th Jan 2020:

"Starting with MariaDB 10.2.2 resizing the buffer pool is performed in chunks determined by the size of the innodb_buffer_pool_chunk_size variable. Hence the buffer pool size must be a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances, since innodb_buffer_pool_size is set to 1.5 GB the behavior is to round off to 1GB as the "innodb_buffer_pool_instances" is set to 8."


Original Answer:

On the "db.t2.small" MariaDB instance, it has 2GB of RAM, but innodb_buffer_pool_size is being incorrectly set to 2GB.

This means that once the InnoDB buffer pool fills up, the instance runs out of memory and MariaDB crashes.

The temporary solution is to change the Parameter Group so:

innodb_buffer_pool_instances = 12
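This is a static parameter, so it only takes effect after the instance restarts; once it has, the buffer pool should report the intended size:

SHOW GLOBAL VARIABLES LIKE "innodb_buffer_pool_size";

It should now return 1610612736 (1.5GB) rather than 2147483648 (2GB).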

If you want to get an ETA from Amazon, ask them to check Case 6411919871, from the 5th September 2019.

I did eventually have to pay them $29 to raise a "Technical" bug report; the original report was raised in August, both with Accounts and on their forum:

https://forums.aws.amazon.com/thread.jspa?messageID=913625

Licensed under: CC-BY-SA with attribution