Question

This is the fifth time it has happened. It occurs about once a week (Tuesday or Wednesday, between 03:00 and 07:00 UTC+0). The console shows the instance as available, but it is inaccessible. We waited to see whether the instance would recover on its own, but after ~30 minutes nothing had changed, so I rebooted it manually. It came back online about 5 minutes after the reboot.

It would be helpful to know what actually went wrong. This is only a dev server with few users and records.

Engine: Aurora MySQL 5.7.12
DB instance class: db.t2.small
Backup time: 16:00-16:30 UTC+0
Maintenance time: sun:17:00-sun:17:30 UTC+0

Below is the complete list of logs available after rebooting the instance.

error/mysql-error-running.log.2018-07-24.03 Tue Jul 24 11:14:06 GMT+800 2018    11.8 kB
error/mysql-error-running.log.2018-07-24.04 Tue Jul 24 11:30:00 GMT+800 2018    285.5 kB
error/mysql-error-running.log.2018-07-24.05 Tue Jul 24 12:30:00 GMT+800 2018    31.1 kB
error/mysql-error-running.log.2018-07-24.06 Tue Jul 24 13:30:00 GMT+800 2018    31.8 kB
error/mysql-error-running.log.2018-07-24.07 Tue Jul 24 14:30:00 GMT+800 2018    32.9 kB
error/mysql-error-running.log.2018-07-24.08 Tue Jul 24 15:30:00 GMT+800 2018    29 kB
error/mysql-error-running.log.2018-07-24.09 Tue Jul 24 16:30:00 GMT+800 2018    32.1 kB
error/mysql-error-running.log.2018-07-24.10 Tue Jul 24 17:30:00 GMT+800 2018    27.5 kB
error/mysql-error-running.log.2018-07-24.11 Tue Jul 24 18:30:00 GMT+800 2018    31.7 kB
error/mysql-error-running.log.2018-07-24.12 Tue Jul 24 19:30:00 GMT+800 2018    27.1 kB
error/mysql-error-running.log.2018-07-24.13 Tue Jul 24 20:30:00 GMT+800 2018    22.4 kB
error/mysql-error-running.log.2018-07-24.14 Tue Jul 24 21:30:00 GMT+800 2018    22.8 kB
error/mysql-error-running.log.2018-07-24.15 Tue Jul 24 22:30:00 GMT+800 2018    24.7 kB
error/mysql-error-running.log.2018-07-24.16 Tue Jul 24 23:30:00 GMT+800 2018    24.7 kB
error/mysql-error.log   Wed Jul 25 00:34:45 GMT+800 2018    2.6 kB
external/mysql-external.log Wed Jul 25 00:30:00 GMT+800 2018    7.6 kB

external/mysql-external.log

/rdsdbbin/oscar/bin/mysqld, Version: 5.7.12 (MySQL Community Server (GPL)). started with:
Tcp port: 3306 Unix socket: /tmp/mysql.sock
Time,ServerHost,User,UserHost,Command,Payload
/rdsdbbin/oscar/bin/mysqld, Version: 5.7.12 (MySQL Community Server (GPL)). started with:
Tcp port: 3306 Unix socket: /tmp/mysql.sock
Time,ServerHost,User,UserHost,Command,Payload
/rdsdbbin/oscar/bin/mysqld, Version: 5.7.12 (MySQL Community Server (GPL)). started with:
Tcp port: 3306 Unix socket: /tmp/mysql.sock
Time,ServerHost,User,UserHost,Command,Payload
----------------------- END OF LOG ----------------------

error/mysql-error-running.log.2018-07-24.03 shows: https://pastebin.com/ywmXLR5g.

error/mysql-error-running.log.2018-07-24.04 shows: https://pastebin.com/g1dkR6rj.

error/mysql-error-running.log.2018-07-24.18 shows: https://pastebin.com/g0aAXfaT.

All other logs show nothing (see photo).

[image]

Event Logs

July 24, 2018 at 11:14:14 AM UTC+8  DB instance restarted
July 24, 2018 at 11:13:31 AM UTC+8  Error restarting mysql: Engine bootstrap failed with no mysqld process running...
July 24, 2018 at 11:12:01 AM UTC+8  Recovery of the DB instance is complete.
July 24, 2018 at 11:04:26 AM UTC+8  Recovery of the DB instance has started. Recovery time will vary with the amount of data to be recovered.
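
Since the console shows event times in UTC+8 while the backup and maintenance windows above are given in UTC+0, it can help to normalize everything to UTC before comparing. The following is only a minimal sketch under that assumption: the timestamps are copied from the event log above, the windows from the question, and the helper names (to_utc, in_window) are made up for illustration.

```python
from datetime import datetime, time, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # assumption: the console displayed UTC+8

# Timestamps copied from the event log above (oldest first).
EVENTS_UTC8 = [
    "July 24, 2018 at 11:04:26 AM",
    "July 24, 2018 at 11:12:01 AM",
    "July 24, 2018 at 11:13:31 AM",
    "July 24, 2018 at 11:14:14 AM",
]

# Windows from the question, in UTC: (start, end, weekday or None for daily).
# datetime.weekday() uses 0 = Monday, so 6 = Sunday.
BACKUP = (time(16, 0), time(16, 30), None)
MAINTENANCE = (time(17, 0), time(17, 30), 6)


def to_utc(stamp: str) -> datetime:
    """Parse a console timestamp (UTC+8) and convert it to UTC."""
    local = datetime.strptime(stamp, "%B %d, %Y at %I:%M:%S %p")
    return local.replace(tzinfo=UTC8).astimezone(timezone.utc)


def in_window(moment: datetime, window) -> bool:
    """Return True if a UTC moment falls inside a (start, end, weekday) window."""
    start, end, weekday = window
    if weekday is not None and moment.weekday() != weekday:
        return False
    return start <= moment.time() < end


if __name__ == "__main__":
    for stamp in EVENTS_UTC8:
        utc = to_utc(stamp)
        overlaps = in_window(utc, BACKUP) or in_window(utc, MAINTENANCE)
        print(utc.isoformat(), "inside a window" if overlaps else "outside both windows")
```

For the events listed here, the converted times fall shortly after 03:00 UTC on a Tuesday, which matches the 03:00-07:00 UTC+0 pattern described in the question and lies outside both the backup and the maintenance window.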

CPU Utilization (07-24-2018): [image]

CPU Utilization (07-11-2018 to 07-24-2018): [image]


Solution

Special thanks to @WilsonHauck. After 4 weeks of monitoring, I can confirm that manually upgrading Aurora to the latest version solved the issue.

There have been several bugfixes addressing unexpected restarts on 2.01.1. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/AuroraMySQL.Updates.20Updates.html

To manually upgrade your Aurora:

  1. Go to RDS - AWS Console
  2. Navigate to Clusters
  3. Select your cluster
  4. Click Actions >> Upgrade now
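
For anyone who prefers scripting over the console, the steps above can probably be approximated with boto3. This is a hedged sketch, not a confirmed equivalent: it assumes that "Upgrade now" corresponds to applying a pending "system-update" maintenance action immediately, and the function names and the cluster ARN you pass in are placeholders of my own. boto3 is a third-party package and is imported only when the call is actually made.

```python
def build_upgrade_request(cluster_arn: str) -> dict:
    """Build the arguments for apply_pending_maintenance_action.

    Assumption: the pending engine update is exposed as a "system-update"
    maintenance action on the cluster.
    """
    return {
        "ResourceIdentifier": cluster_arn,
        "ApplyAction": "system-update",
        "OptInType": "immediate",
    }


def upgrade_now(cluster_arn: str, region: str):
    """Apply a pending update to the cluster right away, if one exists."""
    import boto3  # third-party; requires AWS credentials to be configured

    rds = boto3.client("rds", region_name=region)
    pending = rds.describe_pending_maintenance_actions(
        ResourceIdentifier=cluster_arn
    )
    if not pending["PendingMaintenanceActions"]:
        return None  # nothing to apply
    return rds.apply_pending_maintenance_action(
        **build_upgrade_request(cluster_arn)
    )
```

As with the console action, expect the instance to restart while the update is applied, so on anything other than a dev server you would want to pick the timing deliberately.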

OTHER TIPS

I had this problem with Aurora 5.7 as well over the weekend, right at the point of migration, to boot! AWS support said to disable "Performance Insights" because there is a "software defect" that internal teams are "actively" working on. No restarts so far.

As far as performance is concerned, compared to our instance-based clustered Percona servers, RDS Aurora MySQL is significantly slower: about 10% slower (based on replicated data and benchmarks of similar long-running reports), regardless of RDS instance type or size (we tried bigger instance types just to be sure, with the same result).

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange