We have been experiencing frequent outages with our AWS Aurora MySQL RDS cluster (version 8.0.36) since upgrading from MySQL 5.7. Over the past two months, the production database has gone down three times, and each time recovery has been difficult.
Environment details
- Version: 8.0.36
- Cluster Configuration: RDS Proxy in front of the DB cluster
- Upgrade Path: MySQL 5.7 → MySQL 8.0.36
Steps taken
- Added DB Proxy: Implemented a DB proxy in front of the DB cluster to manage connections.
- Reached Out to AWS Support: Engaged AWS support for assistance.
- Disabled Parallel Query: Turned off ParallelQuery.
- Query Optimization: Analyzed and optimized heavy queries; the outages do not appear to be query-related.
Problem
When the issue occurs, the database enters a failing state with the following behaviors:
- The instance cannot be recovered without intervention.
- Logs show an assertion failure in log0grover.cc.
- InnoDB initialization fails, and the log message itself suggests possible corruption in the InnoDB tablespace.
Log
241220 9:45:23 server_audit: Audit STARTED.
Found DAS config file, trying to load DAS switcher from DAS config file.
2024-12-20 09:45:23 70369552721040:[DAS][INFO]: Calculated persistence threads 4
aurora_enable_das:0
241220 9:45:23 server_audit: server_audit_incl_users set to ''.
241220 9:45:23 server_audit: server_audit_excl_users set to ''.
2024-12-20T09:45:23.681788Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. (ha_innodb.cc:14071)
2024-12-20T09:45:23.681848Z 1 [Note] [MY-013547] [InnoDB] Atomic write disabled (ha_innodb.cc:14095)
2024-12-20T09:45:23.681884Z 1 [Note] [MY-012932] [InnoDB] PUNCH HOLE support available (srv0start.cc:2138)
2024-12-20T09:45:23.681902Z 1 [Note] [MY-012944] [InnoDB] Uses event mutexes (srv0start.cc:2183)
2024-12-20T09:45:23.681913Z 1 [Note] [MY-012945] [InnoDB] GCC builtin __atomic_thread_fence() is used for memory barrier (srv0start.cc:2184)
2024-12-20T09:45:23.681945Z 1 [Note] [MY-012948] [InnoDB] Compressed tables use zlib 1.2.13 (srv0start.cc:2202)
2024-12-20T09:45:23.684704Z 1 [Note] [MY-012951] [InnoDB] Using hardware accelerated crc32 and polynomial multiplication. (srv0start.cc:2234)
2024-12-20T09:45:23.685080Z 1 [Note] [MY-012203] [InnoDB] Directories to scan './' (srv0start.cc:2272)
2024-12-20T09:45:23.686379Z 1 [Note] [MY-012955] [InnoDB] Initializing buffer pool, total size = 19.503906G, instances = 4, chunk size =4.875977G (srv0start.cc:2355)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
(buf0buf.cc:1803)
2024-12-20T09:45:29.083910Z 1 [ERROR] [MY-013183] [InnoDB] Assertion failure: log0grover.cc:807:0 thread 70370132519312 (ut0dbg.cc:58)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2024-12-20T09:45:29Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=569da3248b99719cc62f4997c36af088a01878a6
Thread pointer: 0x400043494000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 400052c060b8 thread_stack 0x40000
/rdsdbbin/oscar/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2c) [0x2b76b2c]
/rdsdbbin/oscar/bin/mysqld(print_fatal_signal(int)+0x37c) [0x1473fdc]
/rdsdbbin/oscar/bin/mysqld(my_server_abort()+0x98) [0x14745d8]
/rdsdbbin/oscar/bin/mysqld(my_abort()+0x14) [0x2b6ced4]
/rdsdbbin/oscar/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x2b0) [0x2ea0570]
/rdsdbbin/oscar/bin/mysqld(onGroverErrorImpl(int, char const*, int)+0x244) [0x10395e4]
/rdsdbbin/oscar/bin/mysqld(log_grover_open(char const*, char const*, char const*, char const*, char const*, char const*, std::vector<engineShmSegAddrName, std::allocator<engineShmSegAddrName> >&)+0x14c8) [0x2ed1a48]
/rdsdbbin/oscar/bin/mysqld(srv_start(bool)+0x1698) [0x2e59f38]
/rdsdbbin/oscar/bin/mysqld() [0x2cfbcdc]
/rdsdbbin/oscar/bin/mysqld(dd::bootstrap::DDSE_dict_init(THD*, dict_init_mode_t, unsigned int)+0xf8) [0x280bfd8]
/rdsdbbin/oscar/bin/mysqld(dd::upgrade_57::do_pre_checks_and_initialize_dd(THD*)+0x104) [0x2b3f6e4]
/rdsdbbin/oscar/bin/mysqld() [0x15b6b68]
/rdsdbbin/oscar/bin/mysqld() [0x30fe508]
/lib64/libpthread.so.0(+0x7230) [0x40002fde0230]
/lib64/libc.so.6(+0xdb7dc) [0x4000302117dc]
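To compare the three outages, a small script along these lines could pull the assertion sites out of the downloaded error logs and show whether every crash hits the same `log0grover.cc` line. This is only a sketch: the function names and the regular expression are my own, and the pattern assumes the log line format shown in the excerpt above.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
# <timestamp> <thread> [ERROR] [MY-nnnnnn] [InnoDB] Assertion failure: <file>:<line>...
ASSERTION_RE = re.compile(
    r"^(?P<ts>\S+)\s+\d+\s+\[ERROR\]\s+\[MY-\d+\]\s+\[InnoDB\]\s+"
    r"Assertion failure:\s+(?P<file>[\w.]+):(?P<line>\d+)"
)


def find_assertions(log_text):
    """Return (timestamp, source_file, line_number) for each assertion failure."""
    hits = []
    for raw in log_text.splitlines():
        match = ASSERTION_RE.match(raw.strip())
        if match:
            hits.append(
                (match.group("ts"), match.group("file"), int(match.group("line")))
            )
    return hits


def summarize(hits):
    """Count how often each source_file:line assertion site appears."""
    return Counter(f"{src}:{line}" for _, src, line in hits)


# Two lines copied from the log above, used as sample input.
sample = """
2024-12-20T09:45:23.681788Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. (ha_innodb.cc:14071)
2024-12-20T09:45:29.083910Z 1 [ERROR] [MY-013183] [InnoDB] Assertion failure: log0grover.cc:807:0 thread 70370132519312 (ut0dbg.cc:58)
"""

for site, count in summarize(find_assertions(sample)).items():
    print(site, count)
```

Running it over the logs from each incident would show whether the failure is always at the same source location, which seems like useful detail to include in the AWS support case.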
Has anyone encountered this issue with Aurora MySQL 8.0.x?
Are there recommended steps to verify and fix InnoDB corruption in this specific context?
Any insights on whether upgrading to a more recent MySQL 8.0.x version could address this?
I can provide more logs if needed.