Question

I am trying to migrate from a MySQL AWS RDS instance with a huge SSD and far too much unused space down to a smaller one, and migrating the data is the only way to do it. There are four tables in the 330GB-450GB range, and a single-threaded mysqldump piped directly to the target RDS instance is estimated by pv to take about 24 hours (copying at 5 mbps).

I wrote a bash script that launches multiple mysqldump processes in the background (using ' & ' at the end of each call), each with a calculated --where parameter, to simulate multithreading. This works and currently takes less than an hour with 28 threads.
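For readers who want to see the shape of such a script, here is a minimal sketch of the approach described above, not the asker's actual script. The names SRC_HOST, DST_HOST, DB, TABLE and the column name id are hypothetical placeholders, and credentials are assumed to come from an option file such as ~/.my.cnf.

```bash
#!/usr/bin/env bash
# Sketch only: split an auto_increment id range into chunks and copy them in parallel.
# SRC_HOST, DST_HOST, DB, TABLE and the column name "id" are hypothetical placeholders.

# Print one "start end" pair per line, covering [min_id, max_id] in at most `jobs` chunks.
id_ranges() {
  local min_id="$1" max_id="$2" jobs="$3"
  local total step start end
  total=$(( max_id - min_id + 1 ))
  step=$(( (total + jobs - 1) / jobs ))   # ceiling division
  start=$min_id
  while [ "$start" -le "$max_id" ]; do
    end=$(( start + step - 1 ))
    if [ "$end" -gt "$max_id" ]; then end=$max_id; fi
    echo "$start $end"
    start=$(( end + 1 ))
  done
}

# Dump one id range from the source and pipe it straight into the target.
dump_chunk() {
  mysqldump --host="$SRC_HOST" --single-transaction --no-create-info \
            --where="id BETWEEN $1 AND $2" "$DB" "$TABLE" \
    | mysql --host="$DST_HOST" "$DB"
}

# Start one background job per chunk, then wait for all of them to finish.
parallel_copy() {
  id_ranges "$1" "$2" "$3" | {
    while read -r start end; do
      dump_chunk "$start" "$end" &
    done
    wait
  }
}

# Example call (not executed here): parallel_copy 1 2000000000 28
```

Note that each mysqldump process opens its own transaction, so the chunks only form a consistent snapshot if the source table is not being written to during the copy. The sketch also does not check the exit status of individual jobs.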

However, I am concerned about any potential loss of performance while querying in the future, since I'll not be inserting in the sequence of the auto_increment id columns.

Can someone confirm whether this would be the case, or whether I am being paranoid for no reason?

What solution did you use for a single table that is in the 100s of GBs? For a particular reason, I want to avoid using AWS DMS, and I definitely don't want to use tools that haven't been maintained in a while.

Solution

You are correct that it will cause fragmentation of the clustered index. However, if the key is an auto-incrementing column, the data wasn't really sorted by anything meaningful to begin with. You went from an unsorted mess to a differently sorted unsorted mess.

Selecting/updating/reading a few rows at a time? Not a big deal - the B-tree will still know how to find the correct page without too much additional effort.

You'll have issues if you're trying to break up large updates/deletes by using ranges of the auto-incrementing column, as the rows in each range will be spread across more pages.

If performance does become an issue, you can rebuild the index. The newer versions of MySQL should be able to do so without taking the table offline.
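As a rough illustration of such a rebuild, the sketch below issues a "null" ALTER TABLE on an InnoDB table and asks for an in-place, non-locking operation. The helper names rebuild_sql and rebuild_table and the DST_HOST variable are hypothetical. It assumes an InnoDB table on a MySQL version with online DDL support; if the server cannot honor ALGORITHM=INPLACE or LOCK=NONE for the table, the statement fails rather than silently locking it.

```bash
#!/usr/bin/env bash
# Sketch only: rebuild an InnoDB table (and its clustered index) online.
# rebuild_sql, rebuild_table and DST_HOST are hypothetical names for illustration.

# Print the rebuild statement for the given table name.
rebuild_sql() {
  printf 'ALTER TABLE `%s` ENGINE=InnoDB, ALGORITHM=INPLACE, LOCK=NONE;\n' "$1"
}

# Run the rebuild against database $1, table $2.
rebuild_table() {
  rebuild_sql "$2" | mysql --host="$DST_HOST" "$1"
}

# Example call (not executed here): rebuild_table mydb big_table
```

Keep in mind that a rebuild writes a new copy of the table, so the target instance needs enough free space for it, which matters when the goal of the migration is a smaller disk.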

As an aside - did you attempt sorting the data by the auto-incrementing column then performing a bulk load?
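One way to read this suggestion, sketched under assumptions: keep the dump parallel, but write each chunk to a file named after its starting id, then load the files one after another in ascending id order so the target receives rows in primary key sequence. The names chunk_file, dump_chunk_to_file, load_in_order, SRC_HOST, DST_HOST, DB and TABLE are hypothetical, and the local disk is assumed to have room for the dump files.

```bash
#!/usr/bin/env bash
# Sketch only: dump id ranges to files, then load them sequentially in id order.
# All function and variable names here are hypothetical placeholders.

# Build a file name whose lexical order matches the numeric order of the start id.
chunk_file() {
  printf '%s/chunk_%012d.sql\n' "$1" "$2"
}

# Dump one id range to its own file, with rows ordered by primary key.
dump_chunk_to_file() {
  mysqldump --host="$SRC_HOST" --single-transaction --no-create-info \
            --order-by-primary --where="id BETWEEN $2 AND $3" "$DB" "$TABLE" \
    > "$(chunk_file "$1" "$2")"
}

# Load every chunk file in ascending order; the glob expands in sorted order.
load_in_order() {
  local f
  for f in "$1"/chunk_*.sql; do
    mysql --host="$DST_HOST" "$DB" < "$f" || return 1
  done
}
```

The trade-off is that the load side becomes single-threaded again, so this gives up much of the speedup in exchange for inserting in key order.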

OTHER TIPS

Tables are by nature unsorted, so you will not have any performance loss on that side after inserting your data. But since we don't know how much smaller your new instance is, we can't tell what impact that will have.

Your index on that field will be sorted, and so it will find the wanted rows quite fast, at least faster than scanning the whole column.

Not at all; there is no performance issue in any of these cases.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange