mysql zero downtime ALTER table sorting (percona, openark, etc)

https://stackoverflow.com/questions/22453773

15-06-2023
|

Question

I need to sort a huge MyISAM table in a certain way so that SELECTs are faster under certain conditions.

Please note that this question is about how to do an ALTER table with zero downtime for SORTING the table in some specific column order. It is not a dup question of other questions asking about the more general case.

A simple way to achieve this is doing something like this:

ALTER table mytable ORDER BY col1, col2;

We may also use myisamchk --sort-records to achieve the same result.

In any case, both approaches let us to do very quick:

SELECT * WHERE col1=x order by col2;

Note that this is not a problem with the index but with fetching large amounts of ordered data from the table.

So far that ALTER has been working well. The problem now is that the ALTER command is slow and it locks the DB.

I believe we may use percona or openark tools for doing the same operation. Something like this:

pt-online-schema-change --alter "ENGINE=MyISAM, ORDER BY col1, col2" D=mydatabase,t=mytable -u root --dry-run

This internally creates a new table copies it and then moves names. It is pretty well documented.

However I'm not sure if/how percona will honour the "ORDER BY". I cannot see anything happening in the dry-run logs (but this may be normal). And this is not explained in the documentation.

Does anyone know how will percona ORDER BY the table?

Will it do the ordering on the new table (_mytable_new) after mytable is copied and before renaming?
Will it do the ordering during the copy from mytable as in "INSERT INTO _mytable_new SELECT * FROM mytable ORDER BY col1, col2"?
Or perhaps "ORDER BY" will never be done?

EDIT: I launched PTDEBUG=1 ./pt-online-schema-change --alter "ENGINE=MyISAM, ORDER BY col1, col2" on the tests server.

After checking the logs I found out that "ORDER BY" is not being applied... Any ideas? Does openark permit to do so?

Thanks!

Solution

There's a way to achieve an ORDER BY with pt-online-schema-change.

First, make sure you have an index on the column you want to ORDER BY. Then use percona tool with option "--chunk-index" so that the chosen index is used to fetch the rows in the original table.

There's a problem though. Percona won't be able to fetch rows when indices have poor selectivity. In that case, create a composite index using the column you need to sort + ID for example (or any other column with high cardinality). Will be slow but may be a way yo get online rows sorted.

I got a boost of 10x with a large table of 100M rows which was very fragmented. OPTIMIZE table without column sorting did not improve the situation since values where randomly distributed in a table of 8GB. I hope this finding helps others.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow