Speed up TokuDB "alter table … engine=TokuDB"

https://stackoverflow.com/questions/12771932

mysql
tokudb

05-07-2021

Question

I'm trying to convert a 400-million-row InnoDB table to the TokuDB engine. When I start with "alter table ... engine=TokuDB", things run really fast at first: using SHOW PROCESSLIST, I see it reading about 1 million rows every 10 seconds. But once it reaches about 19-20 million rows, reading slows down to more like 10k rows every few seconds.

Are there any MySQL or TokuDB variables that affect the speed of an ALTER TABLE to TokuDB? I tried tmp_table_size and some others but can't seem to get past that hurdle.

Any ideas?

Solution

Here are the important variables. Make sure they are set globally before starting the operation, or locally within the session executing the storage engine change:

tokudb_load_save_space : default is off and should be left alone unless you are low on disk space.
tokudb_cache_size : if unset, TokuDB will allocate 50% of RAM for its own caching mechanism; we generally recommend leaving this setting alone. As you are running on an existing server, you need to make sure that you aren't over-committing memory between TokuDB, InnoDB, and MyISAM.
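
As a rough sketch of how to check these settings before converting, something like the following could be run in the session that will do the conversion. The table name big_table is hypothetical, and it is an assumption that tokudb_load_save_space can be changed per session on your build. As far as I know, tokudb_cache_size is normally set in the server configuration file at startup rather than at runtime, so verify that against your version.

```sql
-- Check the current TokuDB settings named above.
show variables like 'tokudb_load_save_space';
show variables like 'tokudb_cache_size';

-- Check the other engines' main caches to spot memory over-commitment.
show variables like 'innodb_buffer_pool_size';
show variables like 'key_buffer_size';

-- Assumption: this variable can be set per session on your build.
set session tokudb_load_save_space = off;

-- big_table is a hypothetical name; run the conversion in this same session.
alter table big_table engine=TokuDB;
```

If the cache sizes of TokuDB, InnoDB, and MyISAM add up to more than the machine's RAM, the server is likely to start swapping during the conversion.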

OTHER TIPS

What solved this for me was to export with "into outfile" and import with "load data infile".

This was far faster for me (110 million records). Every time I modify a large TokuDB table (alter table) it takes forever (~30k rows/sec). It has been quicker to do a full export and import (~500k rows/sec), which dropped alter table times from hours to minutes.

This holds both when converting from InnoDB and when altering a native TokuDB table (any alter table).

select a.*,calcfields from table1 a into outfile 'temp.txt';
create table table2 .....
load data infile 'temp.txt' into table table2 (field1,field2,...);
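
A slightly fuller sketch of the same export/import steps is below. The table names, column names, types, and file path are all made up for illustration, so substitute your own schema.

```sql
-- Hypothetical source table orders_innodb with columns id, customer_id, created_at.
-- The file is written on the database server host and must not already exist.
select id, customer_id, created_at
from orders_innodb
into outfile '/tmp/orders.txt';

-- Hypothetical target table; the column types are assumptions.
-- Optionally append row_format=tokudb_lzma (or another TokuDB row format) to pick a compression.
create table orders_toku (
  id bigint not null,
  customer_id int not null,
  created_at datetime not null,
  primary key (id)
) engine=TokuDB;

-- Load the exported file, listing columns in the same order as the select.
load data infile '/tmp/orders.txt'
into table orders_toku (id, customer_id, created_at);

-- Sanity check: both counts should match before the old table is dropped.
select count(*) from orders_innodb;
select count(*) from orders_toku;
```

Note that "into outfile" and "load data infile" need the FILE privilege, and the secure_file_priv setting may restrict which directory can be used.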

PS: experiment with the row format in the create table (row_format=tokudb_lzma or tokudb_uncompressed). You can try 3 ways pretty quickly (you need to do an OS-level directory ls to see the size). I find offline indexes help too.

set tokudb_create_index_online=off;
create clustering index field1 on table2(field1); -- much faster

Multiple clustering indexes can make a world of difference once you learn when to use them.
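
As an illustration, several clustering indexes could be built offline like this. The column names customer_id and created_at are hypothetical, and the "create clustering index" syntax is the one used above, which may differ on other TokuDB builds.

```sql
-- Build indexes offline, as in the tip above (session-level setting).
set tokudb_create_index_online=off;

-- Hypothetical columns. A clustering index stores a full copy of each row,
-- so range scans on its key avoid extra lookups through the primary key.
create clustering index idx_customer on table2(customer_id);
create clustering index idx_created on table2(created_at);

-- Check that a query filtering on one of these columns picks the matching index.
explain select * from table2 where customer_id = 42;
```

The trade-off is disk space and write cost, since every clustering index holds another copy of the data, so they pay off mainly for columns that are frequently used in range queries.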

I was using GUI tools that run alter table for index changes (waiting hours each time). Doing this by hand made things far more productive: I had spent days going nowhere via the GUI, and by hand it was done in 30 minutes.

Using 5.5.30-tokudb-7.0.1-MariaDB and VERY HAPPY.

Hopefully this can help others when experimenting. It is obviously very late for the original asker. The only existing response was not constructive at all for me (the question was).

Licensed under: CC-BY-SA with attribution
