Question

We have a table with 600M+ records and using AWS Aurora with master-slave replication.

Requirement: We need to add two columns with index in customer table.

Aurora Instance Specs: Ram: 30.5 GB CPU: 4 vCPU

I replicated the cluster and created instance with same configuration and ran the update process to newly created master instance to see how it works. But it's taking a lot time; ( already been 4 days ) and still running. Is there anything I can do to speedup the alter process?

Command:

pt-online-schema-change --execute --ask-pass --user=admin --statistics --chunk-time=1 --alter 'ADD COLUMN email_sha1 CHAR(40) DEFAULT null, ADD COLUMN email_sha256 CHAR(64) DEFAULT null, ADD INDEX email_sha1(email_sha1) USING BTREE, ADD INDEX email_sha256(email_sha256) USING BTREE'  D=example,t=customer,h=example.public.rds.amazonaws.com
Was it helpful?

Solution

You are taking a hash of a phone number? Why? Indexing the phone number is arguably better than building a hash and indexing that. For one thing, the phone number is always(?) shorter than the hash.

Kill the ALTER and rethink the task.

The likely reason for the problem is the randomness of the column being indexed -- Toward the end of the process, it will be slowing down to about 1 disk hit per row. If you estimate 1ms per disk hit, I would estimate that it might take a week to finish.

What version are you running? Can it add an index with ALTER=INPLACE? That might let you (1) add the columns, then (2) in a separate ALTER do the ADD INDEX with ALTER=INPLACE, and not have to worry about it taking a week.

Still I question the need for the columns or the index.

If you need to discuss further, please provide SHOW CREATE TABLE.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top