Question

I have a very large table that I want to add a timestamp column to. It is a table that gets a lot more updates than inserts. I am trying to figure out a way to do this without taking the table out of production for a significant amount of time and it has me in knots. I can do:

alter table stuff add column mod_time timestamp;

Well, I could do this and then the table is locked for 3-5 hours. Not a happy time for users.

For adding a varchar column, for example, I could create a new table, add the column, copy the data over and then replace the old table with the new. This last switch can be done in just a second or two during a slow-ish time. No problems from that. Of course, I may need to do the copy in stages. I can do one copy, and then do a copy of everything that changed after the first copy started. I can repeat this until nothing has changed and do the swap. Of course, it is much easier to determine what changed if I have a "mod_time" column.

What I really would rather to do is: (1) create the new table, (2) copy the data, (3) add the timestamp column, (4) swap the tables.

But step (3) takes, again, hours.

I want rows updated before this column was added to have a mod_time value set to NULL.

If I switch (2) and (3) in the step above, the swap is doable, but I get a mod_time = when I did this stuff to add the column. I want NULL.

I can try to set the values to NULL just before I do the swap, but of course if you update the row to set it to NULL, you are updating the row and the mod_time column gets set to the current time. :-)

I wish I could do: (1) create the new table, (2) add "mod_time" as a datetime column (2) copy the data, (3) change mod_time to a timestamp column, (4) swap the tables.

Theoretically (3) could be done very very quickly (assuming the storage for datetime and timestamp are compatible) because all I am really doing in changing the column to tmestamp is changing its behavior in the future, not its current storage. So this should take no time at all, yes? Probably not.

Just to clarify, if I do:

 alter table stuff add column mod_time timestamp;

I get:

 +----------+--------+----------+---------------------+
 |     col1 |   col2 |     col3 | mod_time            |
 +----------+----- --+----------+---------------------+
 |     5001 |     50 |     2463 | 0000-00-00 00:00:00 |
 |     5002 |     50 |     2467 | 0000-00-00 00:00:00 |
 |     5003 |     50 |     2459 | 0000-00-00 00:00:00 |

This is what I want. It is just that the alter statement will take too long. Everything else I am trying gives me:

 +----------+--------+----------+---------------------+
 |     col1 |   col2 |     col3 | mod_time            |
 +----------+----- --+----------+---------------------+
 |     5001 |     50 |     2463 | 2013-12-05 18:11:21 |
 |     5002 |     50 |     2467 | 2013-12-05 18:11:21 |
 |     5003 |     50 |     2459 | 2013-12-05 18:11:21 |
Était-ce utile?

La solution

You should use pt-online-schema-change, a tool for automating ALTER TABLE operations for MySQL, without locking the table. It's part of the Percona Toolkit, a free collection of indispensable helper tools for MySQL developers and DBA's.

$ pt-onlines-chema-change h=localhost,D=mydatabase,t=mytable \
  --alter="add column mod_time timestamp"

You can continue reading and writing the original table while pt-online-schema-change is working. When it's done, it automatically swaps the tables' names, and drops the original.

You can view a recording of a webinar about this tool by its designer. Viewing is free, but requires registration: http://www.percona.com/webinars/2012-05-02-zero-downtime-schema-changes-in-mysql

Autres conseils

How about something like:

CREATE TABLE Stuff_new LIKE Stuff;
ALTER TABLE Stuff_new ADD COLUMN mod_time TIMESTAMP NULL; -- NULL important to get column to default to NULL, not current timestamp
DELIMITER $$
CREATE TRIGGER ON Stuff_AI AFTER INSERT ON Stuff FOR EACH ROW
BEGIN
    INSERT INTO Stuff_new (c1, c2, ..., mod_time) VALUES (NEW.c1, NEW.c2, ..., NULL);
END $$
CREATE TRIGGER ON Stuff_AU AFTER UPDATE ON Stuff FOR EACH ROW
BEGIN
    INSERT INTO Stuff_new (c1,c2,...) VALUES (NEW.c1, NEW.c2, ...)
      ON DUPLICATE KEY UPDATE c1=NEW.c1, c2=NEW.c2, ..., mod_time=NULL;
END $$
CREATE TRIGGER ON Stuff_AD AFTER DELETE ON Stuff FOR EACH ROW
BEGIN
    DELETE FROM Stuff_new WHERE c1=OLD.c1;
END $$
DELIMITER ;

Basically, you're creating the new table structure in a parallel table "Stuff_new", then using triggers on Stuff to replicate the data into Stuff_new as changes are made on Stuff. After time passes, Stuff_new should approximate Stuff enough where you can do a proper copy of missing data over.

Caveat emptor: it's late, I'm tired... this is just a brainstorm! Use at own risk. :)

(Oh, and this might be helpful: ALTER TABLE ADD COLUMN takes a long time)

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top