Problem

So I have this audit table (tracks actions on any table in my database):

CREATE TABLE `track_table` (
  `id` int(16) unsigned NOT NULL,
  `userID` smallint(16) unsigned NOT NULL,
  `tableName` varchar(255) NOT NULL DEFAULT '',
  `tupleID` int(16) unsigned NOT NULL,
  `date_insert` datetime NOT NULL,
  `action` char(12) NOT NULL DEFAULT '',
  `className` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `userID` (`userID`),
  KEY `tableID` (`tableName`,`tupleID`,`date_insert`),
  KEY `actionDate` (`action`,`date_insert`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

and I need to start archiving outdated items. The table has grown to about 50 million rows, so the fastest way I could delete rows was one tableName at a time.

This works pretty well, but on some of the write-heavy tables it never completes. My query deletes all rows whose tupleID/tableName combination has an associated DELETE action:

DELETE FROM track_table WHERE tableName='someTable' AND tupleID IN (
  SELECT DISTINCT tupleID FROM track_table
  WHERE tableName='someTable' AND action='DELETE' AND date_insert < DATE_SUB(CURDATE(), INTERVAL 30 day)
)

I let this run on my server for 3 days and it never completed for the largest table. Here is the EXPLAIN output (if I switch the DELETE to a SELECT):

| id | select_type        | table       | type | possible_keys      | key     | key_len | ref        | rows    | Extra                        |
|  1 | PRIMARY            | track_table | ref  | tableID            | tableID | 257     | const      | 3941832 | Using where                  |
|  2 | DEPENDENT SUBQUERY | track_table | ref  | tableID,actionDate | tableID | 261     | const,func |       1 | Using where; Using temporary |

So 4 million rows shouldn't take 3 days to delete, I would think. I have innodb_buffer_pool_size set to 3GB, and the server is not set to use innodb_file_per_table. What other ways can I improve InnoDB delete performance? (Running MySQL 5.1.43 on Mac OS X.)

Solution

You could delete data in batches.

In SQL Server, the syntax is DELETE TOP (X) FROM the table. You then run it in a loop, with a transaction for each batch (if you have more than one statement, of course), so that transactions stay short and locks are held only for short periods.

In MySQL syntax: DELETE FROM userTable LIMIT 1000

There are restrictions on that (LIMIT can't be used in multi-table deletes, for instance), but in this case you should be able to do it that way.

There is an additional danger in using LIMIT with DELETE when it comes to replication: the rows are sometimes not deleted in the same order on the slave as they were on the master.
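As a rough sketch of that loop (the procedure name and the batch size of 1000 are placeholders, and the WHERE clause is just one of the conditions from the question), a MySQL stored procedure could look like this. Adding an ORDER BY on the primary key also makes the delete order deterministic, which sidesteps the replication caveat above:

DELIMITER //
CREATE PROCEDURE purge_track_table()
BEGIN
  REPEAT
    -- Under autocommit, each DELETE is its own short transaction,
    -- so locks are held only briefly.
    DELETE FROM track_table
    WHERE tableName = 'someTable'
      AND action = 'DELETE'
      AND date_insert < DATE_SUB(CURDATE(), INTERVAL 30 DAY)
    ORDER BY id
    LIMIT 1000;
  UNTIL ROW_COUNT() = 0 END REPEAT;
END//
DELIMITER ;

CALL purge_track_table();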

Other tips

Try a temp table approach, something like this:

Step 1) CREATE TABLE track_table_new LIKE track_table;

Step 2) INSERT INTO track_table_new SELECT * FROM track_table WHERE action='DELETE' AND date_insert >= DATE_SUB(CURDATE(), INTERVAL 30 day);

Step 3) ALTER TABLE track_table RENAME track_table_old;

Step 4) ALTER TABLE track_table_new RENAME track_table;

Step 5) DROP TABLE track_table_old;

I did not include the tupleID field in Step 2. Please check whether this produces the desired effect. If it does, you may want to ditch the tupleID field altogether, unless you use it for other reasons.
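A side note not in the original answer: Steps 3 and 4 can be combined into a single RENAME TABLE statement, which MySQL performs atomically, so there is no moment when track_table does not exist:

RENAME TABLE track_table TO track_table_old,
             track_table_new TO track_table;

Either way, rows written to track_table after Step 2 has run are lost in the swap, so this is best done during a quiet period.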

Deleting unwanted rows in batches should keep other operations workable. But your delete has conditions, so make sure there is an appropriate index on the columns those conditions use.

Because MySQL does not fully support loose index scan, you could try reversing KEY actionDate (action, date_insert) to KEY actionDate (date_insert, action). With date_insert as the index prefix, MySQL should use this index to scan the rows that fall before your datetime condition.
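For reference, a sketch of that index change (note that on MySQL 5.1's built-in InnoDB an ALTER like this rebuilds the table, so expect it to take a while at 50 million rows):

ALTER TABLE track_table
    DROP KEY actionDate,
    ADD KEY actionDate (date_insert, action);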

With such an index, you can write the SQL as:

DELETE
FROM track_table
WHERE tableName='someTable'
    AND action='DELETE'
    AND date_insert < DATE_SUB(CURDATE(), INTERVAL 30 day)
LIMIT 1000 -- your batch size

First: from your EXPLAIN output, the key_len is quite big, so you should shrink it as much as possible. For your query I think the best way is to change the data type of the action field from char(12) to tinyint, so the data mapping looks like:

1: -> DELETE
2: -> UPDATE
3: -> INSERT
...

and you can replace tableName with a tableid in the same way. The DDL for the best performance could be:

CREATE TABLE `track_table` (
  `id` int(11) unsigned NOT NULL,
  `userID` smallint(6) unsigned NOT NULL,
  `tableid` smallint(6) UNSIGNED NOT NULL DEFAULT 0,
  `tupleID` int(11) unsigned NOT NULL,
  `date_insert` datetime NOT NULL,
  `actionid` tinyint(4) UNSIGNED NOT NULL DEFAULT 0,
  `className` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `userID` (`userID`),
  KEY `tableID` (`tableid`,`tupleID`,`date_insert`),
  KEY `actionDate` (`actionid`,`date_insert`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `actions` (
  `id` tinyint(4) unsigned NOT NULL,
  `actionname` varchar(255) NOT NULL,
  PRIMARY KEY (`id`) 
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `table_name` (
  `id` smallint(6) unsigned NOT NULL,
  `tablename` varchar(255) NOT NULL,
  PRIMARY KEY (`id`) 
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
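One hypothetical way to seed the lookup tables from the existing data (the explicit action ids follow the mapping above; the user-variable trick just numbers the distinct table names):

INSERT INTO actions (id, actionname)
VALUES (1, 'DELETE'), (2, 'UPDATE'), (3, 'INSERT');

SET @n := 0;
INSERT INTO table_name (id, tablename)
SELECT (@n := @n + 1), t.tableName
FROM (SELECT DISTINCT tableName FROM track_table) AS t;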

so the query can then look like:

DELETE FROM track_table WHERE tableid=@tblid AND tupleID IN (
  SELECT DISTINCT tupleID FROM track_table
  WHERE tableid=@tblid AND actionid=@actionid AND date_insert < DATE_SUB(CURDATE(), INTERVAL 30 day)
);
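where the @tblid and @actionid session variables would be looked up first, for example:

SET @tblid = (SELECT id FROM table_name WHERE tablename = 'someTable');
SET @actionid = (SELECT id FROM actions WHERE actionname = 'DELETE');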

But the fastest way is to use partitioning, so that you can simply drop a partition. My current table has more than 40 million rows and is updated hourly (about 400k rows each time); I can drop the current-date partition and reload data into the table. The DROP PARTITION command is very fast (under 100 ms). Hope this helps.
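A minimal sketch of what that could look like here, assuming monthly RANGE partitions on date_insert (the partition names are placeholders, and since MySQL requires the partitioning column to be part of every unique key, the primary key has to be widened first):

ALTER TABLE track_table
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, date_insert);

ALTER TABLE track_table
  PARTITION BY RANGE (TO_DAYS(date_insert)) (
    PARTITION p201001 VALUES LESS THAN (TO_DAYS('2010-02-01')),
    PARTITION p201002 VALUES LESS THAN (TO_DAYS('2010-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
  );

-- Archiving a whole month then becomes a near-instant metadata operation:
ALTER TABLE track_table DROP PARTITION p201001;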
