Question

I have a dating website. Every day I send 10 photo matches to each user and store them in a table with this structure:

SENDER RECEIVER
11      1
12      1
13      1
14      1

I maintain a two-month log. Users can also view their matches by logging in to my website.

This means there are parallel inserts and selects, which by itself is not an issue.

The problem is that when a user becomes inactive or deletes their account, I need to remove all entries from the log where sender = 'inactive-id'.

The log is approximately 60 million rows, so whenever a delete query runs against this huge table, all selects get locked and my site goes down.

Note that my table is a MERGE of MyISAM tables, as I need to store 2-3 months of records, and on the 1st of every month I change the MERGE definition.

Solution

Normally, the table is the most granular object locked by a DELETE statement in MyISAM. Therefore, by using a MERGE table you combine several objects that could be locked independently into a single big object that is locked whenever a DELETE hits any of its underlying tables.

MERGE is a solution for tables that change rarely or never: see MERGE Table Advantages and Disadvantages.

You have two options:

Minimise impact of locks:

  • Delete in small batches (see the sketch after this list)
  • Run the delete job during low-load hours
  • Consider not deleting at all, if it does not save you much space
  • Instead of deleting rows, mark them as "deleted" or obsolete and exclude them from SELECT queries
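
For example, a batched delete could look like the sketch below. This is a minimal illustration: the table name matches_log, the sender id, and the batch size are assumptions, since the question does not name the table.

    -- Repeat this statement until ROW_COUNT() returns 0.
    -- Each run locks the table only for one small batch instead of
    -- holding the lock while scanning all 60 million rows at once.
    DELETE FROM matches_log
    WHERE sender = 12345
    LIMIT 1000;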

Have smaller objects locked (rather than locking all your tables at once):

  • Run a separate DELETE statement against each of the underlying tables (see the sketch after this list)
  • Drop the MERGE definition, delete the data from each underlying table, then recreate the MERGE. However, I think you can do it without dropping the MERGE definition.
  • Use partitioning.
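
For instance, if the MERGE table is backed by one MyISAM table per month (the table names below are hypothetical), each statement locks only one underlying table at a time:

    -- One DELETE per underlying monthly table; only that table is
    -- locked while its statement runs.
    DELETE FROM matches_log_2013_01 WHERE sender = 12345;
    DELETE FROM matches_log_2013_02 WHERE sender = 12345;
    DELETE FROM matches_log_2013_03 WHERE sender = 12345;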

Quote from MySQL Manual:

An alternative to a MERGE table is a partitioned table, which stores partitions of a single table in separate files. Partitioning enables some operations to be performed more efficiently and is not limited to the MyISAM storage engine. For more information, see Chapter 18, Partitioning.

I would strongly advocate for partitioning, because:

  • You can fully automate your logging / data retention process: a script can create new and remove empty partitions, move obsolete data to a different table and then truncate that table.
  • Key uniqueness is enforced.
  • Only the partition that contains the data to be deleted is locked; selects on other partitions run as normal.
  • Searches run on all partitions at the same time (as with MERGE), but you can use HASH subpartitioning to further speed up searches.
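
A minimal sketch of such a setup follows. The sent_on date column, the partition names, and the month boundaries are assumptions for illustration (the question does not show a date column); adjust them to your schema.

    CREATE TABLE matches_log (
        sender   INT(11) NOT NULL,
        receiver INT(11) NOT NULL,
        sent_on  DATE    NOT NULL
    ) ENGINE = MyISAM
    PARTITION BY RANGE (TO_DAYS(sent_on))
    SUBPARTITION BY HASH (sender)
    SUBPARTITIONS 4 (
        PARTITION p201301 VALUES LESS THAN (TO_DAYS('2013-02-01')),
        PARTITION p201302 VALUES LESS THAN (TO_DAYS('2013-03-01')),
        PARTITION p201303 VALUES LESS THAN (TO_DAYS('2013-04-01'))
    );

    -- Monthly retention job: add next month's partition and drop the
    -- oldest one (dropping a partition discards its rows instantly).
    ALTER TABLE matches_log
        ADD PARTITION (PARTITION p201304 VALUES LESS THAN (TO_DAYS('2013-05-01')));
    ALTER TABLE matches_log DROP PARTITION p201301;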

However, if you believe that the benefits of partitioning will be outweighed by the cost of development, then maybe you should not delete that data at all?

OTHER TIPS

I think the best solution would be to partition the log table based on user id. This way, when you run a delete, the DB will lock only one partition.
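
A minimal sketch of that idea, using the columns from the question (the table name and the partition count of 16 are arbitrary choices for illustration):

    -- Rows for a given sender always hash to the same partition, so
    -- DELETE ... WHERE sender = ? only has to touch one partition.
    CREATE TABLE matches_log (
        sender   INT(11) NOT NULL,
        receiver INT(11) NOT NULL
    ) ENGINE = MyISAM
    PARTITION BY HASH (sender)
    PARTITIONS 16;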

If you Google for "delete on huge table" you'll get some informative results. Here are the first three hits:

http://www.dba-oracle.com/t_oracle_fastest_delete_from_large_table.htm

Improving DELETE and INSERT times on a large table that has an index structure

http://www.dbforums.com/microsoft-sql-server/1635228-fastest-way-delete-large-table.html

One method they all mention is deleting in small batches instead of all at once. You say that the table contains data for a two-month period. Maybe you could run a separate delete statement for each day, as in the sketch below?
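
A minimal sketch of that idea, assuming the log has a date column (called sent_on here; the question does not show one) and the hypothetical table name matches_log:

    -- One small delete per day of the two-month window instead of a
    -- single statement that scans the whole table in one go.
    DELETE FROM matches_log
    WHERE sender = 12345
      AND sent_on = '2013-03-01';
    -- ...repeat for each remaining day in the retention window.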

I hope this helps!

If you use InnoDB and create a FOREIGN KEY relation, you can get the rows deleted automatically when the user themselves is deleted:

    CREATE TABLE `DailyChoices` (
        sender   INT(11) NOT NULL,
        receiver INT(11) NOT NULL,
        CONSTRAINT FOREIGN KEY (sender)
            REFERENCES users (userid)
            ON DELETE CASCADE
            ON UPDATE CASCADE
    ) ENGINE = InnoDB;
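
With that constraint in place, deleting the user row is enough (this assumes a users table whose primary key is userid, as the constraint above references):

    -- The ON DELETE CASCADE removes that user's DailyChoices rows.
    DELETE FROM users WHERE userid = 12345;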

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow