I have 50 GB of data in a table, and I have to remove the records that are older than a particular date, after backing them up.

Currently I follow these steps:

  1. Take a backup of the complete table.
  2. Run a delete query with a WHERE clause to remove the data that is no longer required:

    DELETE FROM <some-table-name> WHERE `creation_time` <= '<some-valid-time>'
    

The problems with the current approach are:

  1. It is painfully slow.
  2. Storage is redundant: the backup covers the whole table even though only selected records are removed, so an incremental backup would be enough.
  3. After deletion, the disk space is not returned to the OS (until the table is optimized).

I thought of breaking the table into smaller weekly or monthly tables, which would make backup and deletion easy, but querying them together would be very difficult and slow.

Please advise a smart and efficient way to do this.

Solution

Use `creation_time` as the partitioning key and create per-week or per-month partitions. Dropping an old partition is incredibly fast compared with a row-by-row `DELETE`, and the data can still be queried as a single table.
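
As a rough illustration, here is a minimal sketch of monthly partitions, assuming MySQL with InnoDB and a `DATETIME` column. The table name `events`, the columns `id` and `payload`, the partition names, the dates, and the output path are all hypothetical; only `creation_time` comes from the question.

```sql
-- Hypothetical table; everything except creation_time is an assumption.
CREATE TABLE events (
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    creation_time DATETIME NOT NULL,
    payload       VARCHAR(255),
    -- MySQL requires the partitioning column to be part of every unique key,
    -- so it is included in the primary key here.
    PRIMARY KEY (id, creation_time)
)
PARTITION BY RANGE COLUMNS (creation_time) (
    PARTITION p2024_01 VALUES LESS THAN ('2024-02-01'),
    PARTITION p2024_02 VALUES LESS THAN ('2024-03-01'),
    PARTITION p2024_03 VALUES LESS THAN ('2024-04-01'),
    -- Catch-all so that inserts with later dates do not fail.
    PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- Before each new month, split the catch-all partition to add the next one.
ALTER TABLE events REORGANIZE PARTITION p_future INTO (
    PARTITION p2024_04 VALUES LESS THAN ('2024-05-01'),
    PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- Back up only the rows that are about to be removed (hypothetical path;
-- needs the FILE privilege and a location allowed by secure_file_priv).
SELECT * FROM events PARTITION (p2024_01)
INTO OUTFILE '/tmp/events_p2024_01.tsv';

-- Remove the whole month in one operation instead of deleting row by row.
ALTER TABLE events DROP PARTITION p2024_01;
```

Queries keep targeting `events` as one table, and a `WHERE` condition on `creation_time` lets the server skip partitions that cannot match. Converting an existing 50 GB table with `ALTER TABLE ... PARTITION BY` rebuilds the table, so that one-time step will probably be slow and is best scheduled for a quiet period.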

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow