Question

One of my database tables has grown quite large, to the point where I think it is impacting the performance on my site (it is definitely making backups a lot slower).

It has ~13,000,000 rows and is 4.2 GiB in size, of which 1.2 GiB is data.

The structure looks like this:

CREATE TABLE IF NOT EXISTS `t1` (
  `id` int(10) unsigned NOT NULL,
  `int2` int(10) unsigned NOT NULL,
  `int3` int(10) unsigned NOT NULL,
  `int4` int(10) unsigned NOT NULL,
  `char1` varchar(255) NOT NULL,
  `int5` int(10) NOT NULL,
  `char2` varchar(1024) DEFAULT NULL,
  `char3` varchar(1024) NOT NULL,
  PRIMARY KEY (`id`,`int2`,`int3`,`int4`),
  KEY `key1` (`id`,`int2`,`char1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Common operations on this table are inserts and selects; rows are never updated and rarely deleted. int2 is a running version number, so usually only the rows with the highest value of int2 for a given id are selected.

I have been thinking of several ways of optimizing this, and I was wondering which one would be the best to pursue:

  1. char1 (which is in the index) actually only contains about 40,000 different strings. I could move the strings into a second table (idchar -> char) and then just save the id in my main table, at the cost of an additional id lookup step during inserts and selects.
  2. char2 and char3 are often empty. I could move them to a separate table that I would then do a LEFT JOIN on in selects.
  3. Even if char2 and char3 contain data they are usually shorter than 1024 chars. I could probably shorten these to ~200.

Which one of these do you think is the most promising? Does decreasing the row size (either by making char1 into an integer or by removing/resizing columns) in MySQL InnoDB tables actually have a big impact on performance?

Thanks

Solution

There are several options. From what you say, moving char1 to another table seems quite reasonable. The additional lookup could, under some circumstances, even be faster than storing the raw strings in the table. (This happens when the repeated values make the table larger than necessary, especially when the table and its index no longer fit in available memory.) It would also save space in both the data table and the corresponding index.
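As a rough sketch of what moving char1 out could look like: the table name `t1_char1` and the column name `char1_id` below are made up for illustration, and the migration is written as a simple in-place change, which on a table of this size would need to be tested and scheduled carefully.

```sql
-- Hypothetical lookup table: one row per distinct char1 string (~40,000 rows).
-- MEDIUMINT UNSIGNED (3 bytes, up to ~16.7 million values) is an assumption;
-- use INT UNSIGNED if the number of distinct strings could grow a lot.
CREATE TABLE IF NOT EXISTS `t1_char1` (
  `char1_id` mediumint unsigned NOT NULL AUTO_INCREMENT,
  `char1` varchar(255) NOT NULL,
  PRIMARY KEY (`char1_id`),
  UNIQUE KEY `uq_char1` (`char1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Fill the lookup table from the existing data.
INSERT INTO `t1_char1` (`char1`)
SELECT DISTINCT `char1` FROM `t1`;

-- Add the new id column, populate it, then swap it into the index.
ALTER TABLE `t1` ADD COLUMN `char1_id` mediumint unsigned NULL;

UPDATE `t1`
JOIN `t1_char1` AS c ON c.`char1` = `t1`.`char1`
SET `t1`.`char1_id` = c.`char1_id`;

ALTER TABLE `t1`
  DROP KEY `key1`,
  DROP COLUMN `char1`,
  MODIFY `char1_id` mediumint unsigned NOT NULL,
  ADD KEY `key1` (`id`,`int2`,`char1_id`);

-- Reading the string back costs one join against the small lookup table.
-- The id value 42 is just a placeholder.
SELECT t.`id`, t.`int2`, c.`char1`
FROM `t1` AS t
JOIN `t1_char1` AS c ON c.`char1_id` = t.`char1_id`
WHERE t.`id` = 42;
```

One thing to check first: with a case-insensitive collation, strings that differ only in case would be merged into a single lookup row, so this only works as written if that is acceptable for your data.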

The exact impact on performance is hard to predict without knowing much more about your system and the query load.

Moving char2 and char3 to another table will have minimal impact. The overhead of the link to the other table would eat up any gains in space. Shortening them would not help much either: a varchar only stores the actual string plus a short length prefix, so declaring a smaller maximum saves at most a byte or so per column per record.
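Before deciding anything about the two long varchar columns from the question (char2 and char3), it may be worth measuring how they are actually used. The query below is only a sketch; it scans the whole table, so expect it to take a while on 13 million rows.

```sql
-- How often are char2/char3 empty, and how long do they really get?
SELECT
  COUNT(*)                           AS total_rows,
  SUM(`char2` IS NULL OR `char2` = '') AS char2_empty,  -- char2 is nullable
  SUM(`char3` = '')                  AS char3_empty,    -- char3 is NOT NULL
  MAX(CHAR_LENGTH(`char2`))          AS char2_max_len,
  MAX(CHAR_LENGTH(`char3`))          AS char3_max_len,
  AVG(CHAR_LENGTH(`char2`))          AS char2_avg_len,
  AVG(CHAR_LENGTH(`char3`))          AS char3_avg_len
FROM `t1`;
```

If the averages turn out to be small, these columns are contributing little to the table size and are unlikely to be where the space is going.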

If you have a natural partitioning key, then partitioning is definitely an option, particularly for reducing the time for backups. This is very handy for a transaction-style table, where records are inserted and never or very rarely modified, because partitions that no longer change do not have to be backed up again. If, on the other hand, the table holds something like customer records, any of which could be modified at any time, then you would still need to back up all the partitions.
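For illustration only, here is what range partitioning on id could look like. Whether id is a suitable partitioning key for this table is an assumption, and the partition names and boundary values are placeholders. MySQL requires the partitioning column to be part of every unique key on the table, which id satisfies here because it is in the primary key.

```sql
-- Sketch: split t1 into ranges of id. Boundaries are made-up values;
-- pick them from the real distribution of id.
ALTER TABLE `t1`
  PARTITION BY RANGE (`id`) (
    PARTITION p0   VALUES LESS THAN (1000000),
    PARTITION p1   VALUES LESS THAN (2000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
  );

-- A query that filters on id only has to touch the matching partition.
-- EXPLAIN shows which partitions are used (in older MySQL versions this
-- needs EXPLAIN PARTITIONS instead).
EXPLAIN SELECT * FROM `t1` WHERE `id` = 42;
```

This only helps with backups if old id ranges stop receiving new rows. If new versions keep being inserted for old ids, every partition keeps changing and a different key would be needed.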

OTHER TIPS

There are several factors that could affect the performance of your DB. Partitioning is definitely the best option, but it cannot always be done. If you search by char1 before each insert, partitioning can be a problem, because you would have to search every partition for the key. You need to analyze how the data is generated and, most importantly, how you query this table. That is the key point, so you should post the queries you run against it. As for char2 and char3, moving them to another table won't make any difference. You should also mention the physical layout of your data. Are you using a single data file? Are the data files on the same physical disk as the OS? Give more details so we can give you more help.
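As a starting point for that kind of analysis, the statements below show one way to look at how the space is split between data and indexes and how a typical query is executed. The "latest version for one id" query is a guess at the access pattern described in the question, and 42 is a placeholder id.

```sql
-- Data size versus index size for t1 in the current database (in bytes).
SELECT table_name, table_rows, data_length, index_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 't1';

-- Execution plan for fetching the rows with the highest int2 for one id.
EXPLAIN
SELECT *
FROM `t1`
WHERE `id` = 42
  AND `int2` = (SELECT MAX(`int2`) FROM `t1` WHERE `id` = 42);
```

Note that for InnoDB the table_rows figure is an estimate, and data_length includes the primary key because the rows are stored in it.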

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow