Question

I have no control over things like tmp_table_size and max_heap_table_size, and so as our tables grow the time taken by queries requiring temp tables is growing geometrically.

I am wondering if there is a way to prevent MySQL from using temp tables for these queries? What would be the best approach in this situation:

Here is an example of the biggest offender:

SELECT `skills`.`id`
FROM (`jobs_skills`)
JOIN `jobs` ON (`jobs`.`id` = `jobs_skills`.`job_id`)
JOIN `skills` ON (`skills`.`id` = `jobs_skills`.`skill_id`)
WHERE `jobs`.`job_visibility_id` = 1
AND `jobs`.`active` = 1
AND `skills`.`valid` = 1
AND `jobs_skills`.`skill_id` IN (96,101,103,108,121,2610,99,119,2607,102,104,112,113,122,1032,1488,2608,109,126,1438,2310,2318,2622,118,1046,1387,2609,100,116,123,2611,2612,2616,2618,114,127,1562,1587,1608,2276,2615,125,1070,1071,1161,1658,2613,2614,2617,105,110,111,120,1394,1435)
GROUP BY `jobs_skills`.`job_id`

for which copying to temp table took 107 seconds, 99% of the total query time.

despite fears of tl;dr syndrome, I am offering . . .

MORE DETAILS

Here is the EXPLAIN statement for the query:

+----+-------------+-------------+--------+----------------------+--------------+---------+----------------------------------+--------+----------------------------------------------+
| id | select_type | table       | type   | possible_keys        | key          | key_len | ref                              | rows   | Extra                                        |
+----+-------------+-------------+--------+----------------------+--------------+---------+----------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | jobs        | ref    | PRIMARY,active_index | active_index | 1       | const                            | 468958 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | jobs_skills | ref    | PRIMARY              | PRIMARY      | 4       | 557574_prod.jobs.id              |      1 | Using where; Using index                     |
|  1 | SIMPLE      | skills      | eq_ref | PRIMARY              | PRIMARY      | 4       | 557574_prod.jobs_skills.skill_id |      1 | Using where                                  |
+----+-------------+-------------+--------+----------------------+--------------+---------+----------------------------------+--------+----------------------------------------------+

and here are the CREATE TABLE statements for the relevant tables:

| jobs  | CREATE TABLE `jobs` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `user_id` int(10) unsigned NOT NULL,
  `title` varchar(40) NOT NULL,
  `description` text NOT NULL,
  `address_id` int(10) unsigned NOT NULL,
  `proximity` smallint(3) unsigned NOT NULL default '15',
  `job_payrate_id` tinyint(1) unsigned NOT NULL default '1',
  `payrate` int(10) unsigned NOT NULL,
  `start_date` int(10) unsigned NOT NULL,
  `job_start_id` tinyint(1) unsigned NOT NULL default '1',
  `duration` tinyint(1) unsigned NOT NULL COMMENT 'Full-time, Part-time, Flexible',
  `posting_date` int(10) unsigned NOT NULL,
  `revision_date` int(10) unsigned NOT NULL,
  `expiration` int(10) unsigned NOT NULL,
  `active` tinyint(1) unsigned NOT NULL default '1',
  `team_size` tinyint(2) unsigned NOT NULL default '1',
  `job_type_id` tinyint(1) unsigned NOT NULL default '1',
  `job_shift_id` tinyint(1) unsigned NOT NULL default '1',
  `job_visibility_id` tinyint(1) unsigned NOT NULL default '1',
  `position_count` smallint(5) unsigned NOT NULL default '1',
  `impressions` int(10) unsigned NOT NULL default '0',
  `clicks` int(10) unsigned NOT NULL default '0',
  `employer_email` varchar(100) NOT NULL default '',
  `job_source_id` smallint(6) unsigned NOT NULL default '0',
  `job_password` varchar(50) NOT NULL default '',
  PRIMARY KEY  (`id`),
  KEY `active_index` (`active`),
  KEY `user_id_index` (`user_id`),
  KEY `address_id_index` (`address_id`),
  KEY `posting_date_index` USING BTREE (`posting_date`)
) ENGINE=InnoDB AUTO_INCREMENT=875013 DEFAULT CHARSET=utf8

-

| jobs_skills | CREATE TABLE `jobs_skills` (
  `job_id` int(10) unsigned NOT NULL,
  `skill_id` int(10) unsigned NOT NULL,
  `required` tinyint(1) unsigned NOT NULL,
  PRIMARY KEY  (`job_id`,`skill_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

-

| skills | CREATE TABLE `skills` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `parent_id` int(10) unsigned NOT NULL,
  `name` varchar(35) NOT NULL default '',
  `description` varchar(250) NOT NULL,
  `valid` tinyint(1) unsigned NOT NULL default '0',
  `is_category` tinyint(1) unsigned NOT NULL default '0',
  `last_edited` int(10) unsigned NOT NULL default '0',
  `impressions` int(10) unsigned NOT NULL default '0',
  `clicks` int(10) unsigned NOT NULL default '0',
  `jobs` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `name` (`name`),
  KEY `parent` (`parent_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2657 DEFAULT CHARSET=utf8 |

Like I said, this is not the only query with this problem, so any general advice would be most helpful, though I will not decline any advice specific to this query.

Was it helpful?

Solution

Your original subquery:

SELECT `skills`.`id`
FROM (`jobs_skills`)
JOIN `jobs` ON (`jobs`.`id` = `jobs_skills`.`job_id`)
JOIN `skills` ON (`skills`.`id` = `jobs_skills`.`skill_id`)
WHERE `jobs`.`job_visibility_id` = 1
AND `jobs`.`active` = 1
AND `skills`.`valid` = 1
AND `jobs_skills`.`skill_id` IN (96,101,103,108,121,2610,99,119,2607,102,104,112,113,122,1032,1488,2608,109,126,1438,2310,2318,2622,118,1046,1387,2609,100,116,123,2611,2612,2616,2618,114,127,1562,1587,1608,2276,2615,125,1070,1071,1161,1658,2613,2614,2617,105,110,111,120,1394,1435)
GROUP BY `jobs_skills`.`job_id`

You need to rafactor the query in such a way that you control and micromanage the temp tables being created and their sizes. Based solely on the JOIN, WHERE, and GROUP BY clauses, you need to implement the following changes:

jobs needs to be indexed on job_visibility_id,active,id

Needed Subquery

(SELECT id job_id FROM jobs WHERE job_visibility_id=1 AND active=1 ORDER BY id)

skills needs to be indexed on valid,id

Needed Subquery

(SELECT id skill_id FROM skills WHERE valid=1 ORDER BY id)

jobs_skills needs to be indexed on skill_id,job_id

Needed Subquery

(SELECT job_id FROM jobs_skills WHERE skill_id IN (96,101,103,108,121,2610,99,119,2607,102,104,112,113,122,1032,1488,2608,109,126,1438,2310,2318,2622,118,1046,1387,2609,100,116,123,2611,2612,2616,2618,114,127,1562,1587,1608,2276,2615,125,1070,1071,1161,1658,2613,2614,2617,105,110,111,120,1394,1435) ORDER BY skill_id,job_id)

SQL to create needed indexes

ALTER TABLE jobs ADD INDEX (job_visibility_id,active,id);
ALTER TABLE skills ADD INDEX (valid,id);
ALTER TABLE jobs_skills ADD INDEX (skill_id,job_id);

Now combine the Subqueries to form VOLTRON

SELECT skill_id
FROM (SELECT JS.*
FROM (SELECT skill_id,job_id FROM jobs_skills WHERE skill_id IN (96,101,103,108,121,2610,99,119,2607,102,104,112,113,122,1032,1488,2608,109,126,1438,2310,2318,2622,118,1046,1387,2609,100,116,123,2611,2612,2616,2618,114,127,1562,1587,1608,2276,2615,125,1070,1071,1161,1658,2613,2614,2617,105,110,111,120,1394,1435) ORDER BY skill_id,job_id) JS
INNER JOIN
(SELECT id job_id FROM jobs WHERE job_visibility_id=1 AND active=1 ORDER BY id) J
USING (job_id) INNER JOIN
(SELECT id skill_id FROM skills WHERE valid=1 ORDER BY id) S USING (skill_id)
) A
GROUP BY job_id;

Give it a Try !!!

BTW if the syntax is incorrect, I'll try to adjust it !!!

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top