Tratar con tablas temporales cuando no tiene el control sobre las variables db

https://dba.stackexchange.com/questions/2915

16-10-2019
|

Pregunta

No tengo ningún control sobre cosas como tmp_table_size y max_heap_table_size, y así como nuestras mesas crecen el tiempo empleado por las consultas que requieren tablas temporales está creciendo geométricamente.

Me pregunto si hay una manera de prevenir el uso de MySQL tablas temporales para estas consultas? ¿Cuál sería el mejor enfoque en esta situación:

Este es un ejemplo de los más grandes delincuente:

SELECT `skills`.`id`
FROM (`jobs_skills`)
JOIN `jobs` ON (`jobs`.`id` = `jobs_skills`.`job_id`)
JOIN `skills` ON (`skills`.`id` = `jobs_skills`.`skill_id`)
WHERE `jobs`.`job_visibility_id` = 1
AND `jobs`.`active` = 1
AND `skills`.`valid` = 1
AND `jobs_skills`.`skill_id` IN (96,101,103,108,121,2610,99,119,2607,102,104,112,113,122,1032,1488,2608,109,126,1438,2310,2318,2622,118,1046,1387,2609,100,116,123,2611,2612,2616,2618,114,127,1562,1587,1608,2276,2615,125,1070,1071,1161,1658,2613,2614,2617,105,110,111,120,1394,1435)
GROUP BY `jobs_skills`.`job_id`

para los que copying to temp table tomó 107 segundos, el 99% del tiempo total de consulta.

a pesar de los temores de tl; dr síndrome, estoy ofreciendo. . .

Más información

Aquí está la declaración EXPLAIN para la consulta:

+----+-------------+-------------+--------+----------------------+--------------+---------+----------------------------------+--------+----------------------------------------------+
| id | select_type | table       | type   | possible_keys        | key          | key_len | ref                              | rows   | Extra                                        |
+----+-------------+-------------+--------+----------------------+--------------+---------+----------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | jobs        | ref    | PRIMARY,active_index | active_index | 1       | const                            | 468958 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | jobs_skills | ref    | PRIMARY              | PRIMARY      | 4       | 557574_prod.jobs.id              |      1 | Using where; Using index                     |
|  1 | SIMPLE      | skills      | eq_ref | PRIMARY              | PRIMARY      | 4       | 557574_prod.jobs_skills.skill_id |      1 | Using where                                  |
+----+-------------+-------------+--------+----------------------+--------------+---------+----------------------------------+--------+----------------------------------------------+

y aquí están las declaraciones CREATE TABLE para las tablas correspondientes:

| jobs  | CREATE TABLE `jobs` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `user_id` int(10) unsigned NOT NULL,
  `title` varchar(40) NOT NULL,
  `description` text NOT NULL,
  `address_id` int(10) unsigned NOT NULL,
  `proximity` smallint(3) unsigned NOT NULL default '15',
  `job_payrate_id` tinyint(1) unsigned NOT NULL default '1',
  `payrate` int(10) unsigned NOT NULL,
  `start_date` int(10) unsigned NOT NULL,
  `job_start_id` tinyint(1) unsigned NOT NULL default '1',
  `duration` tinyint(1) unsigned NOT NULL COMMENT 'Full-time, Part-time, Flexible',
  `posting_date` int(10) unsigned NOT NULL,
  `revision_date` int(10) unsigned NOT NULL,
  `expiration` int(10) unsigned NOT NULL,
  `active` tinyint(1) unsigned NOT NULL default '1',
  `team_size` tinyint(2) unsigned NOT NULL default '1',
  `job_type_id` tinyint(1) unsigned NOT NULL default '1',
  `job_shift_id` tinyint(1) unsigned NOT NULL default '1',
  `job_visibility_id` tinyint(1) unsigned NOT NULL default '1',
  `position_count` smallint(5) unsigned NOT NULL default '1',
  `impressions` int(10) unsigned NOT NULL default '0',
  `clicks` int(10) unsigned NOT NULL default '0',
  `employer_email` varchar(100) NOT NULL default '',
  `job_source_id` smallint(6) unsigned NOT NULL default '0',
  `job_password` varchar(50) NOT NULL default '',
  PRIMARY KEY  (`id`),
  KEY `active_index` (`active`),
  KEY `user_id_index` (`user_id`),
  KEY `address_id_index` (`address_id`),
  KEY `posting_date_index` USING BTREE (`posting_date`)
) ENGINE=InnoDB AUTO_INCREMENT=875013 DEFAULT CHARSET=utf8

| jobs_skills | CREATE TABLE `jobs_skills` (
  `job_id` int(10) unsigned NOT NULL,
  `skill_id` int(10) unsigned NOT NULL,
  `required` tinyint(1) unsigned NOT NULL,
  PRIMARY KEY  (`job_id`,`skill_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

| skills | CREATE TABLE `skills` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `parent_id` int(10) unsigned NOT NULL,
  `name` varchar(35) NOT NULL default '',
  `description` varchar(250) NOT NULL,
  `valid` tinyint(1) unsigned NOT NULL default '0',
  `is_category` tinyint(1) unsigned NOT NULL default '0',
  `last_edited` int(10) unsigned NOT NULL default '0',
  `impressions` int(10) unsigned NOT NULL default '0',
  `clicks` int(10) unsigned NOT NULL default '0',
  `jobs` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `name` (`name`),
  KEY `parent` (`parent_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2657 DEFAULT CHARSET=utf8 |

Como dije, esto no es la única consulta con este problema, por lo que cualquier consejo general sería más útil, aunque no voy a rechazar cualquier asesoramiento específico a esta consulta.

Solución

Su subconsulta originales:

SELECT `skills`.`id`
FROM (`jobs_skills`)
JOIN `jobs` ON (`jobs`.`id` = `jobs_skills`.`job_id`)
JOIN `skills` ON (`skills`.`id` = `jobs_skills`.`skill_id`)
WHERE `jobs`.`job_visibility_id` = 1
AND `jobs`.`active` = 1
AND `skills`.`valid` = 1
AND `jobs_skills`.`skill_id` IN (96,101,103,108,121,2610,99,119,2607,102,104,112,113,122,1032,1488,2608,109,126,1438,2310,2318,2622,118,1046,1387,2609,100,116,123,2611,2612,2616,2618,114,127,1562,1587,1608,2276,2615,125,1070,1071,1161,1658,2613,2614,2617,105,110,111,120,1394,1435)
GROUP BY `jobs_skills`.`job_id`

Debe rafactor la consulta de tal manera que el control y la microgestión de las tablas temporales que se crean y sus tamaños. Basado únicamente en la unión, WHERE y cláusulas GROUP BY, es necesario implementar los siguientes cambios:

necesita ser indexada en job_visibility_id puestos de trabajo, activo, id

Se necesita Subconsulta

(SELECT id job_id FROM jobs WHERE job_visibility_id=1 AND active=1 ORDER BY id)

necesita ser indexada en válida, id

habilidades

Se necesita Subconsulta

(SELECT id skill_id FROM skills WHERE valid=1 ORDER BY id)

necesita ser indexada en skill_id jobs_skills, job_id

Se necesita Subconsulta

(SELECT job_id FROM jobs_skills WHERE skill_id IN (96,101,103,108,121,2610,99,119,2607,102,104,112,113,122,1032,1488,2608,109,126,1438,2310,2318,2622,118,1046,1387,2609,100,116,123,2611,2612,2616,2618,114,127,1562,1587,1608,2276,2615,125,1070,1071,1161,1658,2613,2614,2617,105,110,111,120,1394,1435) ORDER BY skill_id,job_id)

SQL para crear índices necesarios

ALTER TABLE jobs ADD INDEX (job_visibility_id,active,id);
ALTER TABLE skills ADD INDEX (valid,id);
ALTER TABLE jobs_skills ADD INDEX (skill_id,job_id);

Ahora combinar las subconsultas para formar VOLTRON

SELECT skill_id
FROM (SELECT JS.*
FROM (SELECT skill_id,job_id FROM jobs_skills WHERE skill_id IN (96,101,103,108,121,2610,99,119,2607,102,104,112,113,122,1032,1488,2608,109,126,1438,2310,2318,2622,118,1046,1387,2609,100,116,123,2611,2612,2616,2618,114,127,1562,1587,1608,2276,2615,125,1070,1071,1161,1658,2613,2614,2617,105,110,111,120,1394,1435) ORDER BY skill_id,job_id) JS
INNER JOIN
(SELECT id job_id FROM jobs WHERE job_visibility_id=1 AND active=1 ORDER BY id) J
USING (job_id) INNER JOIN
(SELECT id skill_id FROM skills WHERE valid=1 ORDER BY id) S USING (skill_id)
) A
GROUP BY job_id;

darle una oportunidad !!!

Por cierto, si la sintaxis es incorrecta, voy a tratar de ajustarlo !!!

Licenciado bajo: CC-BY-SA con atribución

No afiliado a dba.stackexchange