How do I do an AND with a join?

https://stackoverflow.com/questions/599461

11-09-2019

Question

I have the following data structure and data:

CREATE TABLE `parent` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(10) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `parent` VALUES(1, 'parent 1');
INSERT INTO `parent` VALUES(2, 'parent 2');

CREATE TABLE `other` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(10) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `other` VALUES(1, 'other 1');
INSERT INTO `other` VALUES(2, 'other 2');

CREATE TABLE `relationship` (
  `id` int(11) NOT NULL auto_increment,
  `parent_id` int(11) NOT NULL,
  `other_id` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `relationship` VALUES(1, 1, 1);
INSERT INTO `relationship` VALUES(2, 1, 2);
INSERT INTO `relationship` VALUES(3, 2, 1);

I want to find the parent records that are linked to both other 1 and other 2.

This is what I've come up with, but I'm wondering if there is a better way:

SELECT p.id, p.name
FROM parent AS p
    LEFT JOIN relationship AS r1 ON (r1.parent_id = p.id)
    LEFT JOIN relationship AS r2 ON (r2.parent_id = p.id)
WHERE r1.other_id = 1 AND r2.other_id = 2;

The result is 1, "parent 1", which is correct. The problem is that once the list reaches 5+ joins, the query gets messy, and as the relationship table grows, it gets slow.

Is there a better way?

I'm using MySQL and PHP, but this is probably fairly generic.
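For a quick, self-contained check, the sample data and the self-join query from the question can be replayed with Python's built-in sqlite3 module. This is a sketch in which SQLite stands in for MySQL, so the column types are simplified and the ENGINE/CHARSET clauses are dropped.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema and data in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE other (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL,
        other_id INTEGER NOT NULL
    );
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO other VALUES (1, 'other 1'), (2, 'other 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

# The query from the question: one self-join per required other_id.
# The WHERE conditions on r1/r2 discard the NULL rows the LEFT JOINs could add,
# so in practice these behave like inner joins.
rows = conn.execute("""
    SELECT p.id, p.name
    FROM parent AS p
        LEFT JOIN relationship AS r1 ON (r1.parent_id = p.id)
        LEFT JOIN relationship AS r2 ON (r2.parent_id = p.id)
    WHERE r1.other_id = 1 AND r2.other_id = 2
""").fetchall()
print(rows)  # [(1, 'parent 1')]
```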

Solution

OK, I tested this. The queries, from best to worst, were:

Query 1: Joins (0.016s; basically instantaneous)

SELECT p.id, name
FROM parent p
JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = 100
JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = 101
JOIN relationship r3 ON p.id = r3.parent_id AND r3.other_id = 102
JOIN relationship r4 ON p.id = r4.parent_id AND r4.other_id = 103
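The question complains that five or more hand-written joins get messy. One option is to generate Query 1's shape from a list of IDs. The sketch below uses Python's sqlite3 as a stand-in for MySQL/PHP, and the function names are hypothetical. If relationship can hold the same (parent_id, other_id) pair twice, this form returns duplicate rows.

```python
import sqlite3

def all_of_join_sql(n):
    """Query 1's shape, generated: one INNER JOIN on relationship per required other_id."""
    if n < 1:
        raise ValueError("need at least one other_id")
    joins = "\n".join(
        f"JOIN relationship r{i} ON p.id = r{i}.parent_id AND r{i}.other_id = ?"
        for i in range(1, n + 1)
    )
    return f"SELECT p.id, p.name\nFROM parent p\n{joins}\nORDER BY p.id"

def parents_with_all(conn, other_ids):
    # The IDs are bound as parameters; only the alias numbers are formatted into the SQL.
    return conn.execute(all_of_join_sql(len(other_ids)), list(other_ids)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY,
                               parent_id INTEGER NOT NULL, other_id INTEGER NOT NULL);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")
print(parents_with_all(conn, [1, 2]))  # [(1, 'parent 1')]
```

The same builder handles four, five, or more IDs without anyone writing the joins by hand.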

Query 2: EXISTS (0.625s)

SELECT id, name
FROM parent p
WHERE EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 100)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 101)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 102)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 103)
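The EXISTS form can be generated the same way, and it has one property worth showing. Each EXISTS only asks whether at least one matching row exists, so a repeated relationship row never duplicates the parent. This is a sketch in Python's sqlite3 standing in for MySQL. The function name and the deliberately duplicated row `(4, 1, 1)` are illustrative assumptions.

```python
import sqlite3

def all_of_exists_sql(n):
    """Query 2's shape: one correlated EXISTS per required other_id."""
    if n < 1:
        raise ValueError("need at least one other_id")
    tests = "\n  AND ".join(
        "EXISTS (SELECT 1 FROM relationship r "
        "WHERE r.parent_id = p.id AND r.other_id = ?)"
        for _ in range(n)
    )
    return f"SELECT p.id, p.name\nFROM parent p\nWHERE {tests}\nORDER BY p.id"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY,
                               parent_id INTEGER NOT NULL, other_id INTEGER NOT NULL);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    -- (4, 1, 1) deliberately repeats the pair (parent 1, other 1)
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 1, 1);
""")
rows = conn.execute(all_of_exists_sql(2), [1, 2]).fetchall()
print(rows)  # [(1, 'parent 1')]: still one row despite the repeated pair
```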

Query 3: Aggregate (1.016s)

SELECT p.id, p.name
FROM parent p
WHERE (SELECT COUNT(*) FROM relationship WHERE parent_id = p.id AND other_id IN (100,101,102,103)) = 4

Query 4: Aggregate UNION (2.39s)

SELECT id, name FROM (
  SELECT p1.id, p1.name
  FROM parent AS p1 LEFT JOIN relationship as r1 ON(r1.parent_id=p1.id)
  WHERE r1.other_id = 100
  UNION ALL
  SELECT p2.id, p2.name
  FROM parent AS p2 LEFT JOIN relationship as r2 ON(r2.parent_id=p2.id)
  WHERE r2.other_id = 101
  UNION ALL
  SELECT p3.id, p3.name
  FROM parent AS p3 LEFT JOIN relationship as r3 ON(r3.parent_id=p3.id)
  WHERE r3.other_id = 102
  UNION ALL
  SELECT p4.id, p4.name
  FROM parent AS p4 LEFT JOIN relationship as r4 ON(r4.parent_id=p4.id)
  WHERE r4.other_id = 103
) a
GROUP BY id, name
HAVING count(*) = 4

Actually, the query above was producing wrong data, so either it is wrong or I did something wrong with it. Either way, it's just a bad idea.

If the join version isn't fast, you need to look at the EXPLAIN plan for the query. You're probably just missing the appropriate indexes. Try:

CREATE INDEX idx1 ON relationship (parent_id, other_id);
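In MySQL you would prefix the SELECT with EXPLAIN and check the `key` column. SQLite's closest counterpart is EXPLAIN QUERY PLAN, and the sketch below uses it with Python's sqlite3 to confirm that the two-way join uses the composite index. The wording of plan rows varies between SQLite versions, so the sketch only checks that `idx1` is mentioned somewhere in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY,
                               parent_id INTEGER NOT NULL, other_id INTEGER NOT NULL);
    CREATE INDEX idx1 ON relationship (parent_id, other_id);
""")

sql = """
    SELECT p.id, p.name
    FROM parent p
    JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = ?
    JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = ?
"""
# One row per plan step; the human-readable description is in the last column.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1, 2))]
for step in plan:
    print(step)
uses_idx1 = any("idx1" in step for step in plan)
```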

Before going down the aggregation route (SELECT COUNT(*) FROM ...), you should read "SQL Statement - 'Join' vs 'group by and having'".

Note: the timings above are based on:

CREATE TABLE parent (
  id INT PRIMARY KEY,
  name VARCHAR(50)
);

CREATE TABLE other (
  id INT PRIMARY KEY,
  name VARCHAR(50)
);

CREATE TABLE relationship (
  id INT PRIMARY KEY,
  parent_id INT,
  other_id INT
);

CREATE INDEX idx1 ON relationship (parent_id, other_id);
CREATE INDEX idx2 ON relationship (other_id, parent_id);

and about 800,000 records created with:

<?php
ini_set('max_execution_time', 600);

$start = microtime(true);

echo "<pre>\n";
mysql_connect('localhost', 'scratch', 'scratch');
if (mysql_error()) {
    echo "Connect error: " . mysql_error() . "\n";
}
mysql_select_db('scratch');
if (mysql_error()) {
    echo "Select DB error: " . mysql_error() . "\n";
}

define('PARENTS', 100000);
define('CHILDREN', 100000);
define('MAX_CHILDREN', 10);
define('SCATTER', 10);
$rel = 0;
for ($i=1; $i<=PARENTS; $i++) {
    query("INSERT INTO parent VALUES ($i, 'Parent $i')");
    $potential = range(max(1, $i - SCATTER), min(CHILDREN, $i + SCATTER));
    $elements = sizeof($potential);
    $other = rand(1, min(MAX_CHILDREN, $elements - 4));
    $j = 0;
    while ($j < $other) {
        $index = rand(0, $elements - 1);
        if (isset($potential[$index])) {
            $c = $potential[$index];
            $rel++;
            query("INSERT INTO relationship VALUES ($rel, $i, $c)");
            unset($potential[$index]);
            $j++;
        }
    }
}
for ($i=1; $i<=CHILDREN; $i++) {
    query("INSERT INTO other VALUES ($i, 'Other $i')");
}

$count = PARENTS + CHILDREN + $rel;
$stop = microtime(true);
$duration = $stop - $start;
$insert = $duration / $count;

echo "$count records added.\n";
echo "Program ran for $duration seconds.\n";
echo "Insert time $insert seconds.\n";
echo "</pre>\n";

function query($str) {
    mysql_query($str);
    if (mysql_error()) {
        echo "$str: " . mysql_error() . "\n";
    }
}
?>
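The generator above can be ported to Python for a smaller-scale sanity check. The sketch makes two assumptions: SQLite replaces MySQL, and the constants are shrunk from 100000 to 2000. Its purpose is to confirm that the join form and the aggregate form return the same parents, not to reproduce the timings. Because the generator never repeats a (parent, other) pair, the aggregate's "= N" test is valid here.

```python
import random
import sqlite3

# Scaled-down versions of the PHP constants.
PARENTS = 2000
CHILDREN = 2000
MAX_CHILDREN = 10
SCATTER = 10

def build(seed=0):
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE other (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE relationship (id INTEGER PRIMARY KEY, parent_id INT, other_id INT);
        CREATE INDEX idx1 ON relationship (parent_id, other_id);
        CREATE INDEX idx2 ON relationship (other_id, parent_id);
    """)
    rel = 0
    for i in range(1, PARENTS + 1):
        conn.execute("INSERT INTO parent VALUES (?, ?)", (i, f"Parent {i}"))
        # Same window as PHP's inclusive range(max(1, i - SCATTER), min(CHILDREN, i + SCATTER)).
        potential = list(range(max(1, i - SCATTER), min(CHILDREN, i + SCATTER) + 1))
        how_many = rng.randint(1, min(MAX_CHILDREN, len(potential) - 4))
        # sample() picks distinct children, like the unset() loop in the PHP.
        for child in rng.sample(potential, how_many):
            rel += 1
            conn.execute("INSERT INTO relationship VALUES (?, ?, ?)", (rel, i, child))
    conn.executemany("INSERT INTO other VALUES (?, ?)",
                     [(i, f"Other {i}") for i in range(1, CHILDREN + 1)])
    return conn

ids = [100, 101, 102]
marks = ", ".join(["?"] * len(ids))
join_sql = "SELECT p.id FROM parent p " + " ".join(
    f"JOIN relationship r{n} ON p.id = r{n}.parent_id AND r{n}.other_id = ?"
    for n in range(len(ids))
) + " ORDER BY p.id"
count_sql = (
    "SELECT p.id FROM parent p WHERE (SELECT COUNT(*) FROM relationship "
    f"WHERE parent_id = p.id AND other_id IN ({marks})) = ? ORDER BY p.id"
)

conn = build()
join_ids = [r[0] for r in conn.execute(join_sql, ids)]
count_ids = [r[0] for r in conn.execute(count_sql, [*ids, len(ids)])]
print(len(join_ids), join_ids == count_ids)
```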

So once again, joins carry the day.

Other tips

Provided the relationship table has a unique key on (parent_id, other_id), you can do this:

select p.id, p.name 
  from parent as p 
 where (select count(*) 
        from relationship as r 
       where r.parent_id = p.id 
         and r.other_id in (1,2)
        ) >= 2
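The unique key matters because COUNT(*) counts rows, not distinct other_id values. In the sketch below (Python's sqlite3 standing in for MySQL), one pair is deliberately repeated, and the plain count then produces a false positive. Two fixes work. You can add the unique key, for example with CREATE UNIQUE INDEX on relationship (parent_id, other_id). Alternatively, you can count COUNT(DISTINCT r.other_id). The helper name `matching` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY,
                               parent_id INTEGER NOT NULL, other_id INTEGER NOT NULL);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    -- No unique key here, so (4, 2, 1) repeats parent 2 -> other 1.
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 1);
""")

def matching(count_expr):
    # count_expr is one of two fixed strings below, never user input.
    return conn.execute(f"""
        SELECT p.id, p.name FROM parent AS p
        WHERE (SELECT {count_expr} FROM relationship AS r
               WHERE r.parent_id = p.id AND r.other_id IN (1, 2)) >= 2
        ORDER BY p.id
    """).fetchall()

plain = matching("COUNT(*)")                       # counts the repeated row twice
distinct = matching("COUNT(DISTINCT r.other_id)")  # counts each other_id once
print(plain)     # [(1, 'parent 1'), (2, 'parent 2')]  (parent 2 is a false positive)
print(distinct)  # [(1, 'parent 1')]
```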

Simplifying a bit, this should work, and efficiently:

SELECT DISTINCT p.id, p.name
  FROM parent p
  INNER JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = 1
  INNER JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = 2

This requires at least one joined record for each "other" value. The optimizer should also know that it only needs to find one match for each. Given an index on relationship (parent_id, other_id), it should only need to read that index, not either of the auxiliary tables, one of which isn't even referenced.
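The DISTINCT is what keeps this version correct when a relationship row is repeated. Without DISTINCT, every extra matching row multiplies the output. This is a sketch in Python's sqlite3 standing in for MySQL, and the repeated row `(4, 1, 1)` is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY,
                               parent_id INTEGER NOT NULL, other_id INTEGER NOT NULL);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    -- (4, 1, 1) repeats parent 1 -> other 1
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 1, 1);
""")

JOINS = """
    FROM parent p
    INNER JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = 1
    INNER JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = 2
"""
# r1 matches two rows for parent 1, so the bare join yields parent 1 twice.
without_distinct = conn.execute("SELECT p.id, p.name" + JOINS).fetchall()
with_distinct = conn.execute("SELECT DISTINCT p.id, p.name" + JOINS).fetchall()
print(without_distinct)  # [(1, 'parent 1'), (1, 'parent 1')]
print(with_distinct)     # [(1, 'parent 1')]
```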

I haven't tested this, but something along the lines of:

SELECT id, name FROM (
  SELECT p1.id, p1.name
  FROM parent AS p1 LEFT JOIN relationship as r1 ON(r1.parent_id=p1.id)
  WHERE r1.other_id = 1
  UNION ALL
  SELECT p2.id, p2.name
  FROM parent AS p2 LEFT JOIN relationship as r2 ON(r2.parent_id=p2.id)
  WHERE r2.other_id = 2
   -- etc
) AS a
GROUP BY id, name
HAVING count(*) = 2

The idea is that you don't have to do multi-way joins. You simply concatenate the results of regular joins, group by the IDs, and pick the rows that showed up in every segment.
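The stacking described above can be generated from a list of IDs. The sketch below runs it on the sample data with Python's sqlite3 standing in for MySQL, and the function name is hypothetical. Like the COUNT(*) approach, it assumes each (parent_id, other_id) pair appears at most once, because a repeated row would be counted in its segment twice.

```python
import sqlite3

def union_all_sql(n):
    """One plain join per required other_id, stacked with UNION ALL; keep the
    parents that appear in all n segments."""
    if n < 1:
        raise ValueError("need at least one other_id")
    segment = ("SELECT p.id AS id, p.name AS name FROM parent AS p "
               "JOIN relationship AS r ON r.parent_id = p.id WHERE r.other_id = ?")
    stacked = "\nUNION ALL\n".join([segment] * n)
    # MySQL requires an alias (a) on the derived table; n is an int, so formatting it in is safe.
    return (f"SELECT id, name FROM (\n{stacked}\n) AS a\n"
            f"GROUP BY id, name HAVING COUNT(*) = {n}\nORDER BY id")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY,
                               parent_id INTEGER NOT NULL, other_id INTEGER NOT NULL);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")
rows = conn.execute(union_all_sql(2), [1, 2]).fetchall()
print(rows)  # [(1, 'parent 1')]
```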

This is a common problem when searching for multiple associated items through a many-to-many join. It often comes up in services that use the 'tag' concept, e.g. Stack Overflow.

See my other post on a better architecture for storing tags (in your case, 'other').

Searching is a two-step process:

Find all candidate TagCollections that have any/all of the required tags (a cursor or looping construct may be the easiest way to build this)
Select the data that matches those TagCollections

Performance is always better because there are significantly fewer TagCollections than data items to search.
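The linked post isn't included here, so the layout below is an assumption. Each distinct set of tags is stored once as a TagCollection, and each item points at exactly one collection. Under that assumption, the two steps can be sketched in plain Python. All names are hypothetical, and this is not necessarily the design the answer links to.

```python
# Hypothetical layout: each distinct tag set is stored once as a TagCollection,
# and each item (here, a parent) references exactly one collection.
tag_collections = {
    10: frozenset({1, 2}),   # collection 10 = tags {1, 2}
    11: frozenset({1}),      # collection 11 = tag {1}
}
item_collection = {1: 10, 2: 11}  # parent id -> collection id

def search(required, mode="all"):
    required = frozenset(required)
    # Step 1: scan the (small) set of collections for candidates.
    if mode == "all":
        candidates = {cid for cid, tags in tag_collections.items() if required <= tags}
    elif mode == "any":
        candidates = {cid for cid, tags in tag_collections.items() if required & tags}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Step 2: fetch the items that point at a candidate collection.
    return sorted(item for item, cid in item_collection.items() if cid in candidates)

print(search([1, 2]))      # [1]
print(search([1]))         # [1, 2]
print(search([2], "any"))  # [1]
```

Any saving comes from step 1 touching one entry per distinct tag combination rather than one row per item-tag pair. How much that helps depends on how many distinct combinations the data actually has.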

You can do it with a nested select. I tested it on MSSQL 2005, but as you said, it should be fairly generic:

SELECT * FROM parent p
WHERE p.id in(
    SELECT r.parent_Id 
    FROM relationship r 
    WHERE r.other_id IN (1,2)
    GROUP BY r.parent_id
    HAVING COUNT(r.parent_Id)=2
)

The number 2 in COUNT(r.parent_Id)=2 must match the number of other_id values you are looking for.
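Because the HAVING count has to match the length of the IN list, it helps to build the placeholders and the count together. The sketch below uses Python's sqlite3 as a stand-in for the databases in the answer, and the function name is hypothetical. It swaps COUNT(r.parent_Id) for COUNT(DISTINCT r.other_id) so that a repeated relationship row cannot be counted twice.

```python
import sqlite3

def parents_via_in_subquery(conn, other_ids):
    ids = sorted(set(other_ids))  # a repeated input ID would otherwise inflate the target count
    if not ids:
        raise ValueError("need at least one other_id")
    marks = ", ".join(["?"] * len(ids))
    sql = f"""
        SELECT * FROM parent p
        WHERE p.id IN (
            SELECT r.parent_id
            FROM relationship r
            WHERE r.other_id IN ({marks})
            GROUP BY r.parent_id
            HAVING COUNT(DISTINCT r.other_id) = ?
        )
        ORDER BY p.id
    """
    return conn.execute(sql, [*ids, len(ids)]).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY,
                               parent_id INTEGER NOT NULL, other_id INTEGER NOT NULL);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")
print(parents_via_in_subquery(conn, [1, 2]))  # [(1, 'parent 1')]
```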

If you can put your list of other_id values into a table, that would be ideal. The code below finds parents that have at least the given IDs. If you want them to have exactly the same IDs (i.e., no extras), you would have to change the query slightly.

SELECT
     P.id,
     P.name
FROM
     My_Other_IDs MOI
INNER JOIN relationship R ON
     R.other_id = MOI.other_id
INNER JOIN parent P ON
     P.id = R.parent_id
GROUP BY
     P.id,
     P.name
HAVING
     COUNT(*) = (SELECT COUNT(*) FROM My_Other_IDs)
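Here is a runnable sketch of the "list in a table" approach, using Python's sqlite3 and a temporary My_Other_IDs table. The answer doesn't spell out the "exactly the same IDs" change, so the EXACTLY query is one possible version. It assumes each (parent_id, other_id) pair appears only once. It LEFT JOINs the list so that unmatched rows show up as NULLs, then requires that none of the parent's rows fall outside the list and that the matches cover the whole list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY,
                               parent_id INTEGER NOT NULL, other_id INTEGER NOT NULL);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
    -- The wanted other_id values, held in a temporary table.
    CREATE TEMP TABLE My_Other_IDs (other_id INTEGER PRIMARY KEY);
    INSERT INTO My_Other_IDs VALUES (1);
""")

AT_LEAST = """
    SELECT P.id, P.name
    FROM My_Other_IDs MOI
    INNER JOIN relationship R ON R.other_id = MOI.other_id
    INNER JOIN parent P ON P.id = R.parent_id
    GROUP BY P.id, P.name
    HAVING COUNT(*) = (SELECT COUNT(*) FROM My_Other_IDs)
    ORDER BY P.id
"""
# COUNT(MOI.other_id) counts only rows that hit the list; COUNT(*) counts all of them.
EXACTLY = """
    SELECT P.id, P.name
    FROM parent P
    INNER JOIN relationship R ON R.parent_id = P.id
    LEFT JOIN My_Other_IDs MOI ON MOI.other_id = R.other_id
    GROUP BY P.id, P.name
    HAVING COUNT(MOI.other_id) = (SELECT COUNT(*) FROM My_Other_IDs)
       AND COUNT(*) = COUNT(MOI.other_id)
    ORDER BY P.id
"""
at_least = conn.execute(AT_LEAST).fetchall()
exactly = conn.execute(EXACTLY).fetchall()
print(at_least)  # [(1, 'parent 1'), (2, 'parent 2')]
print(exactly)   # [(2, 'parent 2')]
```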

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow