Question
I have the following data structure and data:
CREATE TABLE `parent` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(10) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `parent` VALUES(1, 'parent 1');
INSERT INTO `parent` VALUES(2, 'parent 2');
CREATE TABLE `other` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(10) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `other` VALUES(1, 'other 1');
INSERT INTO `other` VALUES(2, 'other 2');
CREATE TABLE `relationship` (
`id` int(11) NOT NULL auto_increment,
`parent_id` int(11) NOT NULL,
`other_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `relationship` VALUES(1, 1, 1);
INSERT INTO `relationship` VALUES(2, 1, 2);
INSERT INTO `relationship` VALUES(3, 2, 1);
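I want to find the parent records that are related to both other 1 and other 2.
This is what I have worked out so far, but I am wondering whether there is a better way: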
SELECT p.id, p.name
FROM parent AS p
LEFT JOIN relationship AS r1 ON (r1.parent_id = p.id)
LEFT JOIN relationship AS r2 ON (r2.parent_id = p.id)
WHERE r1.other_id = 1 AND r2.other_id = 2;
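The result is 1, "parent 1", which is correct. The problem is that once you have a list of 5+ joins it gets messy, and as the relationship table grows it gets slow.
Is there a better way?
I'm using MySQL and PHP, but this is probably pretty generic.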
Solution
OK, I tested this. The queries, from best to worst, were:
Query 1: Joins (0.016s; basically instant)
SELECT p.id, name
FROM parent p
JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = 100
JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = 101
JOIN relationship r3 ON p.id = r3.parent_id AND r3.other_id = 102
JOIN relationship r4 ON p.id = r4.parent_id AND r4.other_id = 103
Query 2: EXISTS (0.625s)
SELECT id, name
FROM parent p
WHERE EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 100)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 101)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 102)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 103)
Query 3: Aggregate (1.016s)
SELECT p.id, p.name
FROM parent p
WHERE (SELECT COUNT(*) FROM relationship WHERE parent_id = p.id AND other_id IN (100,101,102,103)) = 4
Query 4: UNION Aggregate (2.39s)
SELECT id, name FROM (
SELECT p1.id, p1.name
FROM parent AS p1 LEFT JOIN relationship as r1 ON(r1.parent_id=p1.id)
WHERE r1.other_id = 100
UNION ALL
SELECT p2.id, p2.name
FROM parent AS p2 LEFT JOIN relationship as r2 ON(r2.parent_id=p2.id)
WHERE r2.other_id = 101
UNION ALL
SELECT p3.id, p3.name
FROM parent AS p3 LEFT JOIN relationship as r3 ON(r3.parent_id=p3.id)
WHERE r3.other_id = 102
UNION ALL
SELECT p4.id, p4.name
FROM parent AS p4 LEFT JOIN relationship as r4 ON(r4.parent_id=p4.id)
WHERE r4.other_id = 103
) a
GROUP BY id, name
HAVING count(*) = 4
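Actually, the above produced the wrong data, so either it is wrong or I did something wrong with it. Whatever the case, the above is just a bad idea.
If that's not fast enough, then you need to look at the explain plan for the query. You are probably just lacking appropriate indices. Try it with: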
CREATE INDEX idx1 ON relationship (parent_id, other_id);
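As a rough illustration of inspecting a plan, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the snippet runs without a server). SQLite's statement is EXPLAIN QUERY PLAN; in MySQL you would prefix the query with EXPLAIN instead, and the output format differs.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY, parent_id INT, other_id INT);
    CREATE INDEX idx1 ON relationship (parent_id, other_id);
""")

query = """
    SELECT p.id, p.name
    FROM parent p
    JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = 1
    JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = 2
"""

# EXPLAIN QUERY PLAN is SQLite syntax; the last column of each row is a
# human-readable description of one step of the plan.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

Steps that mention an index are using it; steps that scan a whole table are the ones worth a closer look.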
Before you go down the aggregation route (SELECT COUNT(*) FROM ...), you should read SQL Statement - "Join" Vs "Group By and Having".
Note: The above timings are based on:
CREATE TABLE parent (
id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE other (
id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE relationship (
id INT PRIMARY KEY,
parent_id INT,
other_id INT
);
CREATE INDEX idx1 ON relationship (parent_id, other_id);
CREATE INDEX idx2 ON relationship (other_id, parent_id);
and nearly 800,000 records created with:
<?php
ini_set('max_execution_time', 600);
$start = microtime(true);
echo "<pre>\n";
mysql_connect('localhost', 'scratch', 'scratch');
if (mysql_error()) {
echo "Connect error: " . mysql_error() . "\n";
}
mysql_select_db('scratch');
if (mysql_error()) {
echo "Select DB error: " . mysql_error() . "\n";
}
define('PARENTS', 100000);
define('CHILDREN', 100000);
define('MAX_CHILDREN', 10);
define('SCATTER', 10);
$rel = 0;
for ($i=1; $i<=PARENTS; $i++) {
query("INSERT INTO parent VALUES ($i, 'Parent $i')");
$potential = range(max(1, $i - SCATTER), min(CHILDREN, $i + SCATTER));
$elements = sizeof($potential);
$other = rand(1, min(MAX_CHILDREN, $elements - 4));
$j = 0;
while ($j < $other) {
$index = rand(0, $elements - 1);
if (isset($potential[$index])) {
$c = $potential[$index];
$rel++;
query("INSERT INTO relationship VALUES ($rel, $i, $c)");
unset($potential[$index]);
$j++;
}
}
}
for ($i=1; $i<=CHILDREN; $i++) {
query("INSERT INTO other VALUES ($i, 'Other $i')");
}
$count = PARENTS + CHILDREN + $rel;
$stop = microtime(true);
$duration = $stop - $start;
$insert = $duration / $count;
echo "$count records added.\n";
echo "Program ran for $duration seconds.\n";
echo "Insert time $insert seconds.\n";
echo "</pre>\n";
function query($str) {
mysql_query($str);
if (mysql_error()) {
echo "$str: " . mysql_error() . "\n";
}
}
?>
So yet again, joins carry the day.
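Since the join version gets tedious to write by hand as the list of ids grows, it can be generated. The following is a minimal sketch, not a definitive implementation: it uses Python's sqlite3 module in place of MySQL/PHP so that it runs standalone, and the helper name parents_with_all is hypothetical.

```python
import sqlite3

def parents_with_all(conn, other_ids):
    """Return (id, name) rows for parents related to every id in other_ids."""
    if not other_ids:
        raise ValueError("other_ids must not be empty")
    # One self-join of relationship per required other_id. Only the generated
    # aliases r0, r1, ... are placed in the SQL text; the ids are bound as
    # parameters.
    joins = "\n".join(
        f"JOIN relationship r{i} ON r{i}.parent_id = p.id AND r{i}.other_id = ?"
        for i in range(len(other_ids))
    )
    sql = f"SELECT p.id, p.name FROM parent p\n{joins}\nORDER BY p.id"
    return conn.execute(sql, list(other_ids)).fetchall()

# Sample data from the question, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY, parent_id INT, other_id INT);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

print(parents_with_all(conn, [1, 2]))  # only parent 1 has both
print(parents_with_all(conn, [1]))     # both parents have other 1
```

The same string-building approach carries over to PHP with a prepared statement; the point is that the number of joins follows the length of the id list.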
Other tips
Assuming the relationship table has a unique key on (parent_id, other_id), you can do this:
select p.id, p.name
from parent as p
where (select count(*)
from relationship as r
where r.parent_id = p.id
and r.other_id in (1,2)
) >= 2
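Simplifying a bit, this should work, and efficiently:
SELECT DISTINCT p.id, p.name
FROM parent p
INNER JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = 1
INNER JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = 2
This requires at least one joined record for each "other" value. The optimiser should know that it only has to find one match for each, and that it only needs to read the index rather than either of the subsidiary tables, one of which is not even referenced at all.
I haven't actually tested it, but something along the lines of: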
SELECT id, name FROM (
SELECT p1.id, p1.name
FROM parent AS p1 LEFT JOIN relationship as r1 ON(r1.parent_id=p1.id)
WHERE r1.other_id = 1
UNION ALL
SELECT p2.id, p2.name
FROM parent AS p2 LEFT JOIN relationship as r2 ON(r2.parent_id=p2.id)
WHERE r2.other_id = 2
-- etc
) a GROUP BY id, name
HAVING count(*) = 2
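The idea is that you don't have to do multi-way joins; just concatenate the results of regular joins, group by your ids, and pick the rows that showed up in every segment.
This is a common problem when searching for multiple associations through a many-to-many join. It is often encountered in services that use the "tag" concept, e.g. Stack Overflow.
See my other post on a better architecture for tag (in your case "other") storage.
Searching is a two-step process:
- Find all possible candidate TagCollections that have any/all of the tags you require (this may be easier using a cursor or loop construct)
- Select the data based on the matching TagCollections
Performance is always faster, because there are significantly fewer TagCollections to search than data items.
You can do this with a nested select. I tested it in MSSQL 2005, but as you said, this should be pretty generic.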
SELECT * FROM parent p
WHERE p.id in(
SELECT r.parent_Id
FROM relationship r
WHERE r.other_id in(1,2)
GROUP BY r.parent_id
HAVING COUNT(r.parent_Id)=2
)
(In COUNT(r.parent_Id)=2, the number 2 is the number of matches you require.)
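The grouped form can be parameterised so that the required count always matches the length of the id list. Below is a minimal sketch using Python's sqlite3 module as a stand-in for the database; the helper name parents_matching_all_grouped is hypothetical. It filters on other_id and counts DISTINCT values, which is an assumption added here so that a duplicated relationship row cannot inflate the count. The duplicate row with id 4 is added to the sample data purely to illustrate that.

```python
import sqlite3

def parents_matching_all_grouped(conn, other_ids):
    """Return parents related to every id in other_ids, via GROUP BY/HAVING."""
    wanted = sorted(set(other_ids))  # de-duplicate so the target count is right
    if not wanted:
        raise ValueError("other_ids must not be empty")
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.id, p.name
        FROM parent p
        WHERE p.id IN (
            SELECT r.parent_id
            FROM relationship r
            WHERE r.other_id IN ({placeholders})
            GROUP BY r.parent_id
            HAVING COUNT(DISTINCT r.other_id) = ?
        )
        ORDER BY p.id
    """
    return conn.execute(sql, wanted + [len(wanted)]).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY, parent_id INT, other_id INT);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    -- Row 4 duplicates (parent 2, other 1) on purpose.
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 1);
""")

print(parents_matching_all_grouped(conn, [1, 2]))
```

With a plain COUNT(*) the duplicated row would make parent 2 look like a match for two ids; a unique key on (parent_id, other_id) would rule that out at the schema level instead.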
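If you can put your list of other_id values into a table, that would be ideal. The code below looks for parents that have at least the given IDs. If you want parents with exactly those IDs (i.e. no extras), you will have to change the query slightly.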
SELECT
p.id,
p.name
FROM
My_Other_IDs MOI
INNER JOIN relationship R ON
R.other_id = MOI.other_id
INNER JOIN parent P ON
P.id = R.parent_id
GROUP BY
p.id,
p.name
HAVING
COUNT(*) = (SELECT COUNT(*) FROM My_Other_IDs)
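To make the "at least" versus "exactly" distinction concrete, here is a minimal sketch run against the question's sample data with Python's sqlite3 module. The EXACTLY query is one possible way to make the change, not necessarily the one the author had in mind, and both queries assume that (parent_id, other_id) pairs are unique. The table name my_other_ids and the helper run are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY, parent_id INT, other_id INT);
    CREATE TABLE my_other_ids (other_id INTEGER PRIMARY KEY);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

# Parents that have at least every id listed in my_other_ids.
AT_LEAST = """
    SELECT p.id, p.name
    FROM parent p
    JOIN relationship r ON r.parent_id = p.id
    JOIN my_other_ids moi ON moi.other_id = r.other_id
    GROUP BY p.id, p.name
    HAVING COUNT(*) = (SELECT COUNT(*) FROM my_other_ids)
    ORDER BY p.id
"""

# Parents whose set of ids is exactly my_other_ids: the LEFT JOIN keeps
# unmatched relationship rows, so COUNT(*) counts all of a parent's rows while
# COUNT(moi.other_id) counts only the rows that matched a wanted id.
EXACTLY = """
    SELECT p.id, p.name
    FROM parent p
    JOIN relationship r ON r.parent_id = p.id
    LEFT JOIN my_other_ids moi ON moi.other_id = r.other_id
    GROUP BY p.id, p.name
    HAVING COUNT(*) = (SELECT COUNT(*) FROM my_other_ids)
       AND COUNT(moi.other_id) = COUNT(*)
    ORDER BY p.id
"""

def run(sql, wanted):
    """Load the wanted ids into my_other_ids, then run the given query."""
    conn.execute("DELETE FROM my_other_ids")
    conn.executemany("INSERT INTO my_other_ids VALUES (?)", [(i,) for i in wanted])
    return conn.execute(sql).fetchall()

print(run(AT_LEAST, [1]))  # both parents have other 1
print(run(EXACTLY, [1]))   # only parent 2 has other 1 and nothing else
```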