MySQL vs. SQL Server Express performance comparison

https://stackoverflow.com/questions/405795

03-07-2019

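Question

I have a somewhat complex query that works over roughly 100K rows.

The query runs in 13 seconds on SQL Server Express (on my dev box).

The same query, with the same indexes and tables, takes over 15 minutes to run on MySQL 5.1 (on my production box, which is far more powerful and was given 100% of its resources for the test). Sometimes the query crashes the machine with an out-of-memory error.

What am I doing wrong in MySQL? Why does it take so long?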

select e8.*
from table_a e8
inner join (
    select max(e6.id) as id, e6.category, e6.entity, e6.service_date
    from (
        select e4.* 
        from table_a e4
        inner join (
            select max(e2.id) as id, e3.rank, e2.entity, e2.provider_id, e2.service_date
            from table_a e2
            inner join (
                select min(e1.rank) as rank, e1.entity, e1.provider_id, e1.service_date
                from table_a e1
                where e1.site_id is not null
                group by e1.entity, e1.provider_id, e1.service_date 
            ) as e3
            on e2.rank= e3.rank
            and e2.entity = e3.entity
            and e2.provider_id = e3.provider_id
            and e2.service_date = e3.service_date
            and e2.rank= e3.rank
            group by e2.entity, e2.provider_id, e2.service_date, e3.rank
        ) e5
        on e4.id = e5.id
        and e4.rank= e5.rank                            
    ) e6
    group by e6.category, e6.entity, e6.service_date 
) e7
on e8.id = e7.id and e7.category = e8.category

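Solution

I originally tried to post this answer to your deleted question, which did not indicate that this was a MySQL problem. I would still go ahead and use SQL Server to refactor the query with CTEs, and then convert back to nested queries (if any remain). Sorry about the formatting; Jeff Atwood sent me the text of the original post and I had to reformat it.

It is hard to do without data, expected results, and good names, but I would convert all the nested queries into CTEs, stack them up, give them meaningful names, and refactor, starting by excluding the columns you do not use. Removing the columns will not by itself produce an improvement, because the optimizer is pretty smart, but it will put you in a position to improve the query, possibly by factoring out some or all of the CTEs. I am not sure what your code is doing, but you may find the new RANK()-type functions useful, because it looks like you are using a seek-back pattern with all these self-joins.

So start from here. I have already looked at the e7 improvement for you. The unused columns in e7 could indicate a flaw or incomplete thinking about the grouping possibilities, but if those columns really are unnecessary, then this may trickle all the way back through your logic in e6, e5 and e3. If the grouping in e7 is correct, then you can eliminate everything except max(id) from the results and from the join. I cannot see why you would have multiple MAX(id) per category, because that would multiply your results when you join, so MAX(id) must be unique within a category, in which case the category is redundant in the join.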
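
As a rough illustration of the RANK()-type idea, here is a minimal sketch of how the e3/e5 step (lowest rank per entity, provider and service date, then the highest id at that rank) might collapse into a single ROW_NUMBER() pass. Everything around the SQL is an assumption made for the sake of a runnable example: the sample rows are invented, the column types are guessed, and Python's sqlite3 module (with SQLite 3.25 or later) stands in for the real database. Window functions are not available in MySQL 5.1 (they arrived in MySQL 8.0), so this shape only helps on SQL Server or a newer MySQL. Note also that it applies the site_id filter to every row, which differs from the original if a row with a NULL site_id shares the minimum rank and has a higher id.

```python
import sqlite3

# In-memory stand-in for table_a; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (
    id INTEGER PRIMARY KEY,
    "rank" INTEGER,
    entity TEXT,
    provider_id INTEGER,
    service_date TEXT,
    category TEXT,
    site_id INTEGER
);
INSERT INTO table_a VALUES
    (1, 2, 'A', 10, '2009-01-01', 'x', 1),
    (2, 1, 'A', 10, '2009-01-01', 'x', 1),
    (3, 1, 'A', 10, '2009-01-01', 'x', 1),
    (4, 3, 'B', 20, '2009-01-02', 'y', 1),
    (5, 0, 'A', 10, '2009-01-01', 'x', NULL);
""")

# One pass replaces the e3 + e5 self-joins: within each
# (entity, provider_id, service_date) group, order by lowest rank first
# and highest id second, then keep only the first row.
rows = conn.execute("""
SELECT id, "rank", entity, provider_id, service_date
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY t.entity, t.provider_id, t.service_date
               ORDER BY t."rank" ASC, t.id DESC
           ) AS rn
    FROM table_a AS t
    WHERE t.site_id IS NOT NULL
) AS ranked
WHERE rn = 1
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

With these sample rows, group A/10 keeps id 3 (rank 1, highest id at that rank) and group B/20 keeps id 4; row 5 is skipped because its site_id is NULL.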

WITH e3 AS (
select min(e1.rank) as rank,
e1.entity,
e1.provider_id,
e1.service_date
from table_a e1
where e1.site_id is not null
group by e1.entity, e1.provider_id, e1.service_date
)

,e5 AS (
select max(e2.id) as id,
e3.rank,
e2.entity,
e2.provider_id,
e2.service_date
from table_a e2
inner join e3
on e2.rank= e3.rank
and e2.entity = e3.entity
and e2.provider_id = e3.provider_id
and e2.service_date = e3.service_date
and e2.rank= e3.rank
group by e2.entity, e2.provider_id, e2.service_date, e3.rank
)

,e6 AS (
select e4.* -- switch from * to only the columns you are actually using
from table_a e4
inner join e5
on e4.id = e5.id
and e4.rank= e5.rank
)

,e7 AS (
select max(e6.id) as id, e6.category -- unused, e6.entity, e6.service_date
from e6
group by e6.category, e6.entity, e6.service_date
-- This instead
-- select max(e6.id) as id
-- from e6
-- group by e6.category, e6.entity, e6.service_date
)

select e8.*
from table_a e8
inner join e7
on e8.id = e7.id
and e7.category = e8.category
-- THIS INSTEAD on e8.id = e7.id
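
One way to gain confidence in the "this instead" simplifications is to run both forms against the same data and compare the results. The sketch below does that on a handful of invented rows, using Python's sqlite3 module as a stand-in database; the schema, the sample data, and the assumption that id is a unique key are all mine, not taken from the question. It checks equivalence on this data only and says nothing about performance.

```python
import sqlite3

# Hypothetical schema and rows; id is assumed to be unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (
    id INTEGER PRIMARY KEY, "rank" INTEGER, entity TEXT,
    provider_id INTEGER, service_date TEXT, category TEXT, site_id INTEGER
);
INSERT INTO table_a VALUES
    (1, 1, 'A', 10, '2009-01-01', 'x', 1),
    (2, 1, 'A', 11, '2009-01-01', 'x', 1),
    (3, 2, 'A', 11, '2009-01-01', 'x', 1),
    (4, 1, 'A', 12, '2009-01-01', 'y', 1),
    (5, 1, 'B', 10, '2009-01-02', 'x', NULL);
""")

# e3, e5 and e6 are shared by both variants.
SHARED_CTES = """
WITH e3 AS (
    SELECT MIN("rank") AS "rank", entity, provider_id, service_date
    FROM table_a
    WHERE site_id IS NOT NULL
    GROUP BY entity, provider_id, service_date
),
e5 AS (
    SELECT MAX(e2.id) AS id, e3."rank" AS "rank"
    FROM table_a e2
    INNER JOIN e3
        ON e2."rank" = e3."rank"
       AND e2.entity = e3.entity
       AND e2.provider_id = e3.provider_id
       AND e2.service_date = e3.service_date
    GROUP BY e2.entity, e2.provider_id, e2.service_date, e3."rank"
),
e6 AS (
    SELECT e4.id, e4.category, e4.entity, e4.service_date
    FROM table_a e4
    INNER JOIN e5 ON e4.id = e5.id AND e4."rank" = e5."rank"
)
"""

# Original shape: e7 carries category, and the final join uses it.
original = conn.execute(SHARED_CTES + """,
e7 AS (
    SELECT MAX(id) AS id, category, entity, service_date
    FROM e6
    GROUP BY category, entity, service_date
)
SELECT e8.id
FROM table_a e8
INNER JOIN e7 ON e8.id = e7.id AND e7.category = e8.category
ORDER BY e8.id
""").fetchall()

# Simplified shape: e7 returns only max(id), and the join is on id alone.
simplified = conn.execute(SHARED_CTES + """,
e7 AS (
    SELECT MAX(id) AS id
    FROM e6
    GROUP BY category, entity, service_date
)
SELECT e8.id
FROM table_a e8
INNER JOIN e7 ON e8.id = e7.id
ORDER BY e8.id
""").fetchall()

print(original, simplified)
```

Both variants return the same ids here because a unique id already determines its row's category, so the extra join condition cannot filter anything out. The same comparison could be repeated on a copy of the real table before trusting the simplification.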

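Other tips

If efficient indexes are available, 100,000 rows should not take 13 seconds. I suspect the difference is that SQL Server has a much more robust query optimizer than MySQL. What MySQL has is closer to a SQL parser than an optimizer.

You will need to provide more information: the full schemas of all participating tables, and the full list of indexes on each of them.

Then some idea of what the data is about and what the query is meant to produce. Something along the lines of a use case.

It would be interesting to see the explain plans for both, to see what differences there are between them. I am not sure whether this is an apples-to-oranges comparison, but I would be curious.

I don't know whether this one will help, but it was the first hit when searching for "mysql query optimizer".

Here is another one that might be worthwhile.

The only open source database I know of that has CTEs is Firebird (http://www.firebirdsql.org/rlsnotesh/rlsnotes210.html#rnfb210-cte)

Postgres will have them in 8.4, I think.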

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow