再帰的なサブクエリファクタリングによるサイクル検出

https://stackoverflow.com/questions/1731889

20-09-2019
|

質問

Oracle SQLは、V2以降、構文による独自の接続を使用して階層クエリを実行できます。最新の11Gリリース2で、彼らは再帰的なサブクエリファクタリングを追加しました。これはANSI標準であり、私が正しく理解していれば、これは他のRDBMSベンダーによっても実装されています。

接続と再帰を比較すると、サイクル検出を使用するときに結果セットの違いに気付きました。結果による接続は私にとってより直感的ですので、Oracleの実装にバグが含まれているのか、これが標準的なANSIと予想される動作なのか疑問に思っています。したがって、私の質問は、MySQL、DB2、SQL Serverなどの他のデータベースを使用して、クエリで再帰を確認できるかどうかです。もちろん、これらのデータベースが句を使用して再帰をサポートする場合。

Oracle 11.2.0.1.0でどのように機能するかを以下に示します

SQL> select *
  2    from t
  3  /

        ID  PARENT_ID
---------- ----------
         1          2
         2          1

2 rows selected.

構文によるConnectを使用したクエリ：

SQL>  select id
  2        , parent_id
  3        , connect_by_iscycle
  4     from t
  5  connect by nocycle parent_id = prior id
  6    start with id = 1
  7  /

        ID  PARENT_ID CONNECT_BY_ISCYCLE
---------- ---------- ------------------
         1          2                  0
         2          1                  1

2 rows selected.

それは私に直感的に見えます。ただし、新しいANSI構文を使用すると、もう1つの行が返されます。

SQL> with tr (id,parent_id) as
  2  ( select id
  3         , parent_id
  4      from t
  5     where id = 1
  6     union all
  7    select t.id
  8         , t.parent_id
  9      from t
 10           join tr on t.parent_id = tr.id
 11  ) cycle id set is_cycle to '1' default '0'
 12  select id
 13       , parent_id
 14       , is_cycle
 15    from tr
 16  /

        ID  PARENT_ID I
---------- ---------- -
         1          2 0
         2          1 0
         1          2 1

3 rows selected.

これは、確認するために使用できるスクリプトです。

create table t
( id        number
, parent_id number
);
insert into t values (1, 2);
insert into t values (2, 1);
commit;
with tr (id,parent_id) as
( select id
       , parent_id
    from t
   where id = 1
   union all
  select t.id
       , t.parent_id
    from t
         join tr on t.parent_id = tr.id
) cycle id set is_cycle to '1' default '0'
select id
     , parent_id
     , is_cycle
  from tr;

解決

ドキュメントから CONNECT_BY_ISCYCLE:

CONNECT_BY_ISCYCLE 擬似柱の戻り 1 現在の列にも祖先である子供がいる場合

そしてそれは CYCLE:

祖先の列の1つがサイクル列に同じ値を持っている場合、行はサイクルを形成すると見なされます。

あなたの例では、行 2 祖先でもある子供がいますが、 id まだ返されていません。

言い換えると、 CONNECT_BY_ISCYCLE チェックします子供（まだ返されていません）、 CYCLE チェックします 現在の行 （すでに返されています）。

CONNECT BY 再帰的ですが、行ベースです CTE'sはセットベースです。

Oracleのドキュメントに注意してください CYCLE 「祖先の列」に言及します。ただし、一般的に言えば、再帰的に「祖先の列」の概念はありません CTE. 。これは、ツリーから完全に結果をもたらすことができるセットベースの操作です。一般的に言えば、アンカー部分と再帰部分は、異なるテーブルを使用することもできます。

再帰以来 CTEそうです いつもの 階層の木を構築するために使用される、 Oracle サイクルチェックを追加することにしました。しかし、セットベースの方法が再帰的です CTE動作すると、「先祖の列」サイクル条件の明確な定義がなければ定義できないため、次のステップがサイクルを生成するかどうかを伝えることは一般に不可能です。

「次の」ステップを実行するには、「現在の」セット全体を使用できる必要がありますが、現在のセット（サイクル列を含む）の各行を生成するには、「次の」操作の結果が必要です。

現在のセットが常に単一の行で構成されている場合、それは問題ではありません（ CONNECT BY）、しかし、再帰操作がセット全体で定義されている場合は問題です。

調べませんでした Oracle 11 それでも、しかし SQL Server 再帰的なものを実装します CTEただ隠すことによって CONNECT BY それらの背後には、多数の制限を配置する必要があります（すべてがすべてのセットベースの操作を効果的に禁止しています）。

PostgreSQL一方、実装は本当にセットベースです。再帰部分のアンカー部分を使用して、任意の操作を行うことができます。ただし、サイクルはそもそもサイクルが定義されていないため、サイクルを検出する手段はありません。

前述のように、 MySQL 実装していません CTEまったく（実装していません HASH JOIN'sまたは MERGE JOINSも同様に、ネストされたループのみなので、あまり驚かないでください）。

皮肉なことに、私は今日、まさにこのテーマについて手紙を受け取りました。それについてはブログで説明します。

アップデート：

再帰 CTE'sin SQL Server それ以上です CONNECT BY 変装した。衝撃的な詳細については、私のブログのこの記事をご覧ください。

SQL Server：再帰CTEは本当にセットベースですか？

他のヒント

PostgreSQLは、スタイルの階層クエリをサポートしますが、自動サイクル検出はありません。これは、独自に書く必要があることを意味し、返される行の数は、クエリの再帰部分で結合条件を指定する方法によって異なります。

両方の例は、ループを検出するためにids（all_idsと呼ばれる）の場合の配列を使用します。

WITH recursive tr (id, parent_id, all_ids, cycle) AS (
    SELECT id, parent_id, ARRAY[id], false
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id, t.parent_id, all_ids || t.id, t.id = ANY(all_ids)
    FROM t
    JOIN tr ON t.parent_id = tr.id AND NOT cycle)
SELECT id, parent_id, cycle
FROM tr;

 id | parent_id | cycle
----+-----------+-------
  1 |         2 | f
  2 |         1 | f
  1 |         2 | t


WITH recursive tr (id, parent_id, all_ids, cycle) AS (
    SELECT id, parent_id, ARRAY[id], false
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id, t.parent_id, all_ids || t.id, (EXISTS(SELECT 1 FROM t AS x WHERE x.id = t.parent_id))
    FROM t
    JOIN tr ON t.parent_id = tr.id
    WHERE NOT t.id = ANY(all_ids))
SELECT id, parent_id, cycle
FROM tr;

 id | parent_id | cycle
----+-----------+-------
  1 |         2 | f
  2 |         1 | t

私の知る限り：

MySQLは再帰CTEをサポートしていません
SQL Severは、再帰CTEのサイクル検出をサポートしていません

MySQL Serverバージョン5.0.45は気に入らなかった with:

エラー1064（42000）：SQL構文にエラーがあります。適切な構文のためにMySQL Serverバージョンに対応するマニュアルを確認してください。

WITH RECURSIVE s (master, slave, all_ids, cycle) AS
(
    SELECT master, slave, ARRAY[master], false FROM binding WHERE master=3477

    UNION ALL

    SELECT d.master, d.slave, all_ids || d.master, d.slave = ANY(all_ids)
    FROM
        binding AS d
    JOIN
        s
    ON (d.master = s.slave)
    WHERE NOT d.master = ANY(all_ids)
)
SELECT *
FROM s;

この状態はもっと良いと思います d.slave = ANY(all_ids)

Results of the connect by may not always be intuitive.

Below queries demonstrate different approaches to detect cycles starting with id = 3 for the graph on the picture.

create table graph (id, id_parent) as
(select 2, 1 from dual
union all select 3, 1 from dual
union all select 4, 3 from dual
union all select 5, 4 from dual
union all select 3, 5 from dual)

SQL> select level lvl, graph.*, connect_by_iscycle cycle
  2    from graph
  3   start with id = 3
  4  connect by nocycle prior id = id_parent;

       LVL         ID  ID_PARENT      CYCLE
---------- ---------- ---------- ----------
         1          3          1          0
         2          4          3          0
         3          5          4          1
         1          3          5          0
         2          4          3          0
         3          5          4          1

6 rows selected.

SQL> select level lvl, graph.*, connect_by_iscycle cycle
  2    from graph
  3   start with id = 3
  4  connect by nocycle prior id = id_parent
  5         and prior id_parent is not null;

       LVL         ID  ID_PARENT      CYCLE
---------- ---------- ---------- ----------
         1          3          1          0
         2          4          3          0
         3          5          4          0
         4          3          5          1
         1          3          5          0
         2          4          3          0
         3          5          4          1

7 rows selected.

SQL> with t(id, id_parent) as
  2   (select *
  3      from graph
  4     where id = 3
  5    union all
  6    select g.id, g.id_parent
  7      from t
  8      join graph g
  9        on t.id = g.id_parent)
 10  search depth first by id set ord
 11  cycle id set cycle to 1 default 0
 12  select * from t;

        ID  ID_PARENT        ORD C
---------- ---------- ---------- -
         3          1          1 0
         4          3          2 0
         5          4          3 0
         3          5          4 1
         3          5          5 0
         4          3          6 0
         5          4          7 0
         3          5          8 1

8 rows selected.

Node with id = 3 has two parents so Oracle traverses two cycles in this example.

(1, 3) -> (3, 4) -> (4, 5) -> (5, 3)

and

(5, 3) -> (3, 4) -> (4, 5)

Edge (5, 3) is missing from the result of the first query and first cycle. At the same time edge (5, 3) appears in the result for the third query and second cycle twice.

Why so? You can check description of the cycle detection logic in the answer provided by Quassnoi. In plain English it means that

(1) connect by detects a cycle if child ID for current row is part of IDs visited so far

(2) rec with detects a cycle if ID for current row is part of IDs visited so far

Result of the second query looks the most natural although there is additional predicate and prior id_parent is not null. In this case

(3) it detects a cycle if ID for current row is part of parent IDs visited so far

All this conditions are implemented in columns cnt1, cnt2, cnt3 in below query.

SQL> with t(id, id_parent, path_id, path_id_parent, cnt1, cnt2, cnt3) as
  2   (select g.*,
  3           cast('->' || g.id as varchar2(4000)),
  4           cast('->' || g.id_parent as varchar2(4000)),
  5           0,
  6           0,
  7           0
  8      from graph g
  9     where id = 3
 10    union all
 11    select g.id,
 12           g.id_parent,
 13           t.path_id || '->' || g.id,
 14           t.path_id_parent || '->' || g.id_parent,
 15           regexp_count(t.path_id || '->', '->' ||
 16            (select id from graph c where c.id_parent = g.id) || '->'),
 17           regexp_count(t.path_id || '->', '->' || g.id || '->'),
 18           regexp_count(t.path_id_parent || '->', '->' || g.id || '->')
 19      from t
 20      join graph g
 21        on t.id = g.id_parent
 22    -- and t.cnt1 = 0
 23    -- and t.cnt2 = 0
 24    -- and t.cnt3 = 0
 25    )
 26  search depth first by id set ord
 27  cycle id set cycle to 1 default 0
 28  select * from t;

        ID  ID_PARENT PATH_ID         PATH_ID_PARENT  CNT1 CNT2 CNT3        ORD C
---------- ---------- --------------- --------------- ---- ---- ---- ---------- -
         3          1 ->3             ->1                0    0    0          1 0
         4          3 ->3->4          ->1->3             0    0    0          2 0
         5          4 ->3->4->5       ->1->3->4          1    0    0          3 0
         3          5 ->3->4->5->3    ->1->3->4->5       1    1    1          4 1
         3          5 ->3             ->5                0    0    0          5 0
         4          3 ->3->4          ->5->3             0    0    0          6 0
         5          4 ->3->4->5       ->5->3->4          1    0    1          7 0
         3          5 ->3->4->5->3    ->5->3->4->5       1    1    1          8 1

8 rows selected.

If you uncomment filter by cnt1/cnt2/cnt3 and remove "cycle id set cycle to 1 default 0" then query will return result as corresponding query above. In other words you can avoid cycle clause and implement whatever cycle detection logic you find more intuitive.

Additional details about traversing hierarchies and cycle detection can be found in the book Oracle SQL Revealed.

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow