Question

I have a PostgreSQL table that contains several conditions and their individual successors. Some conditions have several successors and these successors might have several successors too. So the goal is extract all possible chains of conditions to achieve something like a tree diagram in data. The table looks like this:

id  | con | succ
----|-----|-----
1   | a   |  b
2   | a   |  c
3   | a   |  d
4   | b   |  c
5   | b   |  f
6   | c   |  e
7   | c   |  g
8   | c   |  h
9   | d   |  h
10  | d   |  i

I still have no clear idea how to store the single chains in the end, but I need the starting point (a), the respective end point and all nodes between starting and end point. I'm thankful for all kind of advice on how to store the chains and how to extract them.

UPDATE:

This is an extract of my data:

ID  | parent_ID
----|----------
403 |   302
404 |   2xx
405 |   303
406 |   304
407 |   304
408 |   2xx
409 |   305
501 |   2xx
502 |   305
503 |   2xx
504 |   2xx
505 |   2xx
506 |   305
507 |   2xx
508 |   306
509 |   2xx
510 |   307
511 |   308
512 |   308
513 |   308
514 |   309
515 |   310
600 |   5xx

You see that some parent-IDs are not ID themselves but groups of IDs ('all beginning with 2'). Now the question is how to make the recursive query run or how to make the recursive query handle the '2xx'. The values are stored as characters. Instead of '2xx' another notation is possible aswell.

Was it helpful?

Solution

Querying tree- and graph-related data stored in a database efficiently is a rather vast topic.

In terms of storage, note that storing an (id, parent_id) pair will usually be the better (as in widely accepted) option.

The question is how to query it, and more importantly how to do so efficiently.

Your main options for trees include:

I'd add a hybrid variation of MPTT to the list: if you implement MPTT using float indexes, you can get away with not updating anything when moving things around in your tree, which makes things plenty fast. It's a lot trickier to maintain however, because collisions can occur when the difference between two indexes is too small — you need to re-index a large enough subset of the tree when this happens.

For graphs, WITH queries work too. Variations of MPTT and nested sets exist as well; for instance the GRIPP index. It's an area where research and new indexing methods are still quite active.

OTHER TIPS

Your best best is to work with the ltree data type. See the documentation here. That does require that you rework your table structure a bit though. If that is not an option, you should look at recursive with-queries that can - at first sight - work with your current table structure, but the queries will provide data in a format that is not as easy to manipulate as ltree data.

Converting your current table to a ltree variant is best done using a recursive with-query. First you need to create a new table to hold the ltree column:

CREATE TABLE tree_list (
  id int,
  chain ltree
);

Then run the recursive query and insert the results into the new table:

WITH RECURSIVE build_tree(id, chain) AS ( 
  SELECT id, con::ltree || succ
  FROM tree
  WHERE con = 'a'
UNION ALL
  SELECT tree.id, build_tree.chain || tree.succ
  FROM tree, build_tree
  WHERE build_tree.chain  ~ ('*.' || tree.con)::lquery)
INSERT INTO tree_list SELECT * FROM build_tree;

You will note that the 10 rows of data you provide above will yield 13 chains because there are multiple paths from a to each of e, g and h. This query should work with trees of practically unlimited depth.

 id |  chain  
----+---------
  1 | a.b
  2 | a.c
  3 | a.d
  4 | a.b.c
  5 | a.b.f
  6 | a.c.e
  7 | a.c.g
  8 | a.c.h
  9 | a.d.h
 10 | a.d.i
  6 | a.b.c.e
  7 | a.b.c.g
  8 | a.b.c.h
(13 rows)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top