Extracting network/tree structure from table in PostgreSQL

Question 1

Querying tree- and graph-related data stored in a database efficiently is a rather vast topic.

In terms of storage, note that storing an (id, parent_id) pair per row, i.e. an adjacency list, is usually the better (as in most widely accepted) option.

The question is how to query it, and more importantly how to do so efficiently.

Your main options for trees include:

WITH queries (recursive CTEs): http://www.postgresql.org/docs/current/static/queries-with.html

Pros: Built-in, and works fine when dealing with small sets
Cons: Doesn't scale well to larger sets

MPTT (modified preorder tree traversal), aka pre-ordered trees: http://en.wikipedia.org/wiki/Tree_traversal

Pros: Fastest reads for trees
Cons: Slow writes; hard to maintain unless you update rows one at a time

Nested sets (or intervals) for trees: http://en.wikipedia.org/wiki/Nested_set_model

Pros: Fast reads for trees
Cons: Writes are faster than with MPTT but still slow; not trivial to understand

The ltree type in Postgres contrib: http://www.postgresql.org/docs/current/static/ltree.html

Pros: Ships with PostgreSQL (as a contrib extension), indexable
Cons: Not ORM friendly
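To make the first option concrete, here is a minimal sketch of a recursive WITH query over an (id, parent_id) table. It runs against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained; the table name node, its columns, and the sample rows are made up for illustration. The SQL itself should carry over to PostgreSQL largely unchanged, apart from the ? placeholder, which is sqlite3's parameter style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical adjacency-list table: each row points at its parent.
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),
        name      TEXT
    );
    INSERT INTO node VALUES
        (1, NULL, 'root'),
        (2, 1,    'a'),
        (3, 1,    'b'),
        (4, 2,    'a1'),
        (5, 4,    'a1x');
""")

SUBTREE_SQL = """
WITH RECURSIVE subtree(id, name, depth) AS (
    -- Anchor: the node we start from.
    SELECT id, name, 0 FROM node WHERE id = ?
  UNION ALL
    -- Step: children of every node found so far.
    SELECT n.id, n.name, s.depth + 1
    FROM node AS n
    JOIN subtree AS s ON n.parent_id = s.id
)
SELECT id, name, depth FROM subtree ORDER BY depth, id
"""


def subtree(root_id):
    """Return (id, name, depth) for root_id and all of its descendants."""
    return conn.execute(SUBTREE_SQL, (root_id,)).fetchall()


for row in subtree(2):
    print(row)
```

The query does one join per level of the tree, which is part of why this approach is comfortable for small sets and less so for large ones.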

I'd add a hybrid variation of MPTT to the list: if you implement MPTT using floating-point indexes, you can move things around in your tree without updating any other rows, which makes writes plenty fast. It's a lot trickier to maintain, however, because collisions can occur when the difference between two indexes becomes too small; when this happens, you need to re-index a large enough subset of the tree.
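As a rough illustration of the float idea, the sketch below numbers a small tree with (lft, rgt) bounds, reads descendants with a plain range comparison, and then adds a node by dropping it into the gap before its parent's right bound, without touching any existing row. The function names and the choice to split the gap into thirds are my own assumptions for the example, not a standard recipe, and it works on an in-memory dict rather than a table.

```python
def number_tree(children, root):
    """Assign (lft, rgt) float bounds by a depth-first walk, as in MPTT."""
    bounds = {}
    counter = 0.0

    def visit(node):
        nonlocal counter
        counter += 1.0
        lft = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1.0
        bounds[node] = (lft, counter)

    visit(root)
    return bounds


def descendants(bounds, node):
    """A node is a descendant if its bounds lie strictly inside the ancestor's."""
    lft, rgt = bounds[node]
    return sorted(n for n, (lo, hi) in bounds.items() if lft < lo and hi < rgt)


def insert_last_child(bounds, parent, new_node):
    """Place new_node in the free gap before parent's right bound.

    No existing bounds are modified. Raises ValueError when the gap is too
    small to hold two distinct floats, which is the collision case.
    """
    p_lft, p_rgt = bounds[parent]
    # Largest bound already used inside the parent; the free gap starts there.
    inner = [v for pair in bounds.values() for v in pair if p_lft < v < p_rgt]
    low = max(inner, default=p_lft)
    step = (p_rgt - low) / 3.0
    lft, rgt = low + step, low + 2.0 * step
    if not (low < lft < rgt < p_rgt):
        raise ValueError("gap exhausted: re-index this part of the tree")
    bounds[new_node] = (lft, rgt)


children = {"a": ["b", "c"], "b": ["d"]}
bounds = number_tree(children, "a")
insert_last_child(bounds, "b", "e")
print(descendants(bounds, "b"))
```

Each insert under the same parent shrinks the remaining gap, so repeated inserts in one spot eventually hit the ValueError branch; that is the point where a subset of the tree has to be renumbered.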

For graphs, WITH queries work too. Variations of MPTT and nested sets exist as well, for instance the GRIPP index. It's an area of active research, and new indexing methods are still being developed.
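One thing to watch for when pointing a WITH query at a graph rather than a tree is cycles. The sketch below, again using an in-memory SQLite database with a made-up edge table, collects every node reachable from a starting node. It uses UNION rather than UNION ALL so that rows already produced are discarded, which lets the recursion stop even though the sample data contains the cycle a, b, c, a. This only works because each row holds nothing but the node name; carrying a path or depth column would need explicit cycle handling instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical directed graph with a cycle: a -> b -> c -> a.
    CREATE TABLE edge (src TEXT, dst TEXT);
    INSERT INTO edge VALUES ('a', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'd');
""")

REACHABLE_SQL = """
WITH RECURSIVE reachable(node) AS (
    -- Anchor: the starting node itself.
    SELECT ?
  UNION
    -- Step: follow outgoing edges; UNION drops nodes already seen.
    SELECT e.dst
    FROM edge AS e
    JOIN reachable AS r ON e.src = r.node
)
SELECT node FROM reachable ORDER BY node
"""


def reachable(start):
    """Return the sorted list of nodes reachable from start (inclusive)."""
    return [row[0] for row in conn.execute(REACHABLE_SQL, (start,))]


print(reachable("a"))
```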

Question 2

Your best bet is to work with the ltree data type; see the documentation at http://www.postgresql.org/docs/current/static/ltree.html. That does require reworking your table structure a bit, though. If that is not an option, look at recursive WITH queries, which at first sight can work with your current table structure, but they return data in a format that is not as easy to manipulate as ltree data.

Converting your current table to an ltree variant is best done using a recursive WITH query. First, make sure the ltree extension is installed (CREATE EXTENSION ltree), then create a new table to hold the ltree column:

CREATE TABLE tree_list (
  id int,
  chain ltree
);

Then run the recursive query and insert the results into the new table:

WITH RECURSIVE build_tree(id, chain) AS (
  -- Anchor: edges leaving the root node 'a'
  SELECT id, con::ltree || succ
  FROM tree
  WHERE con = 'a'
UNION ALL
  -- Step: extend every chain whose last label is this edge's source
  SELECT tree.id, build_tree.chain || tree.succ
  FROM tree
  JOIN build_tree ON build_tree.chain ~ ('*.' || tree.con)::lquery
)
INSERT INTO tree_list (id, chain)
SELECT id, chain FROM build_tree;

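Note that the 10 rows of data you provided yield 13 chains, because there are multiple paths from a to each of e, g and h: c is reachable both directly and via b, so the chains through c appear twice. This query should work with trees of practically unlimited depth, provided the data contains no cycles, since nothing in the query stops a chain from revisiting a node.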

 id |  chain  
----+---------
  1 | a.b
  2 | a.c
  3 | a.d
  4 | a.b.c
  5 | a.b.f
  6 | a.c.e
  7 | a.c.g
  8 | a.c.h
  9 | a.d.h
 10 | a.d.i
  6 | a.b.c.e
  7 | a.b.c.g
  8 | a.b.c.h
(13 rows)
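If you want to sanity-check the chain count outside the database, the same anchor-then-extend logic is easy to mimic in a few lines of Python. The edge list below is inferred from the result listing above, since the original table is not reproduced here, and build_chains is a name made up for this sketch. Like the SQL, it assumes the data has no cycles.

```python
# (id, con, succ) rows, inferred from the result listing (an assumption).
EDGES = [
    (1, "a", "b"), (2, "a", "c"), (3, "a", "d"), (4, "b", "c"), (5, "b", "f"),
    (6, "c", "e"), (7, "c", "g"), (8, "c", "h"), (9, "d", "h"), (10, "d", "i"),
]


def build_chains(edges, root):
    """Return (id, chain) pairs for every path that starts at root."""
    # Anchor step: edges leaving the root.
    frontier = [(eid, [con, succ]) for eid, con, succ in edges if con == root]
    chains = []
    while frontier:
        chains.extend(frontier)
        # Recursive step: extend each chain whose last label is an edge's source.
        frontier = [
            (eid, path + [succ])
            for _, path in frontier
            for eid, con, succ in edges
            if path[-1] == con
        ]
    return [(eid, ".".join(path)) for eid, path in chains]


chains = build_chains(EDGES, "a")
for eid, chain in chains:
    print(eid, chain)
print(len(chains), "rows")
```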