Querying tree- and graph-related data stored in a database efficiently is a rather vast topic.
In terms of storage, note that storing an (id, parent_id)
pair will usually be the better (as in widely accepted) option.
The question is how to query it, and more importantly how to do so efficiently.
Your main options for trees include:
WITH queries: http://www.postgresql.org/docs/current/static/queries-with.html
Pros: Built-in, and works fine when dealing with small sets
Cons: Doesn't scale well for larger setsMPTT, aka pre-ordered trees: http://en.wikipedia.org/wiki/Tree_traversal
Pros: Fastest reads for trees
Cons: Slow writes, hard to maintain unless you do rows one by oneNested sets (or intervals) for trees: http://en.wikipedia.org/wiki/Nested_set_model
Pros: Fast reads for trees
Cons: Faster than MPTT but still slow, not trivial to understandThe ltree type in Postgres contrib: http://www.postgresql.org/docs/current/static/ltree.html
Pros: Built-in, indexable
Cons: Not ORM friendly
I'd add a hybrid variation of MPTT to the list: if you implement MPTT using float
indexes, you can get away with not updating anything when moving things around in your tree, which makes things plenty fast. It's a lot trickier to maintain however, because collisions can occur when the difference between two indexes is too small — you need to re-index a large enough subset of the tree when this happens.
For graphs, WITH
queries work too. Variations of MPTT and nested sets exist as well; for instance the GRIPP index. It's an area where research and new indexing methods are still quite active.