Is PostgreSQL's Ltree module a good fit for threaded comments?

https://stackoverflow.com/questions/603894

03-07-2019
|

Question

I'm considering using PostgreSQL's Ltree module in my application to help with threaded comments. I've been eying it for a while to use for threaded comments. I figure it would help with cases where you need to update a node and its children, like when you want to hide a comment and its replies.

I'm thinking ltree (or something like it) it would be useful if it was coupled with a traditional adjacency list ("comment_id"/"parent_comment_id").

Before taking the plunge into using ltree, I'm wondering a few things:

Are you, or have you, used ltree? Is it what one might call "production ready"?
If so, what problems did you use it to solve? Did it do a good job?
Do you think it is a good fit for a threaded comment system?
1. If you used it, what did you use for the "text" part of the path? Did you set up something like the DMOZ example they use "Top.Astronomy.Cosmology" or base it on something like the primary key "1.403.29.5"?
2. Is there a better way to do this? I'm a bit nervous using a nested list approach--everything I've read suggests that it isn't all to hot with UPDATES or INSERTS (don't you have to reorder the whole thing?). I'm also not a CS major and that kind of data structure is something I might forget in the future. Is anybody using nested lists for comments or something like it?

If it is of any help, here is the schema I'm considering:

CREATE TABLE comments (
    comment_id SERIAL PRIMARY KEY,
    parent_comment_id int REFERENCES comments(comment_id) ON UPDATE CASCADE ON DELETE CASCADE,
    thread_id int NOT NULL  REFERENCES threads(thread_id) ON UPDATE CASCADE ON DELETE CASCADE,
    path ltree NOT NULL,
    comment_body text NOT NULL,
    hide boolean not null default false
);

The "path" column, used by ltree, would look something like:

<thread_id>.<parent_comment_id_#1>.<parent_comment_id_#2>.<my_comment_id>

Is there anything wrong with using the primary keys in the path? Should I be including the node's own primary key in the path? If I did, would it make sense to put a unique index on it to serve as a constraint?

Solution

Yes and yes;
Hierarchy of sections in a knowledge base (one of the implementations);
Yes;

The definition of one of the tables in question:

                                                   Table "knowledgebase.section"
           Column           |           Type           |                                  Modifiers
----------------------------+--------------------------+-----------------------------------------------------------------------------
 section_sid                | integer                  | not null default nextval('knowledgebase.section_section_sid_seq'::regclass)
 section                    | character varying        | not null
 description                | character varying        |
 path                       | ltree                    | not null
 is_active                  | boolean                  | not null default true
 role_sid                   | integer                  | not null
 last_modified_by           | integer                  | not null
 creation_datetime          | timestamp with time zone | not null default now()
 last_modification_datetime | timestamp with time zone | not null default now()
 is_expanded                | boolean                  | not null default false
 section_idx                | tsvector                 |
Indexes:
    "section_sid_pkey" PRIMARY KEY, btree (section_sid)
    "section_section_key" UNIQUE, btree (section)
    "idxsection_idx" gist (section_idx)
    "path_gist_idx" gist (path)
Foreign-key constraints:
    "last_modified_by_fkey" FOREIGN KEY (last_modified_by) REFERENCES "user"."role"(role_sid) ON UPDATE CASCADE ON DELETE RESTRICT
    "role_sid_fkey" FOREIGN KEY (role_sid) REFERENCES "user"."role"(role_sid) ON  UPDATE CASCADE ON DELETE RESTRICT
Triggers:
    section_idx_update BEFORE INSERT OR UPDATE ON knowledgebase.section FOR EACH ROW EXECUTE PROCEDURE tsearch2('section_idx', 'section')

The "path" column uses the primary key as a label.

A sample of the current contents of that table (regarding the primary key and the "path" column):

  section_sid | path
 -------------+-------
           53 | 34.53
           56 | 56
           55 | 29.55
           35 | 35
           54 | 34.54
           37 | 30.37
          ... | ...

OTHER TIPS

I recommend anyone implementing hierarchical relationships in SQL read Joe Celko's Trees and Hierarchies in SQL for Smarties.

Traversing arbitrary depth parent child links can be very inefficient when using just a parent_id. The book describes techniques that make this access fast.

One strategy (which I happen to use) can also be found for free in this series of articles:

Part 1 (look for the section "Trees in SQL") (Wayback link)
Part 2 (Wayback link)
Part 3 (Wayback link)
Part 4 (Wayback link)

Version 8.4 of PostgreSQL will be bringing Common Table Expressions functionality into the core with WITH and WITH... RECURSIVE expressions. If you're modifying old code, you may want to wait until 8.4 is released, as then you won't have to worry about any incompatibilities between Ltree and the new core syntax. If you're working with old code, or do not want to wait for 8.4, you will probably want to make sure you write code that is easily translatable to the new syntax, especially if you're changing an old schema or designing a new one.