Question

I have been wondering whether it's possible to do multidimensional tables in PostgreSQL. Here's an example table from my project:

     id |  created_by   |          content           | comments |   
     ---+---------------+----------------------------+----------+
      1 | Anonymous     | does this thing work?      |          | 
      2 | James         | this is the body           |          | 
      3 | Chan          | this must work this time~! |          | 
      4 | Freak         | just to add something new  |          | 
      5 | Anonymous     | yahoo!                     |          | 

What do I mean by a multidimensional table? It would look something like this, if there is such a thing.

     id |  created_by   |          content           |             comments                     |   
     ---+---------------+----------------------------+------------------------------------------+
      1 | Anonymous     | does this thing work?      | id | created_by |         comments       | 
      2 | James         | this is the body           |                 |created_by|   comments  | 
      3 | Chan          | this must work this time~! |                                  | 
      4 | Freak         | just to add something new  |                                  | 
      5 | Anonymous     | yahoo!                     |                                  | 

This is just an example. But the key concept is that in every comment, there's another set of columns, making comments sort of like a table by itself.

So yeah, does this exist in Postgres or is there any better way to implement this feature? :)


Solution

I would like to convince you, if possible, not to encode your data this way (quite apart from how terrible an idea it is in general).

Let's suppose you have a really hot post that goes viral, et cetera. That means all of your users are viewing it and many are trying to comment on it. With all of your nested discussion embedded in a single row, all updates must apply to that row. This in turn means that every update on that discussion competes with every other to update that one attribute. As you might imagine, this write contention will make your database slow way down.

A second reason is that it violates first normal form, in the sense that the comments attribute on the table you're showing contains more than one value. The motivation for this widely applied rule is that it makes a larger number of queries possible. In your design, it would be very difficult to run delete from COMMENTS where USER = 'spammy-user', or even select * from COMMENTS where text like '%Trending Topic%'. In general, if you might ever want to look at part of a value in a column, rather than the whole thing, then you're probably looking at an opportunity for normalization.
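To make the point about queries concrete, here is a minimal sketch of those two operations against a separate comments table. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere. The column names (created_by, content), their types, and the sample rows are assumptions made for illustration, not part of the original design.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical normalized table: one row per comment.
conn.execute("""
    create table comments (
        comment_id integer primary key,
        post_id    integer not null,
        created_by text not null,
        content    text not null
    )
""")

# Made-up sample data.
conn.executemany(
    "insert into comments (post_id, created_by, content) values (?, ?, ?)",
    [
        (1, "James", "Nice post about Trending Topic"),
        (1, "spammy-user", "Buy cheap watches"),
        (2, "Chan", "Unrelated remark"),
        (2, "spammy-user", "More Trending Topic spam"),
    ],
)

# Removing every comment by one author is a single statement.
deleted = conn.execute(
    "delete from comments where created_by = ?", ("spammy-user",)
).rowcount

# Searching comment text across all posts is just as direct.
matches = conn.execute(
    "select created_by, content from comments "
    "where content like ? order by comment_id",
    ("%Trending Topic%",),
).fetchall()

print(deleted, matches)
```

With comments embedded inside a single column of the post row, both operations would need to unpack and rewrite that column for every post. One difference to keep in mind: SQLite's like is case-insensitive for ASCII by default, while PostgreSQL's like is case-sensitive.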

The rule I try to use is "each 'kind of thing' gets its own table". As comments are a 'kind of thing', we'll split them out:

create table COMMENTS(
    COMMENT_ID serial primary key,
    -- the post this comment belongs to
    POST_ID integer not null references POSTS(ID),
    -- the comment being replied to; null for a top-level comment
    PARENT_COMMENT_ID integer references COMMENTS(COMMENT_ID),
    CREATED_BY ...
    CONTENT ...
)

with the convention that comments having a null parent_comment_id are the roots of threaded discussions.
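
As a rough sketch of how that convention can be used, the example below builds the two tables and walks every thread of a post with a recursive query. It again uses Python's sqlite3 module as a stand-in for PostgreSQL, which also supports with recursive. The text types for created_by and content, the posts table layout, and the sample comments are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite needs this to enforce references

# Column types for created_by and content are assumed; the original elides them.
conn.executescript("""
    create table posts (
        id         integer primary key,
        created_by text not null,
        content    text not null
    );
    create table comments (
        comment_id        integer primary key,
        post_id           integer not null references posts(id),
        parent_comment_id integer references comments(comment_id),
        created_by        text not null,
        content           text not null
    );
""")

conn.execute("insert into posts values (1, 'Anonymous', 'does this thing work?')")

# Made-up comments: 1 and 4 are thread roots, 2 replies to 1, 3 replies to 2.
conn.executemany(
    "insert into comments values (?, ?, ?, ?, ?)",
    [
        (1, 1, None, "James", "yes it does"),
        (2, 1, 1, "Chan", "are you sure?"),
        (3, 1, 2, "James", "quite sure"),
        (4, 1, None, "Freak", "a second thread"),
    ],
)

# Roots of threaded discussions are the comments with a null parent.
roots = [r[0] for r in conn.execute(
    "select comment_id from comments "
    "where parent_comment_id is null order by comment_id"
)]

# Start from the roots of one post and follow replies, tracking nesting depth.
rows = conn.execute("""
    with recursive thread(comment_id, depth) as (
        select comment_id, 0
        from comments
        where post_id = ? and parent_comment_id is null
        union all
        select c.comment_id, t.depth + 1
        from comments c
        join thread t on c.parent_comment_id = t.comment_id
    )
    select comment_id, depth from thread order by comment_id
""", (1,)).fetchall()

print(roots, rows)
```

Because each comment is its own row, adding a reply is a plain insert that never touches the post row or any other comment, which avoids the single-row write contention described earlier.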

Licensed under: CC-BY-SA with attribution