Question

Using PostgreSQL 11.

I've tried digging through both Stack Overflow and here and was unable to find an answer on best practices.

I'm working on a database design and have arrived at a schema that uses a generic "join table". This join table contains five columns:

CREATE TABLE many_joins_table (
  id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  object_id int NOT NULL,
  object_table joins_object_t NOT NULL,
  parent_id int NOT NULL,
  parent_table joins_parent_t NOT NULL);

I've been using this table to represent adjacent and many-to-many relationships between objects in my database. One such example is tags.

CREATE TABLE tag (
  id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  name text NOT NULL UNIQUE);

CREATE TABLE comment (
  id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY);


CREATE TYPE joins_object_t AS ENUM ('tag');
CREATE TYPE joins_parent_t AS ENUM ('comment');

When a tag is added to a comment, I insert a new row into this join table with the following fields:

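-- Name the columns explicitly so the identity column "id" is filled in automatically.
INSERT INTO many_joins_table (object_id, object_table, parent_id, parent_table)
VALUES (1, 'tag'::joins_object_t, 1, 'comment'::joins_parent_t);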

Leaving aside the inflexibility of enums (partly addressed since PostgreSQL 9.1, see https://stackoverflow.com/questions/1771543/adding-a-new-value-to-an-existing-enum-type/7834949#7834949), are there any significant disadvantages or advantages to such an approach? I'm worried that I've mistakenly implemented an anti-pattern. Are there any best practices I can apply to improve this implementation (indexing or constraints)?

Thanks!

Note: I'm aware there are better ways to implement tags, namely using intarrays. I'm just using tags as an example since it's easily understood. https://stackoverflow.com/questions/23508551/integer-array-lookup-using-postgres

Edit: Removed the UUIDs since they may be a distraction from the question.


Solution

First, stay off enums for things like this. Enum values can never be removed, so use them only if you are sure that will never be necessary, which doesn't seem to be the case here.

Anyway, I would say that your design is too complicated, and still lacks the crucial feature of referential integrity.

Use a junction table for each pair of objects that can be related. This way, you

  • make clear which objects can be related

  • can have referential integrity
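As a minimal sketch of what this could look like for the tag/comment pair from the question: the table name comment_tag, the ON DELETE CASCADE behavior, and the extra index are illustrative assumptions, not part of the original schema.

```sql
-- Parent tables as in the question (integer identity keys).
CREATE TABLE tag (
  id   int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  name text NOT NULL UNIQUE
);

CREATE TABLE comment (
  id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
);

-- One junction table per related pair; here: comments and tags.
-- "comment_tag" is a hypothetical name chosen for this sketch.
CREATE TABLE comment_tag (
  comment_id int NOT NULL REFERENCES comment (id) ON DELETE CASCADE,
  tag_id     int NOT NULL REFERENCES tag (id)     ON DELETE CASCADE,
  -- The composite key prevents attaching the same tag to a comment twice
  -- and also serves lookups by comment_id.
  PRIMARY KEY (comment_id, tag_id)
);

-- Supports the reverse lookup: all comments carrying a given tag.
CREATE INDEX ON comment_tag (tag_id);
```

With the foreign keys in place, an insert that names a comment or tag that does not exist fails with a foreign-key violation, which the generic join table cannot enforce.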

Having many tables is something a database is good at. You might object that with 1000 tables, each of which can be related to any other, this would require too many junction tables. But in that case you should probably choose a model that doesn't have one table per object type anyway.

OTHER TIPS

I think you're making this complex. First, that blog is bad advice. Welcome to the internet. Just ignore it.

Use an int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY (i.e., an identity column) until you have a reason not to.

Now you have two things..

  • comments
  • tags

Both of these can be hierarchical. That's the only thing they have in common. Tables don't model different structures of data. Tables hold data. And the whole purpose of a relational database is to model the relations of the data. This is totally perverted when you model abstract schemas that fit all of your data.

Questions for you,

  • Do you need a hierarchy? This is more complex and slower. It does a lot more. Learn recursive queries, and you can make it fast enough for most workloads though. If not, don't model things hierarchically for fun. Stack Overflow doesn't have hierarchical tags, and they're pretty successful.
  • So you need a hierarchy? Does it need multiple inheritance? You can reply to one answer or one question with a "comment" on this network. Think about how much more complex it would be to be able to reply to multiple answers with the same comment. Think about the user interface. You can go there, but do you need to?

Assuming you need single-inheritance, you can do

CREATE TABLE tag (
  tag_id     int   PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  tag_name   text,
  parent_tag int   REFERENCES tag
);

For more on this, see my other answer, which addresses threaded comments. This is all well and good, but now you need a recursive query to query this table. You also need to pin it to a question. Does a question get tagged with all subtags? Does it get tagged with all parent tags? Can a question be tagged with two tags in the same hierarchy?
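A rough sketch of such a recursive query, assuming the tag table above with its tag_id and parent_tag columns. The starting id 42 is a made-up value, and the query assumes the hierarchy contains no cycles (with a cycle it would not terminate):

```sql
-- Walk from one tag up through its ancestors to the root of the hierarchy.
WITH RECURSIVE ancestors AS (
  -- Anchor: the tag we start from (42 is a hypothetical id).
  SELECT tag_id, parent_tag, 0 AS depth
  FROM tag
  WHERE tag_id = 42

  UNION ALL

  -- Recursive step: join each row found so far to its parent row.
  SELECT t.tag_id, t.parent_tag, a.depth + 1
  FROM tag AS t
  JOIN ancestors AS a ON t.tag_id = a.parent_tag
)
SELECT tag_id, parent_tag, depth
FROM ancestors
ORDER BY depth;
```

Flipping the join condition to t.parent_tag = a.tag_id walks downward instead and returns all subtags of the starting tag.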

Frequently with tags, it's easier to just do..

-- case insensitive
CREATE EXTENSION citext;

CREATE OR REPLACE FUNCTION array_lacks_dupes(anyarray)
RETURNS bool
AS $$
        SELECT coalesce(hasdupe,nodupe) AS hasdupe
        FROM (VALUES (true)) AS t(nodupe)
        LEFT OUTER JOIN (
                SELECT false
                FROM unnest($1) AS e
                GROUP BY e
                HAVING count(*) > 1
                LIMIT 1
        ) AS g(hasdupe)
        ON true
$$ LANGUAGE sql
STRICT IMMUTABLE;


CREATE TABLE question (
  question_id int      PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  tags        citext[] CHECK (array_lacks_dupes(tags))
);

CREATE INDEX ON question USING gin (tags);

With the above schema, you don't have to query it in some weird way, and you can still resolve the query on an index (you can even add other arbitrary things to that index and do it all in one lookup):

SELECT * FROM question
WHERE tags @> ARRAY['foo']::citext[];

That will find any questions tagged 'foo' on an index! Want to find if you match multiple tags?

SELECT * FROM question
WHERE tags @> ARRAY['foo', 'bar']::citext[];
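
To see how the pieces fit together, here is a short sketch that assumes the citext extension, the array_lacks_dupes() function, and the question table defined above are already in place. The tag values are made up, and the comments describe the behavior I would expect rather than captured output:

```sql
-- Accepted: two distinct tags.
INSERT INTO question (tags)
VALUES (ARRAY['foo', 'bar']::citext[]);

-- Should be rejected by the CHECK constraint: citext compares
-- case-insensitively, so 'foo' and 'FOO' count as the same tag.
INSERT INTO question (tags)
VALUES (ARRAY['foo', 'FOO']::citext[]);

-- @> requires every listed tag to be present; && (overlap) matches
-- questions that carry at least one of the listed tags.
SELECT *
FROM question
WHERE tags && ARRAY['foo', 'bar']::citext[];
```

Note that array_lacks_dupes() is declared STRICT, so a NULL tags array makes the function return NULL, which a CHECK constraint treats as passing. Add NOT NULL to the column if that is not what you want.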
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange