How to model dependent categories (parent-child-relation)?

https://dba.stackexchange.com/questions/283782

14-03-2021
|

Question

Idea / Goal

I have two tables: Article (represents a Product) and Category. The relation is like this: An Article can be assigned to only one Category but a Category may have at least one Articles, therefore, 1:n (n >= 1).

Now, I want to build a category tree that holds up to 5 dependent Category entries. The category-hierarchy is constant. I have given each Category a level field that tells me how deep I am in the tree (1-5, 1 being the main category, 5 being the last leaf possible). Now I still need to know how to navigate ... I thought that I could just set a parent_category field that points to the upper category. The main categories, however, do not have a parent_category respectively they are set to null.

With this approach, I could query the databases for each level. This would make sense since I want to navigate through my categories and then see the sub-categories on entering a category (imagine a URL like: %host%/category/sub-category).

Example

Please note: The data represented here is arbitrary. But the structure would be identical to this example below (I have omitted some other fields because they don't matter for this question):

Article

| id | name            | fk_category |
| -- | --------------- | ----------- |
| 1  | Apple           | 4           |
| 2  | Orange juice    | 3           |

Category

| id | name     | level | parent |
| -- | -------- | ----- | ------ |
| 1  | Fruits   | 1     | null   |
| 2  | Domestic | 2     | 1      |
| 3  | Juice    | 3     | 2      |
| 4  | Whatever | 3     | 2      |

The URL %host%/fruits/domestic would be resolved into a SQL-Query (simplified here) like:

SELECT name
FROM Category
WHERE level = 3

And then give me the names: "Juice" and "Whatever".

The URL %host%/fruits/domestic/whatever would be resolved into a SQL-Query (simplified here) like:

SELECT name
FROM Article
WHERE fk_category = 4

This should give me "Apple" (... and any other Articles that are within that category).

I am not sure yet how I should implement it that the Category also knows if it is a node without children. I could set a boolean for has_children.

Question

How would you model this relationship? Is it okay to model it the way I had in mind? Is there a better alternative? Does it violate any rules?

Solution

First you have to separate the nodes subordination from the nodes itself. You need a pivot table trees having exactly two columns: nodeID -- parentID. The node having parentID equal to NULL is a root node for certain tree. Pivot table trees can contain more than one tree in it. This table requires a primary key (nodeID) and an unique multicolumn key (nodeID, parentID) - each node should be listed once and each node can have only one parent. An additional restriction - node can not refer to itself as a parent.

CREATE TABLE trees (
  nodeID INT UNSIGNED NOT NULL,
  parentID INT UNSIGNED, -- nullable for root nodes
  PRIMARY KEY ( nodeID ), -- each node is listed once
  UNIQUE KEY  ( nodeID, parentID ), -- no duplicates
  CONSTRAINT treeselfref CHECK ( nodeID <> parentID ) -- can't be parent for itself
)

Then you need a set of procedures for nodes manipulations:

AddNode( nodeID, parentID )
DeleteNode( nodeID ) -- you have to manage orphan childs
DeleteBranch( nodeID ) -- and all its childs/subchilds
GetParent( nodeID )  -- for the given node
GetChilds( parentID ) -- whos parent is parentID
GetBranch( parentID ) -- childs and their childs up to the leaves, ranked
GetLeaves( nodeID ) -- all leaves attached to the node
GetPath( nodeID ) -- full subordination path from the root up to the node, ranked
MergeBranches( nodeIDa, nodeIDb )
MoveBranch( nodeID, newParentID )
. . . . .

An exact set of procedures and their logic depend on the items/categories nature. So here is the sample code for the GetLeaves( nodeID ) procedure, just for illustration of the basic principle:

WITH RECURSIVE cte (nodeID, parentID) AS 
( SELECT w.*   
    FROM trees AS w 
   WHERE w.nodeID = _nodeID  -- seed node for branch 
  UNION ALL
  SELECT q.*
    FROM cte   AS z
    JOIN trees AS q ON q.parentID = z.nodeID  -- childs, recurrent
) SELECT s.nodeID
    FROM cte        AS s
    LEFT JOIN trees AS r ON r.parentID = s.nodeID
  WHERE r.nodeID IS NULL; -- nodes not listed as parents for other nodes i.e. leaves

Here is the code for GetPath( nodeID ) procedure that returns a list of all parent nodes from the tree root up to the given node, with rank (distance):

WITH RECURSIVE cte (rank, nodeID, parentID) AS 
( SELECT 0 AS rank, w.*
    FROM trees AS w 
   WHERE w.nodeID = _nodeID
  UNION ALL
  SELECT z.rank+1, q.*
    FROM cte   AS z
    JOIN trees AS q  ON q.nodeID = z.parentID
) SELECT s.*, f.name 
    FROM cte        AS s
    JOIN nodes      AS n ON n.ID = s.nodeID
    JOIN categories AS f ON f.ID = n.categoryID
   ORDER BY s.rank DESC

When invoked for nodeID = 123 it will return something like that:

+----+------+--------+----------+
|rank|nodeID|parentID|name      |
+----+------+--------+----------+
|  5 | 1    | NULL   | root     |
|  4 | 7    | 1      | edible   |
|  3 | 11   | 7      | fruit    |
|  2 | 31   | 11     | domestic |
|  1 | 42   | 31     | apple    |
|  0 | 123  | 42     | mcintosh |
+----+------+--------+----------+

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange