How to model dependent categories (parent-child-relation)?
-
14-03-2021 - |
Pergunta
Idea / Goal
I have two tables: Article
(represents a Product) and Category
. The relation is like this: An Article
can be assigned to only one Category
but a Category
may have at least one Article
s, therefore, 1:n (n >= 1)
.
Now, I want to build a category tree that holds up to 5 dependent Category
entries. The category-hierarchy is constant. I have given each Category
a level
field that tells me how deep I am in the tree (1-5, 1 being the main category, 5 being the last leaf possible). Now I still need to know how to navigate ... I thought that I could just set a parent_category
field that points to the upper category. The main categories, however, do not have a parent_category
respectively they are set to null
.
With this approach, I could query the databases for each level. This would make sense since I want to navigate through my categories and then see the sub-categories on entering a category (imagine a URL like: %host%/category/sub-category
).
Example
Please note: The data represented here is arbitrary. But the structure would be identical to this example below (I have omitted some other fields because they don't matter for this question):
Article
| id | name | fk_category | | -- | --------------- | ----------- | | 1 | Apple | 4 | | 2 | Orange juice | 3 |
Category
| id | name | level | parent | | -- | -------- | ----- | ------ | | 1 | Fruits | 1 | null | | 2 | Domestic | 2 | 1 | | 3 | Juice | 3 | 2 | | 4 | Whatever | 3 | 2 |
The URL %host%/fruits/domestic
would be resolved into a SQL-Query (simplified here) like:
SELECT name
FROM Category
WHERE level = 3
And then give me the names: "Juice" and "Whatever".
The URL %host%/fruits/domestic/whatever
would be resolved into a SQL-Query (simplified here) like:
SELECT name
FROM Article
WHERE fk_category = 4
This should give me "Apple" (... and any other Articles that are within that category).
I am not sure yet how I should implement it that the Category also knows if it is a node without children. I could set a boolean
for has_children
.
Question
How would you model this relationship? Is it okay to model it the way I had in mind? Is there a better alternative? Does it violate any rules?
Solução
First you have to separate the nodes subordination from the nodes itself. You need a pivot table trees
having exactly two columns: nodeID -- parentID
. The node having parentID
equal to NULL is a root node for certain tree. Pivot table trees
can contain more than one tree in it. This table requires a primary key (nodeID) and an unique multicolumn key (nodeID, parentID) - each node should be listed once and each node can have only one parent. An additional restriction - node can not refer to itself as a parent.
CREATE TABLE trees (
nodeID INT UNSIGNED NOT NULL,
parentID INT UNSIGNED, -- nullable for root nodes
PRIMARY KEY ( nodeID ), -- each node is listed once
UNIQUE KEY ( nodeID, parentID ), -- no duplicates
CONSTRAINT treeselfref CHECK ( nodeID <> parentID ) -- can't be parent for itself
)
Then you need a set of procedures for nodes manipulations:
AddNode( nodeID, parentID )
DeleteNode( nodeID ) -- you have to manage orphan childs
DeleteBranch( nodeID ) -- and all its childs/subchilds
GetParent( nodeID ) -- for the given node
GetChilds( parentID ) -- whos parent is parentID
GetBranch( parentID ) -- childs and their childs up to the leaves, ranked
GetLeaves( nodeID ) -- all leaves attached to the node
GetPath( nodeID ) -- full subordination path from the root up to the node, ranked
MergeBranches( nodeIDa, nodeIDb )
MoveBranch( nodeID, newParentID )
. . . . .
An exact set of procedures and their logic depend on the items/categories nature. So here is the sample code for the GetLeaves( nodeID )
procedure, just for illustration of the basic principle:
WITH RECURSIVE cte (nodeID, parentID) AS
( SELECT w.*
FROM trees AS w
WHERE w.nodeID = _nodeID -- seed node for branch
UNION ALL
SELECT q.*
FROM cte AS z
JOIN trees AS q ON q.parentID = z.nodeID -- childs, recurrent
) SELECT s.nodeID
FROM cte AS s
LEFT JOIN trees AS r ON r.parentID = s.nodeID
WHERE r.nodeID IS NULL; -- nodes not listed as parents for other nodes i.e. leaves
Here is the code for GetPath( nodeID )
procedure that returns a list of all parent nodes from the tree root up to the given node, with rank (distance):
WITH RECURSIVE cte (rank, nodeID, parentID) AS
( SELECT 0 AS rank, w.*
FROM trees AS w
WHERE w.nodeID = _nodeID
UNION ALL
SELECT z.rank+1, q.*
FROM cte AS z
JOIN trees AS q ON q.nodeID = z.parentID
) SELECT s.*, f.name
FROM cte AS s
JOIN nodes AS n ON n.ID = s.nodeID
JOIN categories AS f ON f.ID = n.categoryID
ORDER BY s.rank DESC
When invoked for nodeID = 123 it will return something like that:
+----+------+--------+----------+
|rank|nodeID|parentID|name |
+----+------+--------+----------+
| 5 | 1 | NULL | root |
| 4 | 7 | 1 | edible |
| 3 | 11 | 7 | fruit |
| 2 | 31 | 11 | domestic |
| 1 | 42 | 31 | apple |
| 0 | 123 | 42 | mcintosh |
+----+------+--------+----------+