Parent/Child database contains cyclical references

https://stackoverflow.com/questions/7165595

09-01-2021
|

Question

I have a keyword table where each keyword is assigned an id and is unique. I have a second table that links ids of parent keywords to ids of child keywords. One keyword can have up to approx 800 children or have none at all. Children can be parents of yet more keywords (and on and on...)

The issue I am running into is a child (or grandchild or great grandchild) can be the parent of the initial keyword causing a cyclical structure. I am attempting to build a tree data structure for the initial keyword using a recursive function but the function either never ends, or goes past the 1000 level recursion limit in Python.

Is there a better way to design my parent/child table to prevent this (or do upfront checking during inserts) or is there a better way to write a recursive function to prevent this condition from happening? I tried to limit the depth of the recursion function but ran into single level issues (i.e. child being the parent of the parent). Again, my goal is to create a tree structure for an initial keyword.

Table Keyword:
    id int(11) not null primary key auto_increment (id of keyword)
    text varchar(255) unique (keyword text e.g. "computer help desk")

Table Keyword_Relation:
    id int(11) not null primary key auto_increment (id for parent/child combo, not keyword id)
    parent int(11) (id of parent keyword)
    child int(11) (id of child keyword)

Solution

You can start at the top of the tree, and just keep track of the nodes you've already found and ignore them.

def getchildren(node, ignore_nodes=[]):
    child_nodes = []
    for child in node.children():
        if child in ignore_nodes:
            continue
        child_nodes.append(child)
        ignore_nodes.append(child)
        nodes, ignore_nodes = getchildren(child, ignore_nodes)
        child_nodes.extend(nodes)
    return child_nodes, ignore_nodes

OTHER TIPS

What you are trying to do is create a topological sort. There are a number of methods published for doing this optimally and it will depend on your schema and preferred method.

In your case it sounds like you do not have multi-parent. But how I have dealt with it programmatically is to start with the leaf nodes (that is nodes with no children) and ascend the tree. While ascending, keep a collection of nodes you have encountered. If you ever repeat an encounter then a cycle exists and a topological sort is not possible.

You will not get an infinite loop, but it's certainly possible that your topology will have more than 1000 nodes... so recursion may not be possible for you.

EDIT: To answer your question about "better design".... If possible it may be advantageous to store the root node identifier. That is: given a parent, child, grandchild, greatgrandchild, greatgreat....grandchild

Each row will not only contain their immediate parent's ID but also the root node Parent ID... or some "known good" root node

IF you do this you can speed up topological sort methods by only ascending up until the root node, and only include sets having the same root node.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow