Question

I have a Table that stores Hierarchical information using the Adjacency List model. (uses a self referential key - example below. This Table may look familiar):

category_id name                 parent
----------- -------------------- -----------
1           ELECTRONICS          NULL
2           TELEVISIONS          1
3           TUBE                 2
4           LCD                  2
5           PLASMA               2
6           PORTABLE ELECTRONICS 1
7           MP3 PLAYERS          6
8           FLASH                7
9           CD PLAYERS           6
10          2 WAY RADIOS         6

What is the best method to "flatten" the above data into something like this?

category_id lvl1        lvl2        lvl3        lvl4
----------- ----------- ----------- ----------- -----------
1           1           NULL        NULL        NULL
2           1           2           NULL        NULL
6           1           6           NULL        NULL
3           1           2           3           NULL
4           1           2           4           NULL
5           1           2           5           NULL
7           1           6           7           NULL
9           1           6           9           NULL
10          1           6           10          NULL
8           1           6           7           8

Each row is one "Path" through the Hierarchy, except there is a row for each node (not just each leaf node). The category_id column represents the current node and the "lvl" columns are its ancestors. The value for the current node must also be in the farthest right lvl column. The value in the lvl1 column will always represent the root node, values in lvl2 will always represent direct descendants of lvl1, and so on.

If possible the method to generate this output would be in SQL, and would work for n-tier hierarchies.

Was it helpful?

Solution

To do multi-level queries across a simple adjacency-list invariably involves self-left-joins. It's easy to make a right-aligned table:

SELECT category.category_id,
    ancestor4.category_id AS lvl4,
    ancestor3.category_id AS lvl3,
    ancestor2.category_id AS lvl2,
    ancestor1.category_id AS lvl1
FROM categories AS category
    LEFT JOIN categories AS ancestor1 ON ancestor1.category_id=category.category_id
    LEFT JOIN categories AS ancestor2 ON ancestor2.category_id=ancestor1.parent
    LEFT JOIN categories AS ancestor3 ON ancestor3.category_id=ancestor2.parent
    LEFT JOIN categories AS ancestor4 ON ancestor4.category_id=ancestor3.parent;

To left-align it like your example is a bit more tricky. This comes to mind:

SELECT category.category_id,
    ancestor1.category_id AS lvl1,
    ancestor2.category_id AS lvl2,
    ancestor3.category_id AS lvl3,
    ancestor4.category_id AS lvl4
FROM categories AS category
    LEFT JOIN categories AS ancestor1 ON ancestor1.parent IS NULL
    LEFT JOIN categories AS ancestor2 ON ancestor1.category_id<>category.category_id AND ancestor2.parent=ancestor1.category_id
    LEFT JOIN categories AS ancestor3 ON ancestor2.category_id<>category.category_id AND ancestor3.parent=ancestor2.category_id
    LEFT JOIN categories AS ancestor4 ON ancestor3.category_id<>category.category_id AND ancestor4.parent=ancestor3.category_id
WHERE
    ancestor1.category_id=category.category_id OR
    ancestor2.category_id=category.category_id OR
    ancestor3.category_id=category.category_id OR
    ancestor4.category_id=category.category_id;

would work for n-tier hierarchies.

Sorry, arbitrary-depth queries are not possible in the adjacency-list model. If you are doing this kind of query a lot, you should change your schema to one of the other models of storing hierarchical information: full adjacency relation (storing all ancestor-descendent relationships), materialised path, or nested sets.

If the categories don't move around a lot (which is usually the case for a store like your example), I would tend towards nested sets.

OTHER TIPS

As mentioned, SQL has no clean way to implement tables with dynamically varying numbers of columns. The only two solutions I have used before are: 1. A fixed number Self-Joins, giving a fixed number of columns (AS per BobInce) 2. Generate the results as a string in a single column

The second one sounds grotesque initially; storing IDs as string?! But when the output is formatted as XML or something, people don't seem to mind so much.

Equally, this is of very little use if you then want to join on the results in SQL. If the result is to be supplied to an Application, it Can be very suitable. Personally, however, I prefer to do the flattening in the Application rather than SQL


I'm stuck here on a 10 inch screen without access to SQL so I can't give the tested code, but the basic method would be to utilise recursion in some way;
- A recursive scalar function can do this
- MS SQL can do this using a recursive WITH statement (more efficient)

Scalar Function (something like):

CREATE FUNCTION getGraphWalk(@child_id INT)
RETURNS VARCHAR(4000)
AS
BEGIN

  DECLARE @graph VARCHAR(4000)

  -- This step assumes each child only has one parent
  SELECT
    @graph = dbo.getGraphWalk(parent_id)
  FROM
    mapping_table
  WHERE
    category_id = @child_id
    AND parent_id IS NOT NULL

  IF (@graph  IS NULL)
    SET @graph = CAST(@child_id AS VARCHAR(16))
  ELSE
    SET @graph = @graph + ',' + CAST(@child_id AS VARCHAR(16))

  RETURN @graph

END


SELECT
  category_id                         AS [category_id],
  dbo.getGraphWalk(category_id)       AS [graph_path]
FROM
  mapping_table
ORDER BY
  category_id

I haven't used a recursive WITH in a while, but I'll give the syntax a go even though I don't have SQL here to test anything :)

Recursive WITH

WITH
  result (
    category_id,
    graph_path
  )
AS
(
  SELECT
    category_id,
    CAST(category_id AS VARCHAR(4000))
  FROM
    mapping_table
  WHERE
    parent_id IS NULL

  UNION ALL

  SELECT
    mapping_table.category_id,
    CAST(result.graph_path + ',' + CAST(mapping_table.category_id AS VARCHAR(16)) AS VARCHAR(4000))
  FROM
    result
  INNER JOIN
    mapping_table
      ON result.category_id = mapping_table.parent_id
)

SELECT
  *
FROM
  result
ORDER BY
  category_id


EDIT - OUTPUT for both is the same:

1   '1'
2   '1,2'
3   '1,2,3'
4   '1,2,4'
5   '1,2,5'
6   '1,6'
7   '1,6,7'
8   '1,6,7,8'
9   '1,6,9'

Traversing a tree of arbitrary depth generally involves recursive procedural code, unless you make use of the special features of some DBMS.

In Oracle, the CONNECT BY clause will permit you to traverse the tree in depth first order if you use adjacency list, as you did here.

If you use nested sets, the left sequence number will provide you with the order to visit the nodes.

Actually can be done with dynamic SQL within a stores procedure. You then become limited to what can be done sith the stored procedure. Obviously it becomes a challenge to EXEC the results into a temporary table not knowing how many columns to expect. However, if the goal is to output to a web page or other UI then it may be worth the effort...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top