Question

I've been asked to query a time-logging database to display all the work done on given projects. Each project is broken into tasks, each of which may itself be broken into tasks. The task hierarchy can be an arbitrary number of levels deep. Part of the requirement is to provide the total time worked for each task or node in the hierarchy (not just the leaf-level nodes but all nodes, including the top-level project node, the leaf-level nodes and all nodes in between).

Since I'm working with a hierarchy, I assume the HIERARCHYID data type may be useful. Is there any way of doing something like a SUM with ROLLUP on a hierarchy, to give the sub-totals for each node in the hierarchy?

I assumed this sort of aggregate rollup on a hierarchy would be a common requirement, but I've had no luck at all finding out how to do it, or even whether it is possible.


Solution

Figured out how to do it. The method is a bit convoluted; perhaps someone else can come up with a neater version.

The method involves four steps:

  1. Run the ROW_NUMBER function over all the tasks for the given projects. Partition by ParentId so that all the child tasks of a given parent are numbered 1, 2, 3, 4, etc. This works at all levels of the task hierarchy;

  2. Use a recursive CTE (common table expression) to walk up the task hierarchy from every task to its top-level project. This builds up the path of each task from the parent-child relationships in the TimeCode table. Originally I tried to include the ROW_NUMBER function here, but that didn't work because of the way SQL Server evaluates recursive CTEs (see the comment at the top of the script);

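  3. Add a HIERARCHYID column, parsed from the path text built up in step 2 (hierarchyid::Parse). A dummy root node, '/', is also added so that there is a grand total across all the projects;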

  4. Do a self-join on the recordset to get all the descendants of each node in the structure. Group by the parent node and sum the times recorded against it and its descendants. Note that the HIERARCHYID method IsDescendantOf returns true not only for the descendants of a node but for the node itself as well. So if any time has been recorded against the parent task as well as the children, it will be included in the total time for that parent node.
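
For anyone who wants to experiment without SQL Server, steps 1 to 3 can be approximated in SQLite through Python's built-in sqlite3 module. This is only a sketch under assumptions, not the method used in the script: the task table and its columns are hypothetical, the recursive CTE walks down from the projects instead of up from every task, and because SQLite has no HIERARCHYID type the '/1/2/' path is simply kept as text. Numbering the siblings in an ordinary CTE before the recursive one keeps ROW_NUMBER out of the recursive member. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical miniature of the TimeCode table: (id, parent_id, description).
# Names and values are illustrative only, not the poster's schema.
TASKS = [
    (10, None, "Project A"),
    (11, 10, "Design"),
    (12, 10, "Build"),
    (13, 12, "Backend"),
    (14, 12, "Frontend"),
    (20, None, "Project B"),
]


def make_connection():
    """Create an in-memory database holding the hypothetical task table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, parent_id INTEGER, description TEXT)"
    )
    conn.executemany("INSERT INTO task VALUES (?, ?, ?)", TASKS)
    return conn


def build_paths(conn):
    """Return {task_id: path} where path looks like '/1/2/'."""
    rows = conn.execute(
        """
        WITH RECURSIVE
        -- Step 1: number siblings 1, 2, 3... within each parent
        -- (top-level projects share the NULL parent partition).
        numbered AS (
            SELECT id, parent_id,
                   ROW_NUMBER() OVER (PARTITION BY parent_id
                                      ORDER BY description) AS idx
            FROM task
        ),
        -- Step 2: walk down from the projects, appending each sibling number.
        tree AS (
            SELECT id, '/' || idx || '/' AS path
            FROM numbered
            WHERE parent_id IS NULL
            UNION ALL
            SELECT n.id, t.path || n.idx || '/'
            FROM numbered AS n
            JOIN tree AS t ON n.parent_id = t.id
        )
        -- Step 3 (approximation): keep the path as text; SQLite has no HIERARCHYID.
        SELECT id, path FROM tree
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    for task_id, path in sorted(build_paths(make_connection()).items()):
        print(task_id, path)
```

With this data, "Build" sorts before "Design", so Build becomes /1/1/ and its first child, "Backend", becomes /1/1/1/.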

Here's the script:

-- Cannot include a ROW_NUMBER function within the recursive member of the 
--    common table expression, as SQL Server recurses depth first, i.e. it 
--    processes each row separately, completing the recursion for a given 
--    row before starting the next. ROW_NUMBER in the recursive member 
--    therefore does not number sibling tasks together.
-- To get around this, apply ROW_NUMBER before the common table expression 
--    (when loading @tblTask).

DECLARE @tblTask TABLE (TimeCodeId INT, ParentId INT, ProjectID INT, 
    Level INT, TaskIndex VARCHAR(12), Duration FLOAT);

INSERT INTO @tblTask (TimeCodeId, ParentId, ProjectID, 
    Level, TaskIndex, Duration)
SELECT tc.TimeCodeId, 
    tc.ParentId, 
    CASE
        WHEN tc.ParentId IS NULL THEN tc.ReferenceId1
        ELSE tc.ReferenceId2
    END AS ProjectID, 
    1 AS Level, 
    CAST(ROW_NUMBER() OVER (PARTITION BY tc.ParentId 
                            ORDER BY tc.[Description]) AS VARCHAR(12)) 
                                                            AS TaskIndex, 
    ts.Duration            
FROM Time.TimeCode tc 
    LEFT JOIN 
    (    -- Get time sub-totals for each task.
        SELECT TimeCodeId, 
            SUM(Duration) AS Duration
        FROM Time.Timesheet
        WHERE ReferenceId2 IN (12196, 12198)
        GROUP BY TimeCodeId
    ) ts
    ON tc.TimeCodeId = ts.TimeCodeId
WHERE ReferenceId2 IN (12196, 12198)
ORDER BY [Description];

DECLARE @tblHierarchy TABLE (HierarchyNode HIERARCHYID, 
    Level INT, Duration FLOAT);

-- Common table expression that builds up the task hierarchy recursively.
WITH cte_task_hierarchy AS 
(
    -- Anchor member.
    SELECT t.TimeCodeId,
        t.ParentID,  
        t.ProjectID, 
        t.Level, 
        CAST('/' + t.TaskIndex + '/' AS VARCHAR(200)) AS HierarchyNodeText, 
        t.Duration            
    FROM @tblTask t

    UNION ALL

    -- Dummy root node for HIERARCHYID.
    --    (easier to add it after another query so don't have to cast the 
    --    NULLs to data types)
    SELECT NULL AS TimeCodeId, 
        NULL AS ParentID, 
        NULL AS ProjectID, 
        0 AS Level, 
        CAST('/' AS VARCHAR(200)) AS HierarchyNodeText, 
        NULL AS Duration

    UNION ALL 

    -- Recursive member that walks up the task hierarchy.
    SELECT tp.TimeCodeId, 
        tp.ParentID,  
        th.ProjectID, 
        th.Level + 1 AS Level, 
        CAST('/' + tp.TaskIndex + th.HierarchyNodeText AS VARCHAR(200)) 
            AS HierarchyNodeText,
        th.Duration
    FROM cte_task_hierarchy th 
        JOIN @tblTask tp ON th.ParentID = tp.TimeCodeId 
)
INSERT INTO @tblHierarchy (HierarchyNode, 
    Level, Duration)
SELECT hierarchyid::Parse(cth.HierarchyNodeText), 
    cth.Level, cth.Duration
FROM cte_task_hierarchy cth 
-- This filters recordset to exclude intermediate steps in the recursion 
--    - only want the final result.
WHERE cth.ParentId IS NULL
ORDER BY cth.HierarchyNodeText;

-- Show the task hierarchy (HIERARCHYID values sort depth first).
SELECT *, HierarchyNode.ToString() AS NodeText
FROM @tblHierarchy
ORDER BY HierarchyNode;

-- Calculate the sub-totals for each task in the hierarchy.
SELECT t1.HierarchyNode.ToString() AS NodeText, 
    COALESCE(SUM(t2.Duration), 0) AS DurationTotal
FROM @tblHierarchy t1 
    JOIN @tblHierarchy t2 
        ON t2.HierarchyNode.IsDescendantOf(t1.HierarchyNode) = 1
GROUP BY t1.HierarchyNode
ORDER BY t1.HierarchyNode;

Results:

First Recordset (task structure with HIERARCHYID column):

HierarchyNode    Level    Duration   NodeText
-------------    -----    --------   --------
0x               0        NULL       /
0x58             1        NULL       /1/
0x5AC0           2        12.15      /1/1/
0x5AD6           3        8.92       /1/1/1/
0x5ADA           3        11.08      /1/1/2/
0x5ADE           3        7          /1/1/3/
0x5B40           2        182.18     /1/2/
0x5B56           3        233.71     /1/2/1/
0x5B5A           3        227.27     /1/2/2/
0x5BC0           2        45.4       /1/3/
0x68             1        NULL       /2/
0x6AC0           2        8.5        /2/1/
0x6B40           2        2.17       /2/2/
0x6BC0           2        8.91       /2/3/
0x6C20           2        1.75       /2/4/
0x6C60           2        60.25      /2/5/

Second Recordset (tasks with sub-totals for each task):

NodeText     DurationTotal
--------     -------------
/            809.29
/1/          727.71
/1/1/        39.15
/1/1/1/      8.92
/1/1/2/      11.08
/1/1/3/      7
/1/2/        643.16
/1/2/1/      233.71
/1/2/2/      227.27
/1/3/        45.4
/2/          81.58
/2/1/        8.5
/2/2/        2.17
/2/3/        8.91
/2/4/        1.75
/2/5/        60.25
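
Step 4 can be sanity-checked against the two recordsets above. The sketch below again uses SQLite via Python's sqlite3 module rather than SQL Server, and loads the NodeText and Duration pairs from the first recordset. Since SQLite has no IsDescendantOf, it substitutes a text-prefix test; that works here because every path ends in '/', so '/1/' is a prefix of '/1/2/' but not of '/10/'. Like IsDescendantOf, the prefix test also matches the node itself.

```python
import sqlite3

# (NodeText, Duration) pairs from the first recordset; NULL becomes None.
NODES = [
    ("/", None), ("/1/", None), ("/1/1/", 12.15), ("/1/1/1/", 8.92),
    ("/1/1/2/", 11.08), ("/1/1/3/", 7.0), ("/1/2/", 182.18),
    ("/1/2/1/", 233.71), ("/1/2/2/", 227.27), ("/1/3/", 45.4),
    ("/2/", None), ("/2/1/", 8.5), ("/2/2/", 2.17), ("/2/3/", 8.91),
    ("/2/4/", 1.75), ("/2/5/", 60.25),
]


def rollup(nodes):
    """Return {path: total} summing each node's own time and its descendants'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hierarchy (path TEXT, duration REAL)")
    conn.executemany("INSERT INTO hierarchy VALUES (?, ?)", nodes)
    rows = conn.execute(
        """
        SELECT t1.path, ROUND(COALESCE(SUM(t2.duration), 0), 2)
        FROM hierarchy AS t1
        JOIN hierarchy AS t2
          -- Stand-in for IsDescendantOf: t1.path is a prefix of t2.path
          -- (true for t1 itself, as with IsDescendantOf).
          ON substr(t2.path, 1, length(t1.path)) = t1.path
        GROUP BY t1.path
        """
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    for path, total in sorted(rollup(NODES).items()):
        print(path, total)
```

Rounded to two decimal places, the totals come out the same as the second recordset, e.g. 39.15 for /1/1/ and 809.29 for the root.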

OTHER TIPS

Here is something I tried, and it worked OK. In this case I had a table, Taxonomies, with columns ID and ParentTaxonomyID (which references ID). I wanted to count the problems associated with each taxonomy, but summed up through the hierarchy. Here is the function I used:

ALTER FUNCTION [dbo].[func_NumberOfQuestions](
    @TaxonomyID INT )
RETURNS INT
AS
BEGIN

-- Number of child taxonomies (helper function, not shown in this post).
DECLARE @NChildren INT
SELECT  @NChildren = dbo.func_NumberOfTaxonomyChildren(@TaxonomyID)

DECLARE @NumberOfQuestions INT, @NumberOfDirectQuestions INT, 
    @NumberOfChildQuestions INT 

-- Questions tagged directly with this taxonomy.
SELECT  @NumberOfDirectQuestions = COUNT(*) 
FROM    ProblemTaxonomies
WHERE   TaxonomyID = @TaxonomyID

-- Recurse into each child taxonomy and add up the children's totals.
SELECT @NumberOfChildQuestions = 0
IF @NChildren > 0
BEGIN
    SELECT  @NumberOfChildQuestions = 
            ISNULL(SUM(dbo.func_NumberOfQuestions(id)), 0)
    FROM    Taxonomies
    WHERE   ParentTaxonomyID = @TaxonomyID
END

RETURN @NumberOfDirectQuestions + @NumberOfChildQuestions
END

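I used a T-SQL function. It's a fairly obvious recursive call, but because it's written in SQL I was able to use the SUM function to add up the totals for the children. One caveat: SQL Server limits nested function calls to 32 levels, so this approach fails for hierarchies more than about 32 levels deep.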
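
The same recursion can be sketched outside the database. In the Python below, PARENT and TAGGED are hypothetical stand-ins for the Taxonomies and ProblemTaxonomies tables, with made-up data. A node's total is its direct count plus the totals of its children; Python's sum() over no children returns 0, which plays the role of ISNULL(SUM(...), 0).

```python
from collections import defaultdict

# Hypothetical stand-ins for the tables in the post:
# PARENT mimics Taxonomies (id -> ParentTaxonomyID, None for a root) and
# TAGGED mimics ProblemTaxonomies (one TaxonomyID per tagged problem).
PARENT = {1: None, 2: 1, 3: 1, 4: 2}
TAGGED = [1, 2, 2, 3, 4, 4, 4]


def build_indexes(parent, tagged):
    """Index child ids by parent and count problems tagged directly per id."""
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)
    direct = defaultdict(int)
    for taxonomy_id in tagged:
        direct[taxonomy_id] += 1
    return children, direct


def number_of_questions(taxonomy_id, children, direct):
    """Direct count for this node plus the recursive totals of its children."""
    # sum() over an empty sequence is 0, like ISNULL(SUM(...), 0) in the T-SQL.
    return direct[taxonomy_id] + sum(
        number_of_questions(child, children, direct)
        for child in children[taxonomy_id]
    )


if __name__ == "__main__":
    children, direct = build_indexes(PARENT, TAGGED)
    for taxonomy_id in PARENT:
        print(taxonomy_id, number_of_questions(taxonomy_id, children, direct))
```

With this data, taxonomy 1 totals 7: one direct question, plus 5 under taxonomy 2 (which includes the 3 under taxonomy 4), plus 1 under taxonomy 3.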

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow