Question

I have some hierarchical data - each entry has an id and a (nullable) parent entry id. I want to retrieve all entries in the tree under a given entry. This is in a SQL Server 2005 database. I am querying it with LINQ to SQL in C# 3.5.

LINQ to SQL does not support Common Table Expressions directly. My choices are to assemble the data in code with several LINQ queries, or to make a view on the database that surfaces a CTE.

Which option (or another option) do you think will perform better when data volumes get large? Is SQL Server 2008's HierarchyId type supported in Linq to SQL?

Was it helpful?

Solution

I would set up a view and an associated table-based function based on the CTE. My reasoning for this is that, while you could implement the logic on the application side, this would involve sending the intermediate data over the wire for computation in the application. Using the DBML designer, the view translates into a Table entity. You can then associate the function with the Table entity and invoke the method created on the DataContext to derive objects of the type defined by the view. Using the table-based function allows the query engine to take your parameters into account while constructing the result set rather than applying a condition on the result set defined by the view after the fact.

CREATE TABLE [dbo].[hierarchical_table](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [parent_id] [int] NULL,
    [data] [varchar](255) NOT NULL,
 CONSTRAINT [PK_hierarchical_table] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE VIEW [dbo].[vw_recursive_view]
AS
WITH hierarchy_cte(id, parent_id, data, lvl) AS
(SELECT     id, parent_id, data, 0 AS lvl
      FROM         dbo.hierarchical_table
      WHERE     (parent_id IS NULL)
      UNION ALL
      SELECT     t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
      FROM         dbo.hierarchical_table AS t1 INNER JOIN
                            hierarchy_cte AS h ON t1.parent_id = h.id)
SELECT     id, parent_id, data, lvl
FROM         hierarchy_cte AS result


CREATE FUNCTION [dbo].[fn_tree_for_parent] 
(
    @parent int
)
RETURNS 
@result TABLE 
(
    id int not null,
    parent_id int,
    data varchar(255) not null,
    lvl int not null
)
AS
BEGIN
    WITH hierarchy_cte(id, parent_id, data, lvl) AS
   (SELECT     id, parent_id, data, 0 AS lvl
        FROM         dbo.hierarchical_table
        WHERE     (id = @parent OR (parent_id IS NULL AND @parent IS NULL))
        UNION ALL
        SELECT     t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
        FROM         dbo.hierarchical_table AS t1 INNER JOIN
            hierarchy_cte AS h ON t1.parent_id = h.id)
    INSERT INTO @result
    SELECT     id, parent_id, data, lvl
    FROM         hierarchy_cte AS result
RETURN 
END

ALTER TABLE [dbo].[hierarchical_table]  WITH CHECK ADD  CONSTRAINT [FK_hierarchical_table_hierarchical_table] FOREIGN KEY([parent_id])
REFERENCES [dbo].[hierarchical_table] ([id])

ALTER TABLE [dbo].[hierarchical_table] CHECK CONSTRAINT [FK_hierarchical_table_hierarchical_table]

To use it you would do something like -- assuming some reasonable naming scheme:

using (DataContext dc = new HierarchicalDataContext())
{
    HierarchicalTableEntity h = (from e in dc.HierarchicalTableEntities
                                 select e).First();
    var query = dc.FnTreeForParent( h.ID );
    foreach (HierarchicalTableViewEntity entity in query) {
        ...process the tree node...
    }
}

OTHER TIPS

This option might also prove useful:

LINQ AsHierarchy() extension method
http://www.scip.be/index.php?Page=ArticlesNET18

I am surprised nobody has mentioned an alternative database design - when hierarchy needs to be flattened from multiple levels and retrieved with high performance (not so considering storage space) it is better to use another entity-2-entity table to track hierarchy instead of parent_id approach.

It will allow not only single parent relations but also multi parent relations, level indications and different types of relationships:

CREATE TABLE Person (
  Id INTEGER,
  Name TEXT
);

CREATE TABLE PersonInPerson (
  PersonId INTEGER NOT NULL,
  InPersonId INTEGER NOT NULL,
  Level INTEGER,
  RelationKind VARCHAR(1)
);

I have done this two ways:

  1. Drive the retrieval of each layer of the tree based on user input. Imagine a tree view control populated with the root node, the children of the root, and the grandchildren of the root. Only the root and the children are expanded (grandchildren are hidden with the collapse). As the user expands a child node the grandchildren of the root are display (that were previously retrieved and hidden), and a retrieval of all of the great-grandchildren is launched. Repeat the pattern for N-layers deep. This pattern works very well for large trees (depth or width) because it only retrieves the portion of the tree needed.
  2. Use a stored procedure with LINQ. Use something like a common table expression on the server to build your results in a flat table, or build an XML tree in T-SQL. Scott Guthrie has a great article about using stored procs in LINQ. Build your tree from the results when they come back if in a flat format, or use the XML tree if that is that is what you return.

This extension method could potentially be modified to use IQueryable. I've used it succesfully in the past on a collection of objects. It may work for your scenario.

public static IEnumerable<T> ByHierarchy<T>(
 this IEnumerable<T> source, Func<T, bool> startWith, Func<T, T, bool> connectBy)
{
  if (source == null)
   throw new ArgumentNullException("source");

  if (startWith == null)
   throw new ArgumentNullException("startWith");

  if (connectBy == null)
   throw new ArgumentNullException("connectBy");

  foreach (T root in source.Where(startWith))
  {
   yield return root;
   foreach (T child in source.ByHierarchy(c => connectBy(root, c), connectBy))
   {
    yield return child;
   }
 }
}

Here is how I called it:

comments.ByHierarchy(comment => comment.ParentNum == parentNum, 
 (parent, child) => child.ParentNum == parent.CommentNum && includeChildren)

This code is an improved, bug-fixed version of the code found here.

In MS SQL 2008 you could use HierarchyID directly, in sql2005 you may have to implement them manually. ParentID is not that performant on large data sets. Also check this article for more discussion on the topic.

I got this approach from Rob Conery's blog (check around Pt. 6 for this code, also on codeplex) and I love using it. This could be refashioned to support multiple "sub" levels.

var categories = from c in db.Categories
                 select new Category
                 {
                     CategoryID = c.CategoryID,
                     ParentCategoryID = c.ParentCategoryID,
                     SubCategories = new List<Category>(
                                      from sc in db.Categories
                                      where sc.ParentCategoryID == c.CategoryID
                                      select new Category {
                                        CategoryID = sc.CategoryID, 
                                        ParentProductID = sc.ParentProductID
                                        }
                                      )
                             };

The trouble with fetching the data from the client side is that you can never be sure how deep you need to go. This method will do one roundtrip per depth and it could be union'd to do from 0 to a specified depth in one roundtrip.

public IQueryable<Node> GetChildrenAtDepth(int NodeID, int depth)
{
  IQueryable<Node> query = db.Nodes.Where(n => n.NodeID == NodeID);
  for(int i = 0; i < depth; i++)
    query = query.SelectMany(n => n.Children);
       //use this if the Children association has not been defined
    //query = query.SelectMany(n => db.Nodes.Where(c => c.ParentID == n.NodeID));
  return query;
}

It can't, however, do arbitrary depth. If you really do require arbitrary depth, you need to do that in the database - so you can make the correct decision to stop.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top