Hierarchical data in Linq - options and performance
-
03-07-2019 - |
Question
I have some hierarchical data - each entry has an id and a (nullable) parent entry id. I want to retrieve all entries in the tree under a given entry. This is in a SQL Server 2005 database. I am querying it with LINQ to SQL in C# 3.5.
LINQ to SQL does not support Common Table Expressions directly. My choices are to assemble the data in code with several LINQ queries, or to make a view on the database that surfaces a CTE.
Which option (or another option) do you think will perform better when data volumes get large? Is SQL Server 2008's HierarchyId type supported in Linq to SQL?
Solution
I would set up a view and an associated table-based function based on the CTE. My reasoning for this is that, while you could implement the logic on the application side, this would involve sending the intermediate data over the wire for computation in the application. Using the DBML designer, the view translates into a Table entity. You can then associate the function with the Table entity and invoke the method created on the DataContext to derive objects of the type defined by the view. Using the table-based function allows the query engine to take your parameters into account while constructing the result set rather than applying a condition on the result set defined by the view after the fact.
CREATE TABLE [dbo].[hierarchical_table](
[id] [int] IDENTITY(1,1) NOT NULL,
[parent_id] [int] NULL,
[data] [varchar](255) NOT NULL,
CONSTRAINT [PK_hierarchical_table] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE VIEW [dbo].[vw_recursive_view]
AS
WITH hierarchy_cte(id, parent_id, data, lvl) AS
(SELECT id, parent_id, data, 0 AS lvl
FROM dbo.hierarchical_table
WHERE (parent_id IS NULL)
UNION ALL
SELECT t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
FROM dbo.hierarchical_table AS t1 INNER JOIN
hierarchy_cte AS h ON t1.parent_id = h.id)
SELECT id, parent_id, data, lvl
FROM hierarchy_cte AS result
CREATE FUNCTION [dbo].[fn_tree_for_parent]
(
@parent int
)
RETURNS
@result TABLE
(
id int not null,
parent_id int,
data varchar(255) not null,
lvl int not null
)
AS
BEGIN
WITH hierarchy_cte(id, parent_id, data, lvl) AS
(SELECT id, parent_id, data, 0 AS lvl
FROM dbo.hierarchical_table
WHERE (id = @parent OR (parent_id IS NULL AND @parent IS NULL))
UNION ALL
SELECT t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
FROM dbo.hierarchical_table AS t1 INNER JOIN
hierarchy_cte AS h ON t1.parent_id = h.id)
INSERT INTO @result
SELECT id, parent_id, data, lvl
FROM hierarchy_cte AS result
RETURN
END
ALTER TABLE [dbo].[hierarchical_table] WITH CHECK ADD CONSTRAINT [FK_hierarchical_table_hierarchical_table] FOREIGN KEY([parent_id])
REFERENCES [dbo].[hierarchical_table] ([id])
ALTER TABLE [dbo].[hierarchical_table] CHECK CONSTRAINT [FK_hierarchical_table_hierarchical_table]
To use it you would do something like -- assuming some reasonable naming scheme:
using (DataContext dc = new HierarchicalDataContext())
{
HierarchicalTableEntity h = (from e in dc.HierarchicalTableEntities
select e).First();
var query = dc.FnTreeForParent( h.ID );
foreach (HierarchicalTableViewEntity entity in query) {
...process the tree node...
}
}
OTHER TIPS
This option might also prove useful:
LINQ AsHierarchy() extension method
http://www.scip.be/index.php?Page=ArticlesNET18
I am surprised nobody has mentioned an alternative database design - when hierarchy needs to be flattened from multiple levels and retrieved with high performance (not so considering storage space) it is better to use another entity-2-entity table to track hierarchy instead of parent_id approach.
It will allow not only single parent relations but also multi parent relations, level indications and different types of relationships:
CREATE TABLE Person (
Id INTEGER,
Name TEXT
);
CREATE TABLE PersonInPerson (
PersonId INTEGER NOT NULL,
InPersonId INTEGER NOT NULL,
Level INTEGER,
RelationKind VARCHAR(1)
);
I have done this two ways:
- Drive the retrieval of each layer of the tree based on user input. Imagine a tree view control populated with the root node, the children of the root, and the grandchildren of the root. Only the root and the children are expanded (grandchildren are hidden with the collapse). As the user expands a child node the grandchildren of the root are display (that were previously retrieved and hidden), and a retrieval of all of the great-grandchildren is launched. Repeat the pattern for N-layers deep. This pattern works very well for large trees (depth or width) because it only retrieves the portion of the tree needed.
- Use a stored procedure with LINQ. Use something like a common table expression on the server to build your results in a flat table, or build an XML tree in T-SQL. Scott Guthrie has a great article about using stored procs in LINQ. Build your tree from the results when they come back if in a flat format, or use the XML tree if that is that is what you return.
This extension method could potentially be modified to use IQueryable. I've used it succesfully in the past on a collection of objects. It may work for your scenario.
public static IEnumerable<T> ByHierarchy<T>(
this IEnumerable<T> source, Func<T, bool> startWith, Func<T, T, bool> connectBy)
{
if (source == null)
throw new ArgumentNullException("source");
if (startWith == null)
throw new ArgumentNullException("startWith");
if (connectBy == null)
throw new ArgumentNullException("connectBy");
foreach (T root in source.Where(startWith))
{
yield return root;
foreach (T child in source.ByHierarchy(c => connectBy(root, c), connectBy))
{
yield return child;
}
}
}
Here is how I called it:
comments.ByHierarchy(comment => comment.ParentNum == parentNum,
(parent, child) => child.ParentNum == parent.CommentNum && includeChildren)
This code is an improved, bug-fixed version of the code found here.
In MS SQL 2008 you could use HierarchyID directly, in sql2005 you may have to implement them manually. ParentID is not that performant on large data sets. Also check this article for more discussion on the topic.
I got this approach from Rob Conery's blog (check around Pt. 6 for this code, also on codeplex) and I love using it. This could be refashioned to support multiple "sub" levels.
var categories = from c in db.Categories
select new Category
{
CategoryID = c.CategoryID,
ParentCategoryID = c.ParentCategoryID,
SubCategories = new List<Category>(
from sc in db.Categories
where sc.ParentCategoryID == c.CategoryID
select new Category {
CategoryID = sc.CategoryID,
ParentProductID = sc.ParentProductID
}
)
};
The trouble with fetching the data from the client side is that you can never be sure how deep you need to go. This method will do one roundtrip per depth and it could be union'd to do from 0 to a specified depth in one roundtrip.
public IQueryable<Node> GetChildrenAtDepth(int NodeID, int depth)
{
IQueryable<Node> query = db.Nodes.Where(n => n.NodeID == NodeID);
for(int i = 0; i < depth; i++)
query = query.SelectMany(n => n.Children);
//use this if the Children association has not been defined
//query = query.SelectMany(n => db.Nodes.Where(c => c.ParentID == n.NodeID));
return query;
}
It can't, however, do arbitrary depth. If you really do require arbitrary depth, you need to do that in the database - so you can make the correct decision to stop.
Please read the following link.
http://support.microsoft.com/default.aspx?scid=kb;en-us;q248915