Question

We have a database that models a tree. This data can grow fairly huge, that is to say many, may million of rows. (The primary key is actually a bigint, so I guess potentially we would like to support billions of rows, although this is probably never going to occur).

A single node can have a very large amount of direct children, more likely the higher up in the hierarchy they are. We have no specified limit to the actual maximum depth of a leaf, i.e. how many nodes one would have to traverse to get to the root, but in practice this would probably normally not grow beyond a few hundred at the very most. Normally it would probably be below 20.

Insertions in this table is very frequent and needs to be high performing. Insertions nodes inserted are always leaf nodes, and always after the last sibling. Nodes are never moved. Deletions are always made as entire subtrees. Finding subtrees is the other operation made on this table. It does not have the same performance requirements, but of course we would like it as fast as possible.

Today this is modeled with the parent/child model, which is efficient for insertions, but is painfully slow for finding subtrees. When the table grows large, this becomes extremely slow and finding a subtree may take several minutes.

So I was thinking about converting this to perhaps using the new hierarchyid type in SQL Server. But I am having troubles figuring out whether this would be suitable. As I undestand it, for the operations we perform in this scenario, such a tree would be a good idea. (Please correct me if I'm wrong here).

But it also states that the maximum size for a hierarchyid is 892 bytes. However, I can not find any information about what this means in practice. How is the hierarchyid encoded? Will I run out of hierarchyids, and if so, when?

Was it helpful?

Solution

So I did some tests and came to somewhat of a conclusion regarding the limitations of hierarchyid:

If I run for example the following code:

DECLARE @i BIGINT = 1
DECLARE @h  hierarchyId = '/'
WHILE 1=1
BEGIN
    SET @h = @h.ToString() + '1/'
    PRINT CONVERT(nvarchar(max), @i) 
    SET @i = @i+1
END

I will get to 1427 levels deep before I get an error. Since I am using the value 1 for each level, this ought to be the most compact tree from which I draw the conclusion that I will not ever be able to create a tree with more than 1427 levels.

However, if I use for example 99999999999999 for each level (eg. /99999999999999/99999999999999/99999999999999/..., the error occurs already at 118 levels deep. It also seems that 14 digits are the maximum for an id at each level, since it fails immediately if I use a 15 digit number.

So with this in mind, if I only use whole integer identifiers (i.e. don't insert nodes between other nodes etc.) I should be able to guarantee up to at least 100 levels deep in my scenario, and at no time will I be able to exceed much more than 1400 levels.

OTHER TIPS

892 bytes does not sound like much, but the hierarchy id seems to be very efficient, space-wise. From http://technet.microsoft.com/en-us/library/bb677290.aspx:

The average number of bits that are required to represent a node in a tree with n nodes depends on the average fanout (the average number of children of a node). For small fanouts (0-7), the size is about 6*logAn bits, where A is the average fanout. A node in an organizational hierarchy of 100,000 people with an average fanout of 6 levels takes about 38 bits. This is rounded up to 40 bits, or 5 bytes, for storage.

The calculation given says it's only for small fanouts (0-7) which makes it hard to reason about for bigger fanouts. You say 'up to a few hundred children at the most'. This (extreme) case does sound dangerous. I don't know about the spec of hierarchy_id, but the more nodes are at any one level, the less depth you should be able to have in the tree within those 892 bytes.

I do see a risk here, as do you (hence the question). Do some tests. Evaluate the goals. What are you moving from? Why are you moving? Simplicity or performance?

This problem is a bad fit for Sql. Maybe you should consider other options for this part of the program?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top