Question

I have a genealogical database (about sheep actually), that is used by breeders to research genetic information. In each record I store fatherid and motherid. In a seperate table I store complete 'roll up' information so that I can quickly tell the complete family tree of any animal without recursing thru the entire database...

Recently discovered the hierarchicalID type built into SQL server 2008, on the surface it sounds promising, but I and am wondering if anyone has used it enough to know whether or not it would be appropriate in my type of app(i.e. two parents, multiple kids)? All the samples I have found/read so far deal with manager/employee type relationships where a given boss can have multiple employees, and each employee can have a single boss.

The needs of my app are similar, but not quite the same.

I am sure I will dig into this new technology anyway, but it would be nice to shortcut my research if someone already knew that it was not designed in such a fashion that it would allow me to make use of it.

I am also curious what kind of performance people are seeing using this new data type versus other methods that do the same thing.

Was it helpful?

Solution

I can't see how it would work; in a regular hierarchy, there is a single chain to the root, so it can store the path (which is what the binary is) to each node. However, with multiple parents, this isn't possible: even if you split matriarchy and partiarchy, you still have 1 mother, 2 grandmothers, 4 great-grand-mothers, etc (not even getting into some of the more "interesting" scanerios possible, especially with livestock). There is no single logical path to encode, so no: I don't think that this can work in your case.

I'm happy to be corrected, though.

OTHER TIPS

Assuming each sheep has one male parent and one female parent, and that no sheep can be its own parent (leading to an Ovine Temporal Paradox), then what about using two HierarchyIDs?

CREATE TABLE dbo.Sheep(
    MotherHID hierarchyid NOT NULL,
    FatherHID hierarchyid NOT NULL,
    Name int NOT NULL
)
GO
ALTER TABLE dbo.Sheep 
ADD CONSTRAINT PK_Sheep PRIMARY KEY CLUSTERED (
    MotherHID,
    FatherHID
)
GO

By making them a joint PK, you'd be uniquely identifying each sheep as the product of its maternal hierarchy and it's paternal hierarchy.

There may be some inherent problem lurking here, so proceed cautiously with a couple simple prototypes - but initially it seems like it would work for you.

Using two separate HierarchyID to indicate father and mother would work well.

However, you definitely would NOT want to use those as a unique indicator of the row, since it's a 2-to-many situation. (Two sheep can have multiple children.)

I don't see anything inherently wrong with using HierarchyId for ancestry--for Sheep at least. For people, the relationships are much more complicated than "this person begat that person", so obviously the use would be limited to breeding.

SQL Server hierarchyID is not a robust solution for many genealogy analytic questions. It is based on ORDPATH and I've used it for awhile in genealogy; but there are too many scenarios in genealogy that cannot be readily addressed with ORDPATH methods for directed acyclic graphs. A graph database is much more robust and well suited for genealogy. I use Neo4j: http://stumpf.org/genealogy-blog/graph-databases-in-genealogy.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top