Question

I have a table representing the transitive closure of an organizational hierarchy (i.e., its a tree with a single root):

create table ancestry (
    ancestor   integer,
    descendant integer,
    distance   integer
);

I have another table that contains the organizations that each user is allowed to access:

create table accessible (
    user         integer,
    organization integer
);

The system shows the user a roll-up of expenditures associated with each organization the user can access. I could always start by showing the user a view of the company (i.e., the root) showing the user a list of immediate child organizations and how much his organizations contribute to the total. In most cases, there would be a single child and the user would be required to drill-down several levels before seeing multiple children. I would prefer to start the presentation with the first organization that shows multiple children (i.e., the LCA).

For a given user, I can find the set of paths to the root easy enough but am having trouble finding the least common ancestor. I am using postgresql 9.1 but would prefer a solution that is database agnostic. In the worst case, I can pull the paths to root back into the application's code and calculate the LCA there.

Was it helpful?

Solution

I took a fresh look at this and developed the following solution. I used a common-table-expression to make it easier to understand how it operates but it could easily be written using a sub-query.

with
hit (id, count) as (
    select
        ancestry.ancestor
       ,count(ancestry.descendant)
    from
        accessible
        inner join ancestry
            on accessible.organization = ancestry.descendant
    where
        accessible.user = @user_id
    group by
        ancestry.ancestor
)
select
    ancestry.descendant as lca
from
    hit
    inner join ancestry
        on ancestry.descendant = hit.id
       and ancestry.ancestor = @company_id
order by
    hit.count desc
   ,ancestry.distance desc
limit 1
;

The hit CTE counts, for each organization in the hierarchy, the number of paths from a child to the root that traverse the organization. The LCA is then the organization with the most traversals. In the event of a tie, the organization farthest from the root (i.e., max(distance)) is the actual LCA. This is best illustrated with an example.

        A
        |
        B
       / \
      C   D

Assuming we wish to find the LCA of nodes C and D from the tree above. The hit CTE produces the following counts:

Node    Count
  A       2
  B       2
  C       1
  D       1

The main query adds the distance:

Node    Count    Distance
  A       2         0
  B       2         1
  C       1         2
  D       1         2

The main query then orders the results by descending count and distance

Node    Count    Distance
  B       2         1
  A       2         0
  C       1         2
  D       1         2

The LCA is the first item in the list.

OTHER TIPS

Just a hunch and not db agnostic (SQL Server) but adaptable

SELECT TOP 1
       a1.ancestor
FROM   ancestor a1
       INNER JOIN
       ancestor a2 ON a1.ancestor=a2.ancestor
WHERE  a1.descendent = @Dec1
       AND
       a2.descendent = @Dec2
ORDER BY a1.distance DESC

If you want to put some data in SQLFiddle, I can have a play with it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top