Question

I have a set of data on which I need to perform a topological sort, with a few assumptions and constraints, and I was wondering if anyone knew an existing, efficient algorithm that would be appropriate for this.

  • The data relationships are known to form a DAG (so no cycles to worry about).
  • An edge from A to B indicates that A depends on B, so B must appear before A in the topological ordering.
  • The graph is not necessarily connected; that is, for any two nodes N and M there may be no way to get from N to M by following edges (even if you ignore edge direction).
  • The data relationships are singly linked. This means that when there is an edge directed from A to B, only the A node contains information about the existence of the edge.

The problem can be formulated as follows:

Given a set of nodes S in graph G which may or may not have incoming edges, find a topological ordering of the subgraph G' consisting of all of the nodes in G that are reachable from any node in set S (obeying edge direction).

This confounds the usual approaches to topological sorting because they require that the nodes in set S do not have any incoming edges, which is something that is not true in my case. The pathological case is:

A --> B --> D
|     ^     ^
|     |     |
\---> C ----/

Where S = {B, C}. An appropriate ordering would be D, B, C, but if a normal topological sort algorithm happened to consider B before C, it would end up with C, D, B, which is completely wrong. (Note that A does not appear in the resulting ordering since it is not reachable from S; it's there to give an example where all of the nodes in S might have incoming edges)

Now, I have to imagine that this is a long-solved problem, since this is essentially what programs like apt and yum have to do when you specify multiple packages in one install command. However, when I search for keyphrases like "dependency resolution algorithm", I get results describing normal topological sorting, which does not handle this particular case.

I can think of a couple of ways to do this, but none of them seem particularly elegant. I was wondering if anyone had some pointers to an appropriate algorithm, preferably one that can operate in a single pass over the data.

Was it helpful?

Solution

I don't think you'll find an algorithm that can do this with a single pass over the data. I would perform a breadth-first search, starting with the nodes in S, and then do a topological sort on the resulting subgraph.

OTHER TIPS

I think you can do a topological sorting of the entire graph and then select only the nodes which are reachable from the set of nodes (you can do some depth first searches from the nodes in the set, in the order resulted after the sorting, and you'll get in the subtree of a node if it wasn't visited before).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top