Question

I had interest in Direct Acyclic Graphs (DAG) for a long time and after reading about Topological sort at Wikipedia, I did not find any special mentioning of an approach involving layers numbering (although layers are extensively mentioned for drawing). With this approach the graph is not technically topologically sorted, but knowing that every node contains the correct number for layer (level), we always can tell whether a particular node "bigger" than another topologically. On the other side as long as we don't have an ordered list, we can not enumerate the nodes topologically (although this can be done with a final conventional sort that compares the levels of the nodes).

This approach allows implementing arbitrary connecting while maintaining the correctness of levels information. The steps can be:

  • For any newly added node (without any connection) the level applied is 1.
  • If a connection between two nodes is requested (from m to n) and the n.level > m.level then they are just simply being connected, no level fixing for other nodes is required.
  • If for requested connection (from m to n) n.level<=m.level then we change n.level to (m.level + 1) and recursively check any dependencies of n for similar level increase (or no increase if the levels on a recursive step are compatible).
  • If we keep the list of recursively entered nodes then we can detect an attempt to cycle and implement some kind of undo operation (returning the levels of all affected nodes to previous values)

For any set of known nodes and connections between them, we just add all nodes applying level=1 to them and just try to apply all known connections between (ignoring and undoing cicles).

The final level information not only allows comparing nodes topologically, but contains other useful information. For example:

  • every node with level = 1 has no incoming connections (every path starts from one of them). So any DAG enumeration can be started from them.
  • The longest path (the number of edges) can not be longer than the (largest level + 1)

I suppose that for some artificial data (n nodes, every Node(n) connected to Node(n + 1)) this algorithm can be very slow. But for real-world data I tried it with (Wikipedia categories - 800,000 nodes - 2,000,000 connections) the time is decent (5-10 minutes) and the number of levels and cycle attempts is low (369 levels, 1000 cycle attempts)

So is this method new or is well-known, but just not widely presented in Wikipedia and other resources? Since it's not a sort (technically), should it be called a data restructuring?

Was it helpful?

Solution

There are some papers on incrementally maintaining the topological order of nodes in a graph, with variations on the algorithm you describe.

If the graph has n nodes and m edges, you spend time O(m + n) every time you insert an edge. The papers ask how much time will it take to insert k edges? Trivially, O(k * (n + m)). But in fact you can show much better upper bounds - something like O(k * sqrt(m + n)) for large enough k.

Some links below, there are more:

http://igitur-archive.library.uu.nl/math/2007-0725-201647/2005-011.pdf

http://arxiv.org/abs/0802.1059

http://www.siam.org/proceedings/soda/2009/SODA09_120_benderm.pdf

OTHER TIPS

Probably someone has thought of this before, but since its worst-case is linear, I would be hard pressed to point you to a research article where it's described. The name for the problem this algorithm solves is "incremental topological sorting" (or dynamic, where edge deletions are also possible).

Consider a DAG consisting of two "parallel" directed paths of length n sharing the starting node and the last node. The layer numbering here is more restrictive than the topological order. In the topological order you can put next-to-last node from path A before second node from path B even though its layer number is larger.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top