Question

I have a DAG. I have this operation to add a edge between two nodes.

If A is reachable from B, then B is A's parent. If A is reachable from B without going though another node, then B is A's direct parent.

Requirements for this graph are:

  1. No cycles.
  2. For any node, there is a list of direct parents P[1],P[2],P[3]... P[i] is not a parent of P[j] for any i and j.

If adding a edge, requirement 1 is not met, the edge is not constructed. If adding a edge, requirement 2 is not met, the edge is constructed, but the direct parents will be modified in a way such that requirement 2 is met.

For example, there are 3 nodes

  • A, direct parents: none
  • B, direct parents: A
  • C, direct parents: A

now, if I add a edge between B and C, we have

  • C, direct parents: A,B

but A is a parent of B, doesn't meet requirement 2, thus A is removed from C's direct parent, and we have

  • C, direct parents: B

Currently here is what I did: Add edge from A to B (this A become B's parent)

  1. Check if B is A's parent by BFS. If so, do not add the edge.(this make sure there is no cycles)
  2. Check if A is already B's parent by BFS. If so, do not add the edge.
  3. Find the intersection of A's parent with B's direct parent. This is done by find every parent of A though BFS. Remove the intersection from B's direct parent, and add A as a direct parent of B.(2 and 3 make sure it meet requirement 2)

This is slow. It breaks down at 5k node level(I'm looking for this to handle any graph with less than 100k nodes), the speed become not acceptable, 0.02 second for adding a node edge.

I have a feeling step 1 and 2 can be done in 1 step with some other algorithm.

I thought of using topological ordering, but it have to transverse the entire graph, which is the worst case for my step 1&2. The ordering will be disrupted when the new node is added. so I have to run a topological ordering every time for a insert, so it doesn't create any benefit. For step 3, the entire set of A's parents have to be found. The process is fairly slow, as on average it transverse a decent part of the graph.

How can I make this more efficient?

Was it helpful?

Solution

Your question comes down to "Can inserting edges in a DAG be done faster than O(v+e)?" as per requirement (1). Requirement (2) is a more local constraint that doesn't require checking the entire graph.

I think the answer is no: you can't do better than O(v+e) in the worst case (where v is the number of nodes/vertices and e is the number of edges).

No doubt there are tricks to improve expected performance, depending on the properties of your DAG and how it changes over time. This appears to be an active research topic. For example, I imagine for some graphs it might be beneficial to cluster nodes. Inserting edges within a cluster then only requires checks within the cluster sub-DAG. But then you'd need a proper clustering strategy with support for cheaply updating the clustering when adding nodes, etc.

OTHER TIPS

Don't know about your implmentation, but recommend you add indexing. Each row index must store pairs metaparent-metachild

metaparent - is parent in link: parent, parent-of-parent,...

metachid - is child in link: child, child-of-child, ...

so for graph A->B->C exists following indexes: A-B, B-C, A-C Adding explicit edge between B->C causes an assertion since such entry already exist. So complexity of algorithm reduces from n^2 to ln(n)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top