Sampling random nodes from a DAG

https://stackoverflow.com/questions/3522368

29-09-2019
|

Question

I have a large directed, acylic graph (DAG) from which I would like to efficiently draw a sample node according to the following criteria:

I specify a fixed node A that must never be sampled
Nodes that directly or indirectly refer to A are never sampled
All other nodes are sampled with equal probability

Nodes are stored as objects with pointers to the other nodes that they refer to, the entire graph can be reached from a single root node that refers to everything else directly or indirectly.

Is there a good algorithm to do this? Ideally without requiring large amounts of additional memory since the DAG is large!

Solution

The only solution I can come up with is to

put the nodes in a hash set
(traverse them from the root using, say, a breadth first traversal), O(|E|+|V|)
start from node A and remove all predecessors by traversing the edges backwards
(again O(|E|+|V|))
select a random node from the remaining nodes.

This would result in a O(|E|+|V|) algorithm with a O(|V|) memory requirement.

Note that you wouldn't have to copy the nodes in step 1, only save a reference to the node.

OTHER TIPS

I'm not an expert in this area by any means, but I think you may want a Monte Carlo Markov chain sampling method such as the Metropolis-Hastings or Gibbs sampling algorithm.

You can find some code samples online, and you might have to modify the code to do exactly what you want it to do. There's some good presentations on the topic, like this.

Some software that might help you along are:

I don't know your familiarity with graph theory, so I'm not sure how difficult it would be for you to implement this.

Hope this was helpful...

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow