Building an All Paths Algorithm for a DAG

Question 1

Here there is an implementation based on the BFS algorithm. I will denote a path as a sequence of vertices l = (v, v', v'', ...) and I will perform the following two operations on it:

extend(l, v): puts vertex v at the end of list l;
v = back(l): gets the last vertex in list l.

FindPaths(G, v) {
  // The first path is, simply, the starting node.
  // It should be the first vertex in topological order.
  pending_paths = {(v)};
  while (pending_paths is not empty) {
    l = pending_paths.remove_first(); // pop the first pending path
    output(l); // output it (or save it in a list to be returned, if you prefer)
    v = back(l); // Get the last vertex of the path
    foreach(edge (v, v') in G) { // For each edge outgoing from v'...
      // extend l with v' and put into the list of paths to be examined.
      pending_paths.push_back(extend(l, v'));
    } 
  }
}

Question 2

Here's a simple recursive algorithm, expressed in pseudocode to avoid clouding the issue with lots of Java list manipulation:

AllPaths(currentNode):
  result = EmptyList()
  foreach child in children(node):
    subpaths = AllPaths(child)
    foreach subpath in subpaths:
      Append(result, currentNode + subpath)
  return result

Calling AllPaths on the root node will give you what you need, and you can improve the running time for nontrivial DAGs by caching the result of AllPaths on each node, so you only need to compute it once rather than once per distinct path that includes it.

Question 3

So while @akappa's Pseudo was a good start, it took me awhile to understand how to make it work, if anyone else comes across this post here's how I did it:

public ArrayList<ArrayList<Node>> searchAll() {
    try {
        BufferedWriter out = new BufferedWriter(new FileWriter("output.txt"));
        //Gets Nodes from Hashmap and puts them into Queue
        for (Node key : paths.adjacencies.keySet()) {
            queue.addToQueue(new QueueItem(key.chemName, new ArrayList<Node>()));
        }
        while (queue.getSize() > 0) {
            QueueItem queueItem = queue.getFromQueue();
            Node node = paths.getNodeFromGraph(queueItem.getNodeId());
            if (node != null) {
                findRelationAll(node, queueItem, out);
            }
        }
        System.out.println("Cycle complete: Number of Edges: [" + resultHistoryAll.size() + "]");
        out.close();
    } catch (IOException e) {
    }
    return resultHistoryAll;

}

public void findRelationAll(Node node, QueueItem queueItem, BufferedWriter out) {
    if (!foundRelation) {
        StringTokenizer st = new StringTokenizer(paths.getAdjacentString(node), ",");
        while (st.hasMoreTokens()) {
            String child = st.nextToken();
            ArrayList<Node> history = new ArrayList<Node>();
            //Gets previous Nodes
            history.addAll(queueItem.getHistoryPath());
            //Makes sure path is unique
            if (history.contains(node)) {
                System.out.println("[" + counter2 + "]: Cyclic");
                counter2++;
                continue;
            }
            history.add(node);
            resultHistory = history;
            queue.addToQueue(new QueueItem(child, history));
            if (!(inList(resultHistory, resultHistoryAll))) {
                resultHistoryAll.add(resultHistory);
                try {
                    out.write("[" + counter + "]: " + resultHistory.toString());
                    out.newLine();
                    out.newLine();
                } catch (IOException e) {
                }
                System.out.println("[" + counter + "]: " + resultHistory.toString());
                counter++;
            } else {
                System.out.println("[" + counter3 + "]: InList");
                counter3++;
            }

        }
    }
}
//This checks path isn't in List already
public boolean inList(ArrayList<Node> resultHistory, ArrayList<ArrayList<Node>> allPaths) {
    for (int i = 0; i < allPaths.size(); i++) {
        if (allPaths.get(i).equals(resultHistory)) {
            return true;
        }
    }
    return false;
}

}

The above code does a few extra things that you might not want:

It writes pathways to a text file as a list of nodes + it's counter value.
Makes sure the path doesn't cross the same node twice
Makes sure no two pathways are the same in the final list (in normal circumstances this is unnecessary work)

The QueueItem object is just a way to store the previously visited nodes. It's part of nemanja's code, which is what my code was based off.

Hat tip to him, akappa (for the most efficient answer), and jacobm (for finding a solution like my original code, and explaining it's limitations).

Incase anyone's actually interested in the work; I'm currently processing over 5 million pathways, of which 60,000 are unique pathways between 900 chemicals. And that's just 1,2,3 or 4 chemical pathways... Biology is complicated.

EDIT and Warning: IF anyone is handling huge reams of data like me, windows 7 - or at least my machine - throws a shit fit and closes the program after an ArrayList > 63,000 objects, regardless of how you arrange the pointers. The solution I started with was after 60,000 objects, restarting the list while adding everything to CSV. This led to some duplicates between list iteration, and should ultimately be surpassed by my moving to linux tomorrow!