Question

I have an adjacency list I have created for a given graph with nodes and weighted edges. I am trying to figure out what the best way would be to find the longest path within the graph. I have a topological sort method, which I've heard can be useful, but I am unsure how to implement it to find the longest path. So is there a way to accomplish this using topology sort or is there a more efficient method?

Here is an example of my out for the adj list (the value in parenthesis are the cost to get to the node after the arrow (cost)to get to -> node:

Node 0 (4)->1(9)->2
Node 1 (10)->3
Node 2 (8)->3
Node 3
Node 4 (3)->8(3)->7
Node 5 (2)->8(4)->7(2)->0
Node 6 (2)->7(1)->0
Node 7 (5)->9(6)->1(4)->2
Node 8 (6)->9(5)->1
Node 9 (7)->3
Node 10 (12)->4(11)->5(1)->6
Was it helpful?

Solution

Bryan already answered your question above, but I thought I could go in more depth.

First, as he pointed out, this problem is only easily solvable if there are no cycles. If there are cycles you run into the situation where you have infinitely long paths. In that case, you might define a longest path to be any path with no repeated nodes. Unfortunately, this problem can be shown to be NP-Hard. So instead, we'll focus on the problem which it seems like you actually need to solve (since you mentioned the topological sort)--longest path in a Directed Acyclic Graph (DAG). We'll also assume that we have two nodes s and t that are our start and end nodes. The problem is a bit uglier otherwise unless you can make certain assumptions about your graph. If you understand the text below, and such assumptions in your graphs are correct, then perhaps you can remove the s and t restrictions (otherwise, you'll have to run it on every pair of vertices in your graph! Slow...)

The first step in the algorithm is to topologically order the vertices. Intuitively this makes sense. Say you order them from left to right (i.e. the leftmost node will have no incoming edges). The longest path from s to t will generally start from the left and end on the right. It's also impossible for the path to ever go in the left direction. This gives you a sequential ordering to generate the longest path--start at the left and move right.

The next step is to sequentially go left to right and define the longest path for each node. For any node that has no incoming edges, the longest path to that node is 0 (this is true by definition). For any node with incoming edges, recursively define the longest path to that node to be the maximum over all incoming edges + the longest path to get to the "incoming" neighbor (note that this number might be negative, if, for example, all of the incoming edges are negative!). Intuitively this makes sense, but the proof is also trivial:

Suppose our algorithm claims that the longest path to some node v is d but the actual longest path is some d' > d. Pick the "least" such node v (we use the ordering as defined by the topological sort. In other words, we pick the "left-most" node that our algorithm failed at. This is important so that we can assume that our algorithm has correctly determined the longest path for any nodes to the "left" of v). Define the length of the hypothetical longest path to be d' = d_1 + e where d_1 is the length of the hypothetical path up to a node v_prev with edge e to v (note the sloppy naming. The edge e also has weight e). We can define it as such because any path to v must go through one of its neighbors which have an edge going to v (since you can't get to v without getting there via some edge that goes to it). Then d_1 must be the longest path to v_prev (else, contradiction. There is a longer path which contradicts our choice of v as the "least" such node!) and our algorithm would choose the path containing d_1 + e as desired.

To generate the actual path you can figure out which edge was used. Say you've reconstructed the path up to some vertex v which has longest path length d. Then go over all incoming vertices and find the one with longest path length d' = d - e where e is the weight of the edge going into v. You could also just keep track of the parents' of nodes as you go through the algorithm. That is, when you find the longest path to v, set its parent to whichever adjacent node was chosen. You can use simple contradiction to show why either method generates the longest path.

Finally some pseudocode (sorry, it's basically in C#. This is a lot messier to code in C without custom classes and I haven't coded C in a while).

public List<Nodes> FindLongestPath(Graph graph, Node start, Node end)
{
    var longestPathLengths = Dictionary<Node, int>;

    var orderedNodes = graph.Nodes.TopologicallySort();
    // Remove any nodes that are topologically less than start. 
    // They cannot be in a path from start to end by definition
    while (orderedNodes.Pop() != start);
    // Push it back onto the top of the stack
    orderedNodes.Push(start);

    // Do algorithm until we process the end node
    while (1)
    {
        var node = orderedNodes.Pop();
        if (node.IncomingEdges.Count() == 0)
        {
            longestPathLengths.Add(node, 0);
        }
        else
        {
            var longestPathLength = Int.Min;
            foreach (var incomingEdge in node.IncomingEdges)
            {
                var currPathLength = longestPaths[incomingEdge.Parent] +               
                                     incomingEdge.Weight);
                if (currPathlength > longestPathLength)
                {
                    longestPath = currPathLength;
                }
            }

            longestPathLengths.Add(node, longestPath);
        }

        if (node == end)
        {
            break;
        }
    }

    // Reconstruct path. Go backwards until we hit start
    var node = end;
    var longestPath = new List<Node>();
    while (node != start)
    {
        foreach (var incomingEdge in node.IncomingEdges)
        {
            if (longestPathLengths[incomingEdge.Parent] == 
                    longestPathLengths[node] - incomingEdge.Weight)
            {
                longestPath.Prepend(incomingEdge.Parent);
                node = incomingEdge.Parent;
                break;
            }
        }
    }

    return longestPath;
}

Note that this implementation is not particularly efficient, but hopefully it's clear! You can optimize in a lot of small ways that should be obvious as you think through the code/implementation. Generally, if you store more stuff in memory, it'll run faster. The way you structure your Graph is also critical. For instance, it didn't seem like you had an IncomingEdges property for your nodes. But without that, finding the incoming edges for each node is a pain (and is not performant!). In my opinion, graph algorithms are conceptually different from, say, algorithms on strings and arrays because the implementation matters so much! If you read the wiki entries on graph algorithms you'll find they often give three or four different runtimes based on different implementations (with different data structures). Keep this in mind if you care about speed

OTHER TIPS

Assuming your graph has no cycles, otherwise longest path becomes a vague concept, you can have a topological sort indeed. Now you can walk this topological sort and for each node compute its longest distance from a source node by looking at all its predecessors and add the weight of the edge connecting to them to their distance. Then choose the predecessor that gives you the longest distance for this node. The topological sort guarantees that all your predecessors have their distance already correctly determined.

If in addition to the length of the longest path, you also want the path itself. Then you start at the node that gave the longest length and look at all its predecessors to find the one that resulted in this length. Then repeat this process until you have found a source node of the graph.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top