Finding the shortest path in a graph without any negative prefixes

https://stackoverflow.com/questions/9509595

14-11-2019
|

문제

Find the shortest path from source to destination in a directed graph with positive and negative edges, such that at no point in the path the sum of edges coming before it is negative. If no such path exists report that too.

I tried to use modified Bellman Ford, but could not find the correct solution.

I would like to clarify a few points :

yes there can be negative weight cycles.
n is the number of edges.
Assume that a O(n) length path exists if the problem has a solution.
+1/-1 edge weights.

해결책

Admittedly this isn't a constructive answer, however it's too long to post in a comment...

It seems to me that this problem contains the binary as well as discrete knapsack problems, so its worst-case-running-time is at best pseudo-polynomial. Consider a graph that is connected and weighted as follows:

Graph with initial edge with weight x and then a choice of -a(i) or 0 at each step

Then the equivalent binary knapsack problem is trying to choose weights from the set {a₀, ..., a_n} that maximizes Σ a_i where Σ a_i < X.

As a side note, if we introduce weighted loops it's easy to construct the unbounded knapsack problem instead.

Therefore, any practical algorithm you might choose has a running time that depends on what you consider the "average" case. Is there a restriction to the problem that I've either not considered or not had at my disposal? You seem rather sure it's an O(n³) problem. (Although what's n in this case?)

다른 팁

Peter de Rivaz pointed out in a comment that this problem includes HAMILTONIAN PATH as a special case. His explanation was a bit terse, and it took me a while to figure out the details, so I've drawn some diagrams for the benefit of others who might be struggling. I've made this post community wiki.

I'll use the following graph with six vertices as an example. One of its Hamiltonian paths is shown in bold.

Graph with six vertices and seven edges; one of its Hamiltonian paths shown in bold

Given an undirected graph with n vertices for which we want to find a Hamiltonian path, we construct a new weighted directed graph with n² vertices, plus START and END vertices. Label the original vertices v_i and the new vertices w_ik for 0 ≤ i, k < n. If there is an edge between v_i and v_j in the original graph, then for 0 ≤ k < n−1 there are edges in the new graph from w_ik to w_j(k+1) with weight −2^j and from w_jk to w_i(k+1) with weight −2ⁱ. There are edges from START to w_i0 with weight 2ⁿ − 2ⁱ − 1 and from w_i(n−1) to END with weight 0.

It's easiest to think of this construction as being equivalent to starting with a score of 2ⁿ − 1 and then subtracting 2ⁱ each time you visit w_ij. (That's how I've drawn the graph below.)

Each path from START to END must visit exactly n + 2 vertices (one from each row, plus START and END), so the only way for the sum along the path to be zero is for it to visit each column exactly once.

So here's the original graph with six vertices converted to a new graph with 38 vertices. The original Hamiltonian path corresponds to the path drawn in bold. You can verify that the sum along the path is zero.

Same graph converted to shortest-weighted path format as described.

UPDATE: The OP now has had several rounds of clarifications, and it is a different problem now. I'll leave this here for documenting my ideas for the first version of the problem (or rather my understanding of it). I'll try a new answer for the current version of the problem. End of UPDATE

It's a pity that the OP hasn't clarified some of the open questions. I'll assume the following:

The weights are +/- 1.
n is the number of vertices

The first assumption is no loss of generality, obviously, but it has great impact on the value of n (via the second assumption). Without the first assumption, even a tiny (fixed) graph can have arbitrary long solutions by varying the weights without limits.

The algorithm I propose is quite simple, and similar to well-known graph algorithms. I'm no graph expert though, so I may use the wrong words in some places. Feel free to correct me.

For the source vertex, remember cost 0. Add (source, 0) to the todo list.
Pop an item from the todo list. Follow all outgoing edges of the vertex, computing the new cost c to reach the new vertex v. If the new cost is valid (c >= 0 and c <= n ^ 2, see below) and not remembered for v, add it to the remembered cost values of v, and add (v, c) to your todo list.
If the todo list is not empty, continue with step 2. (Or break early if the destination can be reached with cost 0).

It's clear that each "step" that's not an immediate dead end creates a new (vertex, cost) combination. There will be stored at most n * n ^2 = n ^ 3 of these combinations, and thus, in a certain sense, this algorithm is O(n^3).

Now, why does this find the optimal path? I don't have a real proof, but I think it the following ideas justify why I think this suffices, and it may be possible that they can be turned into a real proof.

I think it is clear that the only thing we have to show is that the condition c <= n ^ 2 is sufficient.

First, let's note that any (reachable) vertex can be reached with cost less than n.

Let (v, c) be part of an optimal path and c > n ^ 2. As c > n, there must be some cycle on the path before reaching (v, c), where the cost of the cycle is 0 < m1 < n, and there must be some cycle on the path after reaching (v, c), where the cost of the cycle is 0 > m2 > -n.

Furthermore, let v be reachable from the source with cost 0 <= c1 < n, by a path that touches the first cycle mentioned above, and let the destination be reachable from v with cost 0 <= c2 < n, by a path that touches the other cycle mentioned above.

Then we can construct paths from source to v with costs c1, c1 + m1, c1 + 2 * m1, ..., and paths from v to destination with costs c2, c2 + m2, c2 + 2 * m2, ... . Choose 0 <= a <= m2 and 0 <= b <= m1 such that c1 + c2 + a * m1 + b * m2 is minimal and thus the cost of an optimal path. On this optimal path, v would have the cost c1 + a * m1 < n ^ 2.

If the gcd of m1 and m2 is 1, then the cost will be 0. If the gcd is > 1, then it might be possible to choose other cycles such that the gcd becomes 1. If that is not possible, it's also not possible for the optimal solution, and there will be a positive cost for the optimal solution.

(Yes, I can see several problems with this attempt of a proof. It might be necessary to take the gcd of several positive or negative cycle costs etc. I would be very interested in a counterexample, though.)

Here's some (Python) code:

def f(vertices, edges, source, dest):
    # vertices: unique hashable objects
    # edges: mapping (u, v) -> cost; u, v in vertices, cost in {-1, 1}

    #vertex_costs stores the possible costs for each vertex
    vertex_costs = dict((v, set()) for v in vertices)
    vertex_costs[source].add(0) # source can be reached with cost 0

    #vertex_costs_from stores for each the possible costs for each vertex the previous vertex
    vertex_costs_from = dict()

    # vertex_gotos is a convenience structure mapping a vertex to all ends of outgoing edges and their cost
    vertex_gotos = dict((v, []) for v in vertices)
    for (u, v), c in edges.items():
        vertex_gotos[u].append((v, c))

    max_c = len(vertices) ** 2 # the crucial number: maximal cost that's possible for an optimal path

    todo = [(source, 0)] # which vertices to look at

    while todo:
        u, c0 = todo.pop(0)
        for v, c1 in vertex_gotos[u]:
            c = c0 + c1
            if 0 <= c <= max_c and c not in vertex_costs[v]:
                vertex_costs[v].add(c)
                vertex_costs_from[v, c] = u
                todo.append((v, c))

    if not vertex_costs[dest]: # destination not reachable
        return None # or raise some Exception

    cost = min(vertex_costs[dest])

    path = [(dest, cost)] # build in reverse order
    v, c = dest, cost
    while (v, c) != (source, 0):
        u = vertex_costs_from[v, c]
        c -= edges[u, v]
        v = u
        path.append((v, c))

    return path[::-1] # return the reversed path

And the output for some graphs (edges and their weight / path / cost at each point of the path; sorry, no nice images):

AB+ BC+ CD+ DA+             AX+ XY+ YH+ HI- IJ- JK- KL- LM- MH-
 A  B  C  D  A  X  Y  H  I  J  K  L  M  H
 0  1  2  3  4  5  6  7  6  5  4  3  2  1
AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MH-
 A  B  C  D  E  F  G  A  B  C  D  E  F  G  A  B  C  D  E  F  G  A  X  Y  H  I  J  K  L  M  H  I  J  K  L  M  H  I  J  K  L  M  H  I  J  K  L  M  H
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MN- NH-
 A  X  Y  H
 0  1  2  3
AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MN- NO- OP- PH-
 A  B  C  D  E  F  G  A  B  C  D  E  F  G  A  B  C  D  E  F  G  A  B  C  D  E  F  G  A  B  C  D  E  F  G  A  B  C  D  E  F  G  A  X  Y  H  I  J  K  L  M  N  O  P  H  I  J  K  L  M  N  O  P  H  I  J  K  L  M  N  O  P  H  I  J  K  L  M  N  O  P  H  I  J  K  L  M  N  O  P  H
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0

Here's the code to produce that output:

def find_path(edges, source, path):
    from itertools import chain

    print(edges)
    edges = dict(((u, v), 1 if c == "+" else -1) for u, v, c in edges.split())
    vertices = set(chain(*edges))

    path = f(vertices, edges, source, dest)
    path_v, path_c = zip(*path)
    print(" ".join("%2s" % v for v in path_v))
    print(" ".join("%2d" % c for c in path_c))

source, dest = "AH"

edges = "AB+ BC+ CD+ DA+             AX+ XY+ YH+ HI- IJ- JK- KL- LM- MH-"
# uv+ means edge from u to v exists and has cost 1, uv- = cost -1
find_path(edges, source, path)

edges = "AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MH-"
find_path(edges, source, path)

edges = "AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MN- NH-"
find_path(edges, source, path)

edges = "AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MN- NO- OP- PH-"
find_path(edges, source, path)

As Kaganar notes, we basically have to make some assumption in order to get a polytime algorithm. Let's assume that the edge lengths are in {-1, 1}. Given the graph, construct a weighted context-free grammar that recognizes valid paths from source to destination with weight equal to the number of excess 1 edges (it generalizes the grammar for balanced parentheses). Compute, for each nonterminal, the cost of the cheapest production by initializing everything to infinity or 1, depending on whether there is a production whose RHS has no nonterminal, and then relaxing n - 1 times, where n is the number of nonterminals.

I would use recursion brute forcing here: something like (pseudo code to make sure it's not language specific)

you will need:

2D array of bools showing where you CAN and where you CAN'T go, this should NOT include "forbidden values", like before negative edge, you can choose to add a vertical and horizontal 'translation' to make sure it starts at [0][0]
an integer (static) containing the shortest path
a 1D array of 2 slots, showing the goal. [0] = x, [1] = y

you will do:

function(int xPosition, int yPosition, int steps)
{
if(You are at target AND steps < Shortest Path)
    Shortest Path = steps
if(This Position is NOT legal)
    /*exit function*/
else
    /*try to move in every legal DIRECTION, not caring whether the result is legal or not
    but always adding 1 to steps, like using:*/
    function(xPosition+1, yPosition, steps+1);
    function(xPosition-1, yPosition, steps+1);
    function(xPosition, yPosition+1, steps+1);
    function(xPosition, yPosition-1, steps+1);
}

then just run it with function(StartingX, StartingY, 0);

the shortest path will be contained in the static external int

I would like to clarify a few points :

yes there can be negative weight cycles.
n is the number of edges.
weights are arbitrary not just +1/-1.
Assume that a O(n) length path exists if the problem has a solution. (n is number of edges)

Although people have shown that no fast solution exists (unless P=NP)..

I think for most graphs (95%+) you should be able to find a solution fairly quickly.

I take advantage of the fact that if there are cycles then there are usually many solutions and we only need to find one of them. There are probably some glaring holes in my ideas so please let me know.

Ideas:

1. find the negative cycle that is closest to the destination. denote the shortest distance between the cycle and destination as d(end,negC)

(I think this is possible, one way might be to use floyds to detect (i,j) with a negative cycle, and then breadth first search from the destination until you hit something that is connected to a negative cycle.)

2. find the closest positive cycle to the start node, denote the distance from the start as d(start,posC)

(I argue in 95% of graphs you can find these cycles easily)

    Now we have cases:
a) there is both the positive and negative cycles were found:
The answer is d(end,negC).

b) no cycles were found:
simply use shortest path algorithm?

c) Only one of the cycles was found. We note in both these cases the problem is the same due to symmetry (e.g. if we swap the weights and start/end we get the same problem). I'll just consider the case that there was a positive cycle found.

find the shortest path from start to end without going around the positive cycle. (perhaps using modified breadth first search or something). If no such path exists (without going positive).. then .. it gets a bit tricky.. we have to do laps of the positive cycle (and perhaps some percentage of a lap).
If you just want an approximate answer, work out shortest path from positive cycle to end node which should usually be some negative number. Calculate number of laps required to overcome this negative answer + the distance from the entry point to the cycle to the exit point of the cycle. Now to do better perhaps there was another node in the cycle you should have exited the cycle from... To do this you would need to calculate the smallest negative distance of every node in the cycle to the end node.. and then it sort of turns into a group theory/ random number generator type problem... do as many laps of the cycle as you want till you get just above one of these numbers.

good luck and hopefully my solutions would work for most cases.

The current assumptions are:

yes there can be negative weight cycles.
n is the number of edges.
Assume that a O(n) length path exists if the problem has a solution.
+1/-1 edge weights.

We may assume without loss of generality that the number of vertices is at most n. Recursively walk the graph and remember the cost values for each vertex. Stop if the cost was already remembered for the vertex, or if the cost would be negative.

After O(n) steps, either the destination has not been reached and there is no solution. Otherwise, for each of the O(n) vertices we have remembered at most O(n) different cost values, and for each of these O(n ^ 2) combinations there might have been up to n unsuccessful attempts to walk to other vertices. All in all, it's O(n ^ 3). q.e.d.

Update: Of course, there is something fishy again. What does assumption 3 mean: an O(n) length path exists if the problem has a solution? Any solution has to detect that, because it also has to report if there is no solution. But it's impossible to detect that, because that's not a property of the individual graph the algorithm works on (it is asymptotic behaviour).

(It is also clear that not all graphs for which the destination can be reached have a solution path of length O(n): Take a chain of m edges of weight -1, and before that a simple cycle of m edges and total weight +1).

[I now realize that most of the Python code from my other answer (attempt for the first version of the problem) can be reused.]

Step 1: Note that your answer will be at most 2*n (if it exists).
Step 2: Create a new graph with vertexes that are a pairs of [vertex][cost]. (2*n^2 vertexes)
Step 3: Note that new graph will have all edges equal to one, and at most 2*n for each [vertex][cost] pair.
Step 4: Do a dfs over this graph, starting from [start][0]
Step 5: Find minimum k, such that [finish][k] is accesible.

Total complexity is at most O(n^2)*O(n) = O(n^3)

EDIT: Clarification on Step 1.
If there is a positive cycle, accesible from start, you can go all the way up to n. Now you can walk to any accesible vertex, over no more than n edges, each is either +1 or -1, leaving you with [0;2n] range. Otherwise you'll walk either through negative cycles, or no more than n +1, that aren't in negative cycle, leaving you with [0;n] range.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow