Why is my algorithm version so slow with this input?
28-09-2020
Question
Here I'm trying to compare two as-simple-as-possible algorithms that solve the symmetric travelling salesman problem, finding the optimal solution without the support of heuristics. I'm showing (or just referencing) the code of both algorithms only to better specify the implementation details that could be relevant for the comparison. I've written my F# code to solve Advent of Code Day 18 Part 1. It is a BFS adapted to find the optimal solution for a weighted graph (the latest version has been edited since the original question was posted, but, as already said, ignore this sort of detail). While it seems to work fine with other simple demo inputs, it gets stuck forever on the following input.
#################
#i.G..c...e..H.p#
########.########
#j.A..b...f..D.o#
########@########
#k.E..a...g..B.n#
########.########
#l.F..d...h..C.m#
#################
It represents a list of lowercase-char vertices with uppercase constrained predecessors, and edges weighted by dotted distances of different lengths. So here I'm looking for a comparison based on the time complexity of the two algorithms, related to an analysis of the cases where the number of edges increases at a fixed number of vertices. There is a reference solution in Python which is correct and fast, but I fail to see where the fundamental difference between the two algorithms lies, beyond the other minor code differences (also because the languages are different).
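For concreteness, here is a minimal sketch (in Python, to stay close to the reference solution; all names are illustrative, not the actual code) of how that input can be read: lowercase letters are the vertices (keys), uppercase letters are the predecessor constraints (doors), and `@` is the starting point.

```python
# Hypothetical parser for the maze text above; names are assumptions.
MAZE = """\
#################
#i.G..c...e..H.p#
########.########
#j.A..b...f..D.o#
########@########
#k.E..a...g..B.n#
########.########
#l.F..d...h..C.m#
#################"""

def parse(maze):
    """Collect key coordinates, door coordinates and the start cell."""
    keys, doors, start = {}, {}, None
    for y, row in enumerate(maze.splitlines()):
        for x, ch in enumerate(row):
            if ch.islower():
                keys[ch] = (x, y)      # vertex
            elif ch.isupper():
                doors[ch] = (x, y)     # predecessor constraint
            elif ch == '@':
                start = (x, y)         # starting point
    return keys, doors, start
```

This maze has 16 keys and 8 doors, which is exactly the "16 nodes" case discussed below.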
I've tried recursion vs a queue, and a tree vs a grid/map (see my GitHub history), but with no luck so far.
The principal part of the code is described below. It falls under the breadth-first search (BFS) family of algorithms; its implementation details are reported only to complete the theoretical description already given, because the standard BFS has been adapted to multiple targets, weighted edges and a tentative tree pruning.
Here is how I produce the single step by which I build up the solution.
1. The single step receives as input the move index i
2. It updates the distance of the current solution draft by adding the distance to vertex i
3. It updates the current position to vertex i
4. It also updates the list of predecessors by including vertex i
5. It computes the new tree of reachable vertices, starting from the `fullGrid` and taking into account the newly updated list of predecessors
Notice that this is a constrained version of the TSP, where each vertex can have a list of predecessors as a constraint.
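The five steps above can be sketched as follows (a Python illustration under assumed names, not the actual F# code; step 5, recomputing the reachable tree from `fullGrid`, is only hinted at in a comment):

```python
from dataclasses import dataclass

@dataclass
class Partial:
    position: str    # current vertex
    distance: int    # accumulated distance of the solution draft
    collected: list  # predecessors gathered so far

def single_step(partial, move, dist):
    """Apply `move` (steps 1-4); `dist` is an assumed (from, to) -> weight map."""
    # step 2: add the distance from the current position to vertex `move`
    new_distance = partial.distance + dist[(partial.position, move)]
    # steps 3-4: update the position and the list of predecessors
    nxt = Partial(move, new_distance, partial.collected + [move])
    # step 5 (omitted here): recompute the tree of reachable vertices from
    # fullGrid, given the updated predecessor list
    return nxt
```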
The `fullGrid` is supposed to contain the matrix of distances, and it looks fine to me in terms of the values it contains when I debug it.
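As a cross-check, a `fullGrid`-style matrix can be derived by running a plain grid BFS from each vertex, recording both the step count and the doors crossed on the way (a Python sketch; the function name and output shape are assumptions):

```python
from collections import deque

MAZE = """\
#################
#i.G..c...e..H.p#
########.########
#j.A..b...f..D.o#
########@########
#k.E..a...g..B.n#
########.########
#l.F..d...h..C.m#
#################"""

def bfs_from(grid, src):
    """Distances and door constraints from vertex `src` to every other key."""
    start = next((x, y) for y, row in enumerate(grid)
                 for x, ch in enumerate(row) if ch == src)
    seen = {start}
    frontier = deque([(start, 0, frozenset())])
    out = {}
    while frontier:
        (x, y), d, doors = frontier.popleft()
        ch = grid[y][x]
        if ch.islower() and ch != src:
            out[ch] = (d, doors)          # distance and required predecessors
        if ch.isupper():
            doors = doors | {ch.lower()}  # this key must be owned first
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if grid[ny][nx] != '#' and (nx, ny) not in seen:
                seen.add((nx, ny))
                frontier.append(((nx, ny), d + 1, doors))
    return out
```

For the maze above, `bfs_from(MAZE.splitlines(), '@')` gives, for example, distance 3 to `a` with no constraint, and distance 8 to `k` behind door `E`.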
The wrapping solver is simply a recursion- or queue-based version of the BFS. Again, I'm not interested in the programming details but in understanding, at a conceptual level, why the underlying algorithm is not applicable to an apparently tractable input (of "only" 16 nodes).
Start by receiving the graph topology and the starting point
Define a mindistance variable
and a list of completed solutions, initially empty
Start looping over the queue of partial solutions
A single partial solution is dequeued
The possible vertices are computed based on the current predecessors
If nothing remains, add it to the completed alternatives, updating mindistance
Otherwise loop as in classical BFS
For all possible moves = reachable vertices
apply the single step described above to each one
If done, update mindistance and the completed alternatives
Otherwise enqueue the new partial solution
Finally, select the minimum alternative by distance.
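The loop above can be condensed into the following Python sketch (illustrative names; `dist` is an assumed map from `(from, to)` to a `(weight, prerequisites)` pair, as could be produced from `fullGrid`):

```python
from collections import deque

def solve(dist, all_keys, start='@'):
    """Queue-based BFS over partial solutions, as described above."""
    best = float('inf')
    queue = deque([(start, 0, frozenset())])  # position, distance, collected
    while queue:
        pos, d, collected = queue.popleft()
        # possible vertices: not yet collected, with predecessors satisfied
        moves = [k for k in all_keys
                 if k not in collected and dist[(pos, k)][1] <= collected]
        if not moves:
            # nothing remains: a completed alternative (assuming no dead ends)
            best = min(best, d)
            continue
        for k in moves:
            nd = d + dist[(pos, k)][0]
            if nd < best:                     # skip already-hopeless drafts
                queue.append((k, nd, collected | {k}))
    return best
```

On a toy instance with two keys this returns the shorter of the two tours, but without a visited map the queue still grows with every permutation of the collected keys, which is the behaviour diagnosed in the solution below.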
Solution
I've found the conceptual problem. In a BFS-style search, when optimization is needed because the length of the queue grows at a critical speed, you have to keep track of the visited items.
The goal is to avoid adding an item to the queue when a better item is already present there.
Here is just some code, only to clarify the concept.
In my case I've added a visited map
let mutable visited = Map.empty<char,((char list) * int) list>
and I'm enqueuing only better items:
if visited.ContainsKey(solutionNext.tree.area)
&& (visited.[solutionNext.tree.area]
|> List.exists (
fun (keys, distance) ->
distance <= solutionNext.tree.distance
&& solutionNext.keys |> List.forall(fun k -> keys |> List.contains k)
))
then ()
else
solution_queue <- enqueue solution_queue solutionNext
let previous = if visited.ContainsKey(solutionNext.tree.area) then visited.[solutionNext.tree.area] else []
visited <- Map.add solutionNext.tree.area ((solutionNext.keys, solutionNext.tree.distance) :: previous) visited
and skipping the worse ones when dequeuing:
while (MyQueue.length solution_queue > 0) do
let solution = dequeue &solution_queue
if visited.ContainsKey(solution.tree.area)
&& (visited.[solution.tree.area]
|> List.exists (
fun (keys, distance) ->
distance < solution.tree.distance
&& solution.keys |> List.forall(fun k -> keys |> List.contains k)
))
then () else
// otherwise the partial solution is processed as in the original loop
Cost function
From a theoretical viewpoint, BFS can guarantee that the first solution returned is as short as possible only if we can assume that every move has a cost of 1. You could reduce the above examples to that, but the best fit remains a graph where the cost function is represented by edges of different lengths. With real cost functions, finding the shortest path becomes more involved: this is exactly the case here. Especially because you also have to optimize the space complexity (see the previous paragraph), avoiding the worst-case time complexity of $O(b^d)$, with $b$ being the branching factor and $d$ the depth of the first solution. You have described a scenario where $d$ is well known at the start (the number of keys in the maze), while the edge case you mention is obtained by a bigger $b$ (the number of crossings of the maze).
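One way to restore the optimality guarantee with weighted edges is to replace the FIFO queue with a priority queue ordered by accumulated distance (uniform-cost search, i.e. Dijkstra over the `(position, collected keys)` state space), combined with the visited-state pruning described above. A hedged Python sketch, with the same assumed `dist` format as before:

```python
import heapq

def uniform_cost(dist, all_keys, start='@'):
    """Pop states by accumulated distance; the first complete one is optimal."""
    heap = [(0, start, frozenset())]
    seen = {}                                 # (position, collected) -> best distance
    while heap:
        d, pos, collected = heapq.heappop(heap)
        if collected == all_keys:
            return d                          # first complete state: minimal cost
        state = (pos, collected)
        if seen.get(state, float('inf')) <= d:
            continue                          # a better visit already exists
        seen[state] = d
        for k in all_keys:
            if k not in collected and dist[(pos, k)][1] <= collected:
                heapq.heappush(heap, (d + dist[(pos, k)][0], k, collected | {k}))
    return float('inf')
```

The visited map here keys on the pair (position, set of collected keys), which bounds the state space by the number of such pairs rather than by the number of paths, taming the $O(b^d)$ blow-up.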