Java: Traveling Salesman - Found polynomial algorithm

Question 1

I'll try to break this down to essentials. But first let me commend you for tackling a problem that's "known" to be enormously hard. No progress can be made without risk taking.

You are approaching TSP in terms of a recursive expression for S(a, b, I), the length of a shortest path from city a to city b, a \ne b, passing through each city in the unordered set I exactly once.

With S in hand, the TSP is easy to solve. For the set of cities C, find

min( D(b, a) + S(a, b, C\a\b) ) over all pairs a, b drawn from C where a \ne b

Here D(x, y) = D(y, x) is the distance from city x to y and C\a\b is C with a and b removed.

The recursive expression you propose for S is

S(a, b, I) = min( D(a, p) + S(p, q, I\p\q) + D(q, b) ) 
               over all pairs p, q drawn from I where p \ne q ,

The base cases are where I has zero or one element(s). These are pretty obvious.

You are proposing to cache values of S(a, b, I) so that no such computation is ever repeated. (This is called memoizing by the way.)

So what is the cost of this computation, or equivalently the size of the cache? We can write a recurrence for it, where the parameter n = |I| is the number of cities in the intermediate set:

C(n) = M(n, 2) C(n - 2) = ( n(n-1)/2 )  C(n - 2)
C(0) = C(1) = 1

Here M(n, m) is the combination of n things taken m at a time, n! / (m! (n-m)!)

We can solve this. For even n:

C(n) = n! /  2^(n/2)

I'll let you work out the odd case.

For the tour among m cities, we'd need to repeat this for all city pairs and corresponding intermediate sets:

(m(m-1)/2) C(m-2) = m! / 2^(m/2-2)

So your method does avoid an exponential amount of work with respect to the naïve algorithm of generating all possible tours, but the factorial still dominates: this function is super-exponential.

NB on your other "stopping criteria:" Above is the cost of computing all possible values of S(a,b,I) exactly once. To get a poly time algorithm, you will have to come up with a scheme for skipping a super-exponential number (a,b,I) of triples entirely. It's unlikely you can do this, but don't let this dampen your enthusiasm.

Question 2

Your work seems to fall down on four key points:

You do not seem to understand what Polynomial Time means
Your algorithm does not appear to solve the generic Travelling Salesman Problem
Even if the problem your algorithm solves is the Travelling Salesman Problem, it is predicated on a false assumption, causing it to give wrong answers
Even if your algorithm correctly solved the correct problem, it does not appear to run in polynomial time

For point 1, a polynomial time algorithm is not one which can be run on a home computer in five minutes. The terms "poly time", "constant time", "log time", etc all refer to the manner in which an algorithm scales. Providing the results from one run of the algorithm tells us nothing about this. In order to provide empirical data on the asymptotic running time of your algorithm, you will need to average over a very large number of random problem instances. For instance, this graph gives evidence for the fact that, in two dimensions, a naive method for range reporting across n random points is O(n) by the naive method and O(n^0.5) using a 2-d tree. I solved 10,000 randomly generated problems for numbers of points ranging from 2 to 2^(20) and I plotted the completion times on some log scales - the gradients of these lines gives evidence for the asymptotic running times of the algorithms.

The results of one trial are almost completely meaningless. If you cannot rigorously prove that an algorithm is polynomial then a large, well analysed set of empirical results, will give evidence for your claim and get people interested. I must place great emphasis on the word "large".

For the second point, your algorithm solves the Euclidean Travelling Salesman Problem, and not the Travelling Salesman Problem. These are different sets of problems. Though this distinction is technical and the ETSP is still NP-hard, the fact that you have not addressed it or even mentioned it in any of your 7 questions on the topic suggests that you haven't adequately researched the field before claiming your solution is valid.

For the third point, from what I can understand from your question, your solution is predicated on the assumption that the shortest Hamiltonian path through vertices D E F A is somehow related to the shortest Hamiltonian path through vertices E F A. This is false. Suppose that E->F->A is the shortest path through those vertices. If D is close to E and chosen such that DEF are colinear with vertices appearing in that order, then the shortest path is D->E->F->A. If D is chosen to be halfway along the line between E and F, the shortest path is E->D->F->A. Similar choices to before can give us vertex arrangements such that E->F->D->A and E->F->A->D are the shortest paths, and such a construction can generalise to any number of vertices. Knowing the shortest Hamiltonian path through some subset of vertices tells you nothing about the situation when another vertex gets involved.

Indeed, from one of your test cases, your algorithm has been shown to produce incorrect results. You have given no explanation as to what happened in this case, nor any indication of how or even if you have fixed this problem.

Finally, the sum you have given is greater than the sum to n of the binomial coefficients. It seems that LaTeX is not supported on this site, so we'll use (nCk) to denote the binomial coefficient n choose k. Your sum can be re-written as the sum of (k)(n-k)(nCk) for k=1 to n. This sum is clearly greater than the sum of (nCk) for k=1 to n, so this sum must be greater than 2^n, so your algorithm is certainly not polynomial based on your analysis. It is highly unlikely that any sum involving a bunch of factorials will turn out to be polynomially bounded. If you require any kind of non-trivial combinatorics to express your algorithm's run time, it probably does not execute in polynomial time.

Question 3

In short: Your approach won nothing in terms of complexity of the problem.

Let's look at the complexity of your approach. What you are effectively doing is calculating the transitive closure of all subpaths, while eliminating the longer from every two subpaths that start and end in the same city, to reduce the number of remaining combinations for the next iteration. Let's assume you stored the distances between every pair of cities in a hashmap, so lookup time is in O(1).

Given that you have n cities you want to include in you route, there are n x (n-1) pairs.

To calculate the distances for all subpaths of length 3, you chose one city and combine it with every pair that does not itself include the chosen city. There are (n-1) x (n-2) such pairs. As you have *n) cities to chose for the first position, you have 3 x 2 x 1 paths of length 3 to calculate. For n = 3 that means you have O(n!).

To calculate the distances for all subpaths of length 4, you repeat the process. This time you need 4 x 3 x 2 x 1 calculations. For n = 4 that means you have O(n!). This is the point where your elimination starts its effect. From every two paths that start and end in the same cities, you only need to remember the shorter one. This means that only (4 x 3 x 2 x 1)/2 paths of length 4 remain.

To calculate the distances for all subpaths of length 5, you gain from the elimination done in the last step. You need to calculate only 5 x (4 x 3 x 2 x 1)/2. For n = 5 that means you have O(1/2 x n!) = O(n!). This time you can eliminate 5 out of the 6 paths that start and end in the same cities (some of which you didn't even calculate, because of the elimination in the previous step), leaving you with (5 x 4 x 3 x 2 x 1)/6 paths of length 5.

Consequentially, for n = 6 you have O(1/6 x n!), which is still O(n!). For every further step the factor will become smaller. What means that you algorithm is faster than the naive brute force approach that does not save any intermediate results. But your complexity remains O(n!).

Java: Traveling Salesman - Found polynomial algorithm

So my algorithm have the next efficiency.

So what kind of efficiency is this?