How to find combination of segments to build a track: possible Subset Sum

Question 1

Math to the rescue!

You can check that every even number larger than 8 is a linear combination of elements from this set - ask on Math Overflow for a proof ;).

So let's rephrase the question in math:

We have an overcomplete dictionary of basis vectors (because 16, for example, is a multiple of 8),
in which we can represent every even number larger than 8 as a linear combination of these basis vectors, and
we are trying to minimize the zero "norm" of this input vector.

The good news: This is a very interesting problem, with many application domains, so it is pretty well researched.

The bad news: this is still a hard (NP-hard) problem.

But, hey, at least now you know.

edit: And just so I don't get accused of handwaving philosophical answer, here's a modified (completely non-optimized) version of Solver.recursiveSolve that exhaustively searches for a combination of segments that match the goal; and a zero norm comparer class with which you can sort your results:

    private void RecursiveSolve(int goal, int currentSum,
        List<int> included, List<int> segments, int startIndex)
    {

        for (int index = 0; index < segments.Count; index++)
        {
            int nextValue = segments[index];
            if (currentSum + nextValue == goal)
            {
                List<int> newResult = new List<int>(included);
                newResult.Add(nextValue);
                mResults.Add(newResult);
            }
            else if (currentSum + nextValue < goal)
            {
                List<int> nextIncluded = new List<int>(included);
                nextIncluded.Add(nextValue);
                RecursiveSolve(goal, currentSum + nextValue,
                    nextIncluded, segments, startIndex++);
            }
        }
    }

class ZeroNormAndSDComparer : IComparer<List<int>>
{
    private int l0_norm(List<int> v)
    {
        int norm = 0;
        HashSet<int> seen = new HashSet<int>();

        for (int i = 0; i < v.Count; ++i)
        {
            if (!seen.Contains(v[i]))
            {
                norm++;
                seen.Add(v[i]);
            }
        }
        return norm;
    }

    private static double StandardDeviation(List<int> v)
    {
        double M = 0.0;
        double S = 0.0;
        int k = 1;
        foreach (double value in v)
        {
            double tmpM = M;
            M += (value - tmpM) / k;
            S += (value - tmpM) * (value - M);
            k++;
        }
        return Math.Sqrt(S / (k - 1));
    }

    public int Compare(List<int> x, List<int> y)
    {
        int normComparison = l0_norm(x).CompareTo(l0_norm(y));
        if  (normComparison == 0)
        {
            return StandardDeviation(x).CompareTo(StandardDeviation(y));
        }
        return normComparison;
    }
}

And your modified Main (now sorting is done after results have been reduced to distinct, 4-segment results):

        List<int> nums = new List<int>() { 8, 10, 12, 14, 16, 18, 20, 22, 24 };

        int width = 80;
        Console.WriteLine("Total width: " + width);

        Solver solver = new Solver();
        List<List<int>> res = solver.Solve(width, nums.ToArray());
        Console.WriteLine("total :" + res.Count);
        var res1 = res.Distinct(new SequenceComparer<List<int>, int>()).ToList();
        Console.WriteLine("total res1:" + res1.Count);

        var res2 = res1.Where(l => l.Count == 4).ToList();
        Console.WriteLine("total res2:" + res2.Count); //reduce to 4 integer solutions

        res2.Sort(new ZeroNormAndSDComparer());

Question 2

Let's divide everything by 2 since everything is even. We now have track pieces with length 4 to 12 for a total length of about 10 to 50.

Name n the length we have to achieve. For every possible number k of track pieces (1 to 4 in general, but 1 to 3 for n<16 or 3 to 4 for n>36 for example), suggest taking n%k pieces of length n/k+1 and k-n%k pieces of length n/k.

'/' designates integer division and '%' the remainder.

Question 3

You really have a special case of the sum-set. While it's NP-hard, the solution space is restricted enough that brute force would probably work fine, as there are only 10000 (10 ^ 4) possible solutions total (which is about equal to the actual number of time steps needed), since you also have to consider 0 as a possible length.

Here is the code, in psudo-Python. Thought about trying it in C#, but not actually familiar with it, and so it probably wouldn't work out well.

lengths = [0, 8, 10, 12, 14, 16, 18, 20, 22, 24]
solutions = []
for len1 in lengths:
    for len2 in lengths:
        for len3 in lengths:
            for len4 in lengths:
                if (len1 + len2 + len3 + len4) == n:
                    solutions.append([len1,len2,len3,len4])
return solutions

Once you have all the valid solutions, you can then simply determine which one(s) you want to show the user, or you can simply write an algorithm to pick the best one. Of course, you'll probably want to not actually include any lengths of size 0.

You can improve this algorithm somewhat using a greedy method that will only find all valid solutions. However, again, the problem as it's stated isn't likely to be complex enough to need it unless things are very constricted in terms of space or time.

As a bonus, if you are expecting multiple queries (for example user asks for n = 40 and later n = 50), you can remove the if statement and simply store all 10000 solutions in a hash-table keyed to the sum value, allowing for O(1) querying.

Narrowing down the solution set:

What you need here is a comparison algorithm that basically compares this solution to that solution and says, "this solution is better/worst than that solution". This allows you to write a sorting algorithm that you can use to sort the solutions to get the best so many, or you can simply just find the best solution.

You can resolve the vast majority of cases by simply calculating the standard deviation for each solution then compare the standard deviations. This will give a number that shows how much variance is in the lengths of the solution. If you use "lower standard deviation is better", then that'll give you "All of equal length or close to each other is preferred over Large/small combinations". The algorithm for standard deviation is fairly straightforward, so I'll leave it to you try to implement. Actually, there's a good chance that C# has the function built in. Just be sure not to include any zero lengths, actually they should probably be removed before you add them to the solution set to avoid issues, which requires a bit of tweaking to the code I gave.

However, you then have the tricky part, dealing with cases where different solutions have the same standard deviation. I'm thinking there are only two cases where this happens.

The first occurs only there are multiple ideal solutions. Such as if n = 24, then three of the solutions will be [8,8,8], [12,12], and [24].
The second occurs due to the brute force nature of the algorithm, and is why there are so many solutions. This is because for every solution like [8,10,12,14] (four unique lengths) there are 24 ways to arrange those lengths, such as [14,12,10,8] and [12,10,14,8]. So the best way to improve upon the brute force algorithm is to have an algorithm that multichooses 4 elements from [0, 8, 10, 12, 14, 16, 18, 20, 22, 24]. This narrows the solution set to only 715 solutions. Of course, if you actually want [8,10,12,14], [14,12,10,8] and [12,10,14,8] as different solutions, then there isn't much you can do.

The above two paragraphs fall squarely in the realm of "it depends". You'll have to decide what rules the algorithm should follow in each case, but I'm thinking those are the only two cases where you could find identical standard deviations.