How to find what numbers in a set add up to another given number?

https://stackoverflow.com/questions/1202540

05-07-2019
|

Question

Here's a problem that I seem to be running into working with an accounting system.

I have a set of transactions, but their sum does not equal the amount that the accounting department thinks that it should. They are not questioning the math, just the transactions being included :p

Is there an algorithm that would help me determine which transactions in the set should not be included in order for the sum to match a given amount.

Given Set:  
2  
4  
5  
7

Given Sum Amount:
13

Result Set:
2
4
7

Edit: There's less than 100 transactions in the set. Does anyone have a C# example as there is not one on the Solving the NP-complete problem in XKCD question?

Man, I should have gotten a CS degree.

Solution

This is the Subset Sum problem, which is NP-Complete. But that doesn't mean there isn't an algorithm for finding a subset sum.

OTHER TIPS

This is the Knapsack Problem and it's NP-Complete. You won't easily solve it exactly with anything except small input sets. For any decent-sized problem set, it's one of those lifetime-of-the-universe-to-solve problems.

That said, there are genetic-algorithm knapsack solvers out there.

As the above members have said, this is the Subset Sum problem (or knapsack problem). However, to say it cannot be done efficiently is not very precise. It can be done, just not in polynomial time. THe solution is actually quite simple using dynamic programming and recursion (and in pseudo-polynomial time).

Given integers [a_1, ... ,a_n] and a number T,

Define the array S[i,k] to denote if there is a subset of elements between a_1, ... a_i which add up to k. (This is a binary matrix).

We can then define a recursive relationship as follows:

S[i,k] = S[i-1,k] or S[i-1,k-a_j]

In words, this means we either use element a_i or we do not. The answer will be located at S[n,T].

What is the work load to construct matrix S? Well, S has n*T elements. To compute each element, we must do O(1) work. So the complete running time is O(n*T).

Now at this point, it seems that I have proven P=NP, as this seems to be a polynomial time algorithm. However, remember that we measure input size in binary, so T = 2^p for some p.

I don't think anyone would say that the above solution, when implemented properly is unreasonable. IN fact, for many reasonable problem sizes, it will perform admirably.

Also, there are some heuristics for solving this problem, but I am not an expert in that area.

This is a version of the knapsack problem. It's NP complete, so you're not going to get a good general answer. How big are your sets of transactions? Is it 5 like you showed, or is it more like 500?

OK. Lots of people have given the name of the problem and mentioned how NP hard it is . And in general, they are correct. However, you have a very specific case you need to solve. The first question to ask is whether or not you think your 100 transactions are close to being the right ones. You have some total ("your" total). They have some total. ("real" total). Some of your transactions are therefore bogus. If you suspect that there are only a few bogus transactions in there, then this isn't so bad. For example, let's consider the case where there is only one bogus transaction. In that case, we'd only have to check 100 different numbers. If there are 2 bogus trans, then you're looking at 100*99 checks, and things will get crazy at 4 bogus trans, with almost 100,000,000 steps. though if all you're doing is adding some int's that's not too terrible.

Another possible shortcut is to examine the nature of your data (incidentally, is it possible to post the 100 "numbers" and the expected sum?). How much is your sum over the real sum? Are there any values so big that eliminating them would make your sum suddenly lower than the real sum? If so, you know those values cannot be the bogus ones. For example, in your example, 7 is absolutely required.

        bool bBreak = false;
        int iEnd = 13;
        ArrayList ar1 = new ArrayList();
        ar1.Add(2);
        ar1.Add(4);
        ar1.Add(5);
        ar1.Add(7);

        String s1 = " ";
        foreach (int i in ar1)
        {
            if (i == iEnd)
            {
                s1 = "Result = " + i;
                bBreak = true;
            }
            if (bBreak) break;
            ArrayList ar2 = new ArrayList();
            ar2.AddRange(ar1);
            ar2.Remove(i);
            foreach (int j in ar2)
            {
                if ((i + j) == iEnd)
                {
                    s1 = "Results = " + i + ", " + j;
                    bBreak = true;
                }

                if (bBreak) break;
                ArrayList ar3 = new ArrayList();
                ar3.AddRange(ar2);
                ar3.Remove(j);
                foreach (int k in ar3)
                {
                    if (bBreak) break;
                    if ((i + j + k) == iEnd)
                    {
                        s1 = "Results = " + i + ", " + j + ", " + k;
                        bBreak = true;
                    }
                }
            }
        }
        Console.WriteLine(s1);

Yes this is possible. Not sure if this post is still opened. But you would want to do the Excel Solver add-in. Post all numbers, with 1s on the adjacent cell. Then put the desired output number.. then all the used numbers, will keep their adjacent "1", while the unused ones will turn to "0". Then do a filter formula that lists only those that have a "1" beside it.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow