maximum sum of a subset of size K with sum less than M

Question 1

Many people have commented correctly that the answer below from years ago, which uses dynamic programming, incorrectly encodes solutions allowing an element of the array to appear in a "subset" multiple times. Luckily there is still hope for a DP based approach.

Let dp[i][j][k] = true if there exists a size k subset of the first i elements of the input array summing up to j

Our base case is dp[0][0][0] = true

Now, either the size k subset of the first i elements uses a[i + 1], or it does not, giving the recurrence

dp[i + 1][j][k] = dp[i][j - a[i + 1]][k - 1] OR dp[i][j][k]

Put everything together:

given A[1...N]
initialize dp[0...N][0...M][0...K] to false
dp[0][0][0] = true
for i = 0 to N - 1:
    for j = 0 to M:
        for k = 0 to K:
            if dp[i][j][k]:
                dp[i + 1][j][k] = true
            if j >= A[i] and k >= 1 and dp[i][j - A[i + 1]][k - 1]:
                dp[i + 1][j][k] = true
max_sum = 0
for j = 0 to M:
    if dp[N][j][K]:
        max_sum = j
return max_sum

giving O(NMK) time and space complexity.

Stepping back, we've made one assumption here implicitly which is that A[1...i] are all non-negative. With negative numbers, initializing the second dimension 0...M is not correct. Consider a size K subset made up of a size K - 1 subset with sum exceeding M and one other sufficiently negative element of A[] such that overall sum no longer exceeds M. Similarly, our size K - 1 subset could sum to some extremely negative number and then with a sufficiently positive element of A[] sum to M. In order for our algorithm to still work in both cases we would need to increase the second dimension from M to the difference between the sum of all positive elements in A[] and the sum of all negative elements (the sum of the absolute values of all elements in A[]).

As for whether a non dynamic programming solution exists, certainly there is the naive exponential time brute force solution and variations that optimize the constant factor in the exponent.

Beyond that? Well your problem is closely related to subset sum and the literature for the big name NP complete problems is rather extensive. And as a general principle algorithms can come in all shapes and sizes -- it's not impossible for me to imagine doing say, randomization, approximation, (just choose the error parameter to be sufficiently small!) plain old reductions to other NP complete problems (convert your problem into a giant boolean circuit and run a SAT solver). Yes these are different algorithms. Are they faster than a dynamic programming solution? Some of them, probably. Are they as simple to understand or implement, without say training beyond standard introduction to algorithms material? Probably not.

This is a variant of the Knapsack or subset-problem, where in terms of time (at the cost of exponential growing space requirements as the input size grows), dynamic programming is the most efficient method that CORRECTLY solves this problem. See Is this variant of the subset sum problem easier to solve? for a similar question to yours.

However, since your problem is not exactly the same, I'll provide an explanation anyways. Let dp[i][j] = true, if there is a subset of length i that sums to j and false if there isn't. The idea is that dp[][] will encode the sums of all possible subsets for every possible length. We can then simply find the largest j <= M such that dp[K][j] is true. Our base case dp[0][0] = true because we can always make a subset that sums to 0 by picking one of size 0.

The recurrence is also fairly straightforward. Suppose we've calculated the values of dp[][] using the first n values of the array. To find all possible subsets of the first n+1 values of the array, we can simply take the n+1_th value and add it to all the subsets we've seen before. More concretely, we have the following code:

initialize dp[0..K][0..M] to false dp[0][0] = true for i = 0 to N: for s = 0 to K - 1: for j = M to 0: if dp[s][j] && A[i] + j < M: dp[s + 1][j + A[i]] = true for j = M to 0: if dp[K][j]: print j break

Question 2

We're looking for a subset of K elements for which the sum of the elements is a maximum, but less than M.

We can place bounds [X, Y] on the largest element in the subset as follows.

First we sort the (N) integers, values[0] ... values[N-1], with the element values[0] is the smallest.

The lower bound X is the largest integer for which

values[X] + values[X-1] + .... + values[X-(K-1)] < M.

(If X is N-1, then we've found the answer.)

The upper bound Y is the largest integer less than N for which

values[0] + values[1] + ... + values[K-2] + values[Y] < M.

With this observation, we can now bound the second-highest term for each value of the highest term Z, where

X <= Z <= Y.

We can use exactly the same method, since the form of the problem is exactly the same. The reduced problem is finding a subset of K-1 elements, taken from values[0] ... values[Z-1], for which the sum of the elements is a maximum, but less than M - values[Z].

Once we've bound that value in the same way, we can put bounds on the third-largest value for each pair of the two highest values. And so on.

This gives us a tree structure to search, hopefully with much fewer combinations to search than N choose K.

Question 3

Felix is correct that this is a special case of the knapsack problem. His dynamic programming algorithm takes O(K*M) size and O(K*K*M) amount of time. I believe his use of the variable N really should be K.

There are two books devoted to the knapsack problem. The latest one, by Kellerer, Pferschy and Pisinger [2004, Springer-Verlag, ISBN 3-540-40286-1] gives an improved dynamic programming algorithm on their page 76, Figure 4.2 that takes O(K+M) space and O(KM) time, which is huge reduction compared to the dynamic programming algorithm given by Felix. Note that there is a typo on the book's last line of the algorithm where it should be c-bar := c-bar - w_(r(c-bar)).

My C# implementation is below. I cannot say that I have extensively tested it, and I welcome feedback on this. I used BitArray to implement the concept of the sets given in the algorithm in the book. In my code, c is the capacity (which in the original post was called M), and I used w instead of A as the array that holds the weights.

An example of its use is:

int[] optimal_indexes_for_ssp = new SubsetSumProblem(12, new List<int> { 1, 3, 5, 6 }).SolveSubsetSumProblem();

where the array optimal_indexes_for_ssp contains [0,2,3] corresponding to the elements 1, 5, 6.

using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;

public class SubsetSumProblem
{
    private int[] w;
    private int c;

    public SubsetSumProblem(int c, IEnumerable<int> w)
    {
      if (c < 0) throw new ArgumentOutOfRangeException("Capacity for subset sum problem must be at least 0, but input was: " + c.ToString());
      int n = w.Count();
      this.w = new int[n];
      this.c = c;
      IEnumerator<int> pwi = w.GetEnumerator();
      pwi.MoveNext();
      for (int i = 0; i < n; i++, pwi.MoveNext())
        this.w[i] = pwi.Current;
    }

    public int[] SolveSubsetSumProblem()
    {
      int n = w.Length;
      int[] r = new int[c+1];
      BitArray R = new BitArray(c+1);
      R[0] = true;
      BitArray Rp = new BitArray(c+1);
      for (int d =0; d<=c ; d++) r[d] = 0;
      for (int j = 0; j < n; j++)
      {
        Rp.SetAll(false);
        for (int k = 0; k <= c; k++)
          if (R[k] && k + w[j] <= c) Rp[k + w[j]] = true;
        for (int k = w[j]; k <= c; k++) // since Rp[k]=false for k<w[j]
          if (Rp[k])
          {
            if (!R[k]) r[k] = j;
            R[k] = true;
          }
      }
      int capacity_used= 0;
      for(int d=c; d>=0; d--)
        if (R[d])
        {
          capacity_used = d;
          break;
        }
      List<int> result = new List<int>();
      while (capacity_used > 0)
      {
        result.Add(r[capacity_used]);
        capacity_used -= w[r[capacity_used]];
      } ;
      if (capacity_used < 0) throw new Exception("Subset sum program has an internal logic error");
      return result.ToArray();
    }
}