What is an algorithm for minimizing the standard deviation of m sums summed from n summands? [with attempt]

https://cs.stackexchange.com/questions/130125

29-09-2020
|

Question

I have m bins (sums) and n summands. Each summand goes into a bin. In order to minimize the standard deviation, I have a greedy algorithm that appears to accomplish this. I am not sure of the name, but would like to know more. All m bins must have a sum greater than zero at the end of the algorithm.

It seems simple:

sort the summands from highest to lowest.

for each summand in the summands: find the first available bin with minimum sum and place it in the bin

I haven't proved anything about it, but I've come up with a few test data sets and it appears to work.

EDIT --- Here is my attempt at an analysis:

The algorithm seeks to minimize the standard deviation of m sums summed from n summands.

The mean is always the sum of n summands divided by m, that is given.

To minimize the standard deviation, the algorithm makes the greedy choice to minimize the standard deviation, or variance (either). I want to prove that the greedy choice is always the optimal choice.

Suppose there is a bin m1, not the minimum sum bin, that is a better choice than minimum sum bin m. Adding to this bin will minimize the standard deviation around u.

In other words, placing the next value n into this m will maximally decrease the variance, meaning that we have maximally decreased the sum of (m_i - u)^2. NOTE: m1' = m1 + x, m' = m + x, where x is the next summand.

So: remaining_sum_var + (m1' - u)^2 + (m - u)^2 < remaining_sum_var + (m' - u)^2 + (m1 - u^2) where m1 is greater than m.

Simplify and factor:

(m1' - u)^2 + (m - u)^2 < (m' - u)^2 + (m1 - u^2)

m1'^2 - 2um1' + 2u^2 + m^2 - 2um < m'^2 - 2um' + 2u^2 + m1^2 - 2um1

m1'^2 - 2um1' + m^2 - 2um < m'^2 - 2um' + m1^2 - 2um1

Ignoring linear terms whose difference will be constant (linearity).

m1'^2 + m^2 < m'^2 + m1^2

If we have added summand x to m1 this expands to:

(m1 + x)^2 + m^2 < (m + x)^2 + m1^2

m1^2 + 2m1x + x^2 + m^2 < m^2 + 2mx + x^2 + m1^2

2m1x < 2mx

m1 < m, this is a contradiction. So it cannot be the case that there is a better choice than minimum sum bin m.

Solution

No, your algorithm does not always produce the optimal solution. When $m=2$, this is (at least as hard as) the Partition problem, which is NP-hard. Wikipedia discusses the greedy algorithm and lists a counterexample which shows that your algorithm does not work.

OTHER TIPS

I have discovered that the algorithm is the LPT Algorithm. It is covered here along with other algorithms for an example task: scheduling on parallel machines with fixed jobs. The problem is NP hard and NP complete. The polynomial time algorithms are approximations.

https://depositonce.tu-berlin.de/bitstream/11303/11089/3/grigoriu_liliana_2019.pdf

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange