What is an algorithm for minimizing the standard deviation of m sums summed from n summands? [with attempt]
-
29-09-2020 - |
Question
I have m bins (sums) and n summands. Each summand goes into a bin. In order to minimize the standard deviation, I have a greedy algorithm that appears to accomplish this. I am not sure of the name, but would like to know more. All m bins must have a sum greater than zero at the end of the algorithm.
It seems simple:
sort the summands from highest to lowest.
for each summand in the summands: find the first available bin with minimum sum and place it in the bin
I haven't proved anything about it, but I've come up with a few test data sets and it appears to work.
EDIT --- Here is my attempt at an analysis:
The algorithm seeks to minimize the standard deviation of m sums summed from n summands.
The mean is always the sum of n summands divided by m, that is given.
To minimize the standard deviation, the algorithm makes the greedy choice to minimize the standard deviation, or variance (either). I want to prove that the greedy choice is always the optimal choice.
Suppose there is a bin m1, not the minimum sum bin, that is a better choice than minimum sum bin m. Adding to this bin will minimize the standard deviation around u.
In other words, placing the next value n into this m will maximally decrease the variance, meaning that we have maximally decreased the sum of (m_i - u)^2. NOTE: m1' = m1 + x, m' = m + x, where x is the next summand.
So: remaining_sum_var + (m1' - u)^2 + (m - u)^2 < remaining_sum_var + (m' - u)^2 + (m1 - u^2) where m1 is greater than m.
Simplify and factor:
(m1' - u)^2 + (m - u)^2 < (m' - u)^2 + (m1 - u^2)
m1'^2 - 2um1' + 2u^2 + m^2 - 2um < m'^2 - 2um' + 2u^2 + m1^2 - 2um1
m1'^2 - 2um1' + m^2 - 2um < m'^2 - 2um' + m1^2 - 2um1
Ignoring linear terms whose difference will be constant (linearity).
m1'^2 + m^2 < m'^2 + m1^2
If we have added summand x to m1 this expands to:
(m1 + x)^2 + m^2 < (m + x)^2 + m1^2
m1^2 + 2m1x + x^2 + m^2 < m^2 + 2mx + x^2 + m1^2
2m1x < 2mx
m1 < m, this is a contradiction. So it cannot be the case that there is a better choice than minimum sum bin m.
Solution
No, your algorithm does not always produce the optimal solution. When $m=2$, this is (at least as hard as) the Partition problem, which is NP-hard. Wikipedia discusses the greedy algorithm and lists a counterexample which shows that your algorithm does not work.
OTHER TIPS
I have discovered that the algorithm is the LPT Algorithm. It is covered here along with other algorithms for an example task: scheduling on parallel machines with fixed jobs. The problem is NP hard and NP complete. The polynomial time algorithms are approximations.
https://depositonce.tu-berlin.de/bitstream/11303/11089/3/grigoriu_liliana_2019.pdf