Proof for an algorithm to minimize $\max(a, b, c) - \min(a, b, c)$, $a \in A$, $b \in B$, $c \in C$, where $A$, $B$, $C$ are arrays in ascending order

cs.stackexchange https://cs.stackexchange.com/questions/124526

Question

Problem Statement

I came across this problem here. For given arrays $A$, $B$ and $C$ arranged in ascending order, we need to minimize the objective function $f(a, b, c) = \max(a, b, c) - \min(a, b, c), a \in A, b \in B, c\in C$.

It can be thought of as a problem to select a number from each of the three arrays such that the numbers are as close to each other as possible (max element is as close to min element as possible).
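
For a concrete (made-up) example, take $A = \{1, 4, 10\}$, $B = \{2, 15, 20\}$ and $C = \{10, 12\}$. The best choice is $a = 10$, $b = 15$, $c = 10$, giving $f(10, 15, 10) = 15 - 10 = 5$; no other triplet fits in a smaller range.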

Solution

The editorial solution to the problem is based on a greedy approach running in linear time. Here are the steps, summarized:

  1. The algorithm involves three pointers, one for each array.
  2. Initially, all pointers point to the beginning of the arrays.
  3. Until the end of at least one of the arrays is reached, steps 4 and 5 are repeated.
  4. The element combination formed by the current pointer configuration is checked to see if it yields a new minimum value of the objective function.
  5. The pointer pointing to the least element is incremented to get a new configuration.

This is the C++ code for reference and reproducibility:

#include <vector>      // std::vector
#include <climits>     // INT_MAX
#include <algorithm>   // std::min, std::max
using namespace std;

int f(int a, int b, int c){ // objective function: spread of the three picked values
    return max(a, max(b, c)) - min(a, min(b, c));
}

int solve(vector<int> &A, vector<int> &B, vector<int> &C) {
    int i = 0, j = 0, k = 0;   // one pointer per array, all starting at the front
    int best = INT_MAX;

    // Repeat until at least one array is exhausted.
    while(i < (int)A.size() && j < (int)B.size() && k < (int)C.size()){
        int mine = min(A[i], min(B[j], C[k]));     // smallest of the three current elements
        best = min(best, f(A[i], B[j], C[k]));     // check the current configuration

        // Advance the pointer that points to the minimum element.
        if(A[i] == mine)
            i++;
        else if(B[j] == mine)
            j++;
        else
            k++;
    }

    return best;
}
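
As a quick sanity check, here is a minimal driver (my own, using the made-up arrays from the example above) that exercises solve; together with the includes above it should compile and print 5:

#include <iostream> // needed only for this driver

int main() {
    // Made-up sample input; each array must already be sorted in ascending order.
    vector<int> A = {1, 4, 10};
    vector<int> B = {2, 15, 20};
    vector<int> C = {10, 12};
    cout << solve(A, B, C) << endl; // expected output: 5
    return 0;
}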

Observations

While this approach seems reasonable to me (and does work), I cannot convince myself of its correctness. I have made some observations about the nature of the problem and the algorithm, but I cannot seem to arrive at a solid reasoning for why this solution works. Any help towards a proof, or towards a reasoning for why this approach is correct would be greatly appreciated.

I started by thinking along the lines of finding a loop invariant, thinking that the pointers would always point to the best configuration for the subarrays $A[0..i], B[0..j], C[0..k]$. This line of thought is incorrect ($i, j, k$ point to suboptimal configurations as well).

This is what I have come up with so far:

TL;DR: if any element other than the minimum element is advanced (to the next element of its array), the objective function increases or stays the same (unfavourable). If the minimum element is advanced, the objective function might decrease, increase or stay the same. So, the only "hope" of finding a lower objective function is to advance the minimum element in that iteration.

Consider that the elements currently pointed to are $x, y, z$ such that $x \le y \le z$; each of $x, y, z$ could belong to any of the three arrays. If the elements following $x, y, z$ in their respective arrays are $x^{+}, y^{+}, z^{+}$, then the solution always increments the pointer pointing to $x$, so that it points to $x^{+}$.

Since $x$ is the minimum element and $z$ is the maximum element, $f(x, y, z)=z-x=f_{old}$.

If we increment $z$ to $z^{+}$:

  • $f(x, y, z^{+})=z^{+}-x \ge f_{old}$, as $z^{+} \ge z$.

So, $f_{new}\ge f_{old}$

If we increment $y$ to $y^{+}$:

  • If $y^{+} \le z$, $f(x, y^{+}, z)=z-x = f_{old}$.
  • If $y^{+}>z$, $f(x, y^{+}, z)=y^{+}-x \ge f_{old}$

So, $f_{new}\ge f_{old}$

If we increment $x$ to $x^{+}$:

  • If $x^{+} < y$, $f(x^{+}, y, z)=z-x^{+} \le f_{old}$
  • If $y \le x^{+} \le z$, $f(x^{+}, y, z)=z-y \le f_{old}$
  • If $z<x^{+} \le z+(y-x)$, $f(x^{+}, y, z) = x^{+}-y \le z-x$ $(= f_{old})$
  • If $x^{+}>z+(y-x)$, $f(x^{+}, y, z) = x^{+}-y > z-x$ $(= f_{old})$

So, $f_{new}\le f_{old}$ as long as $x^{+} \le z+(y-x)$.
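
For concreteness (a made-up instance): take $x = 1$, $y = 4$, $z = 5$, so $f_{old} = 4$ and $z + (y - x) = 8$. Then $x^{+} = 3$ gives $f = 5 - 3 = 2$ (first case), $x^{+} = 4$ gives $f = 5 - 4 = 1$ (second case), $x^{+} = 7$ gives $f = 7 - 4 = 3 \le 4$ (third case), and $x^{+} = 9$ gives $f = 9 - 4 = 5 > f_{old}$ (fourth case).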

I have a hunch that, for the solution to work, in the case where $f_{new}> f_{old}$ (i.e. when $x^{+} > z+(y-x)$), it must be impossible to get a smaller objective function without incrementing all pointers; however, I cannot prove this.

Nonetheless, none of these observations convince me that the method is correct (although I know that it is). If someone could formulate a loop-invariant condition for this solution and the configuration of pointers, that would be the most straightforward proof.


Solution

WLOG assume that no array contains duplicate values.

We make the following claim: at no point in the algorithm does there exist, for the current state $(a, b, c)$, an element $b' \in B$ such that $a < b' < b$ (and the same holds for every other ordered pair of arrays, both ways).

Initially this holds, since every pointer is at the smallest element of its array. Suppose the claim holds in state $(a, b, c)$ and the minimum element, WLOG $a$, is replaced by the next value $a'$ in its array, giving the state $(a', b, c)$. The claim still holds trivially for all pairs except possibly $(b, a')$ and $(c, a')$: any $b' \in B$ with $a' < b' < b$ would also satisfy $a < b' < b$ and would already have violated the claim for $(a, b, c)$. If the claim failed for $(b, a')$ in the other direction, an element of $A$ would lie strictly between $b$ and $a'$; since $a'$ is the successor of $a$ in $A$, that element would be at most $a$, forcing $b < a < a'$, so $a$ wasn't the minimum element, a contradiction. The same argument applies to $(c, a')$.
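
As a small illustration (with made-up arrays): for $A=\{1,5\}$, $B=\{3,6\}$, $C=\{4,7\}$ the algorithm visits the states $(1,3,4) \to (5,3,4) \to (5,6,4) \to (5,6,7)$, and in each of these states no element of any array lies strictly between two of the three current values.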

Say that the optimal triplet is $(a, b, c)$, and WLOG let $a$ be its minimum value. We may further assume that $b$ and $c$ are the smallest values in $B$ and $C$ that are at least $a$: replacing them by those values keeps the minimum equal to $a$ and can only decrease the maximum, so the triplet remains optimal. Look at the triplet $(a, b', c')$ the algorithm has when it first reaches value $a$ as the pointed-to value from the first array. If $a$ is the first element of $A$, then $b'$ and $c'$ are the first elements of $B$ and $C$, so $b' \le b$ and $c' \le c$ hold trivially. Otherwise the previous state was $(a', b', c')$ with $a' < a$. Thus if $b < b'$, we have $a' < a \leq b < b'$ and the previous state doesn't satisfy the claim, a contradiction; the same argument shows $c' \le c$.

So when we first reach $(a, b', c')$, we have $b' \leq b$ and $c' \leq c$. Since $b$ and $c$ are the smallest values at least $a$ in their arrays, any value $b' < b$ or $c' < c$ is strictly less than $a$. Consequently, the minimum value won't be $a$ (so $A$'s pointer won't move) until we reach the state $(a, b, c)$, and the minimum value won't be $b$ or $c$ (so those pointers won't move past $b$ and $c$) until it has been $a$. Thus we reach, and evaluate, the state $(a, b, c)$, which was to be proven.
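
As an empirical complement to the proof (not a substitute for it), the greedy answer can be compared against exhaustive search on small random inputs. Below is a sketch of such a checker; it assumes the f and solve functions from the question are in scope, and the test sizes and bounds are arbitrary choices of mine:

#include <iostream>
#include <cstdlib>

// Exhaustive reference answer: try every possible triplet.
int bruteForce(const vector<int> &A, const vector<int> &B, const vector<int> &C) {
    int best = INT_MAX;
    for (int a : A)
        for (int b : B)
            for (int c : C)
                best = min(best, f(a, b, c));
    return best;
}

int main() {
    srand(42);
    auto randomSorted = []() -> vector<int> { // small random ascending array
        vector<int> v(1 + rand() % 8);
        for (int &x : v) x = rand() % 50;
        sort(v.begin(), v.end());
        return v;
    };
    for (int t = 0; t < 1000; ++t) {
        vector<int> A = randomSorted(), B = randomSorted(), C = randomSorted();
        if (solve(A, B, C) != bruteForce(A, B, C)) {
            cout << "mismatch found" << endl;
            return 1;
        }
    }
    cout << "greedy matched brute force on all tests" << endl;
    return 0;
}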

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange