Is there an algorithm which finds sorted subsequences of size three in $O(n)$ time?

https://cs.stackexchange.com/questions/1071

16-10-2019
|

سؤال

I want to prove or disprove the existence of an algorithm which, given an array $A$ of integers, finds three indices $i, j$ and $k$ such that $i < j < k$ and $A[i] < A[j] < A[k]$ (or finds that there is no such triple) in linear time.

This is not a homework question; I saw it on a programming forum framed as “try to implement such an algorithm.” I suspect that it is impossible after various experiments. My intuition tells me so, but that does not really count for anything.

I would like to prove it formally. How do you do it? I would ideally like to see a proof laid out step-by-step, and then if you are so inclined, some explanation of how to go about proving/disproving simple questions like this in general. If it helps, some examples:

[1,5,2,0,3] → (1,2,3)
[5,6,1,2,3] → (1,2,3)
[1,5,2,3] → (1,2,3)
[5,6,1,2,7] → (1,2,7)
[5,6,1,2,7,8] → (1,2,7)
[1,2,999,3] → (1,2,999)
[999,1,2,3] → (1,2,3)
[11,12,8,9,5,6,3,4,1,2,3] → (1,2,3)
[1,5,2,0,-5,-2,-1] → (-5,-2,-1)

I supposed that one could iterate over $A$, and each time there is an $i < j$ (our current $j$, that is), we make a new triple and push it onto an array. We continue stepping and comparing each triple until one of our triples is complete. So it's like [1,5,2,0,-5,-2,-1] → 1..2.. -5.. -2.. -1, [1,5,2,0,-5,-2,3,-1] → 1..2.. -5.. -2.. 3! But I think this is more complex than mere $\mathcal{O}(n)$ as the number of triples on our triple array would in the worst case correspond to the size of the input list.

المحلول

This is variation of Longest increasing subsequence problem; this is the solution presented on Wikipedia using two auxiliary arrays $M$ and $P$:

$M[j]$ — stores the position $k$ of the smallest value $A[k]$ such that there is an increasing subsequence of length $j$ ending at $A[k]$ on the range $k \leq i$ (note we have $j \leq k \leq i$ here, because $j$ represents the length of the increasing subsequence, and $k$ represents the position of its termination. Obviously, we can never have an increasing subsequence of length $13$ ending at position $11$. $k \leq i$ by definition).

$P[k]$ — stores the position of the predecessor of $A[k]$ in the longest increasing subsequence ending at $A[k]$.

In addition the algorithm stores a variable $L$ representing the length of the longest increasing subsequence found so far.

This algorithm runs in worst-case time $\Theta(n\log n)$. Your problem is a special case that allows you to return when $L=3$ which pushes runtime down to $O(n)$ because the binary search runs only on arrays of length at most two, which is therefore in time $O(1)$ as opposed to $\Theta(\log n)$ in the general case.

Consider the modified pseudo code:

 L = 0
 for i = 1, 2, ... n:
    binary search for the largest positive j ≤ L
      such that X[M[j]] < X[i] (or set j = 0 if no such value exists)
    P[i] = M[j]
    if j == L or X[i] < X[M[j+1]]:
       M[j+1] = i
       L = max(L, j+1)
   if L==3 : return true; // you can break here, and return true.
return false; // because L is smaller than 3.

نصائح أخرى

A note on methodology

I thought a bit about this problem, and came to a solution. When I read Saeed Amiri's answer, I realized that what I came up with was a specialized version of the standard longest subsequence finding algorithm for a sequence of length 3. I'm posting the way I came up with the solution, because I think it is an interesting example of problem solving.

The two-element version

Let's start small: instead of looking for three indices at which the elements are in order, let's look for two: $i < j$ such that $A[i] < A[j]$.

If $A$ is decreasing (i.e. $\forall i < j, A[i] \ge A[j]$, or equivalently $\forall i, A[i] \ge A[i+1]$), then there are no such indices. Otherwise, there is an index $i$ such that $A[i] < A[i+1]$.

This case is very simple; we'll try to generalize it. It shows that the problem as stated is not solvable: the requested indices do not always exist. So we will rather ask that the algorithm either returns valid indices, if they exist, or correctly claims that no such indices exist.

Coming up with the algorithm

I will use the term subsequence to mean an extract from the array $A$ consisting of indices that may not be consecutive ($(A[i_1],\ldots,A[i_m])$ with $i_1 < \dots < i_m$), and run to mean consecutive elements of $A$ ($(A[i],A[i+1],\ldots,A[i+m-1])$).

We just saw that the requested indices do not always exist. Our strategy shall be to study when the indices do not exist. We'll do this by supposing we're attempting to find the indices and seeing how our search might go wrong. Then the cases where the search doesn't go wrong will provide an algorithm to find the indices.

4,3,2,1,0

With two indices, we could find consecutive indices. With three indices, we might not be able to come up with $j=i+1$ and $k=j+1$. However, we can consider the case when there is a run of three strictly increasing elements ($A[i] < A[i+1] < A[i+2]$) to be solved, as it is easy to recognize such runs, and see how this condition might not be met. Suppose that the sequence has no strictly increasing run of length 3.

4,3,2,1,2,3,2,1,0

The sequence only has strictly increasing runs of length 2 (which I'll call ordered pairs for short), separated by a decreasing run of length at least 2. In order for a strictly increasing run $A[j] < A[j+1]$ to be part of an increasing 3-element sequence, there must be an earlier element $i$ such that $A[i] < A[j]$ or a later element $k$ such that $A[j+1] < A[k]$.

4,3,2,2.5,1.5,0.5,1,0

A case when neither $i$ nor $k$ exists is when each ordered pair is entirely lower than the next one. This isn't all: when the pairs are interlaced, we need to compare them more finely.

3,2,1,3.5,2.5,1.5,0.5,-0.5,1.25,-0.25 3,2,1,2.5,1.5,0.5,2,1,0

The leftmost element $i$ of an increasing subsequence needs to come early and be small. The next element $j$ needs to be larger, but as small as possible to be able to find a third larger element $k$. The first element $i$ is not always the smallest element in the sequence, and it is not always the first for which there is a subsequent larger element, either — sometimes there is a lower 2-element subsequence further on, and sometimes there is a better fit for the already-found minimum.

2.1,3,2,1,2.5,1.5,0.5,2,1,0 1,2,0,2.5,1.5,0.5

Going from left to right, we tentatively pick the smallest element as $i$. If we find a larger element further right, we pick this pair as a tentative $(i,j)$. If we find an even larger $k$, we've won. The key thing to note is that our pick of $i$ and our pick of $(i,j)$ are updated independently: if we have a candidate $(i,j)$ and we find $i' > j$ such that $A[i'] < A[i]$, $i'$ becomes the next candidate $i$ but $(i,j)$ remains. Only if we find $j'$ such that $A[j'] < A[j]$ will $(i',j')$ become the new candidate pair.

Statement of the algorithm

Given in Python syntax, but beware that I haven't tested it.

def subsequence3(A):
    """Return the indices of a subsequence of length 3, or None if there is none."""
    index1 = None; value1 = None
    index2 = None; value2 = None
    for i in range(0,len(A)):
        if index1 == None or A[i] < value1:
            index1 = i; value1 = A[i]
        else if A[i] == value1: pass
        else if index2 == None:
            index2 = (index1, i); value2 = (value1, A[i])
        else if A[i] < value2[1]:
            index2[1] = i; value2[1] = A[i]
        else if A[i] > value2[1]:
            return (index2[0], index2[1], i)
    return None

Proof sketch

index1 is the index of the minimum of the part of the array that has already been traversed (if it occurs several times, we retain the first occurrence), or None before processing the first element. index2 stores the indices of the increasing subsequence of length 2 in the already-traversed part of the array that has the lowest largest element, or None if such a sequence doesn't exist.

When return (index2[0], index2[1], i) runs, we have value2[0] < value[1] (this is an invariant of value2) and value[1] < A[i] (obvious from the context). If the loop ends without invoking the early return, either value1 == None, in which case there is no increasing subsequence of length 2 let alone 3, or value1 contains the increasing subsequence of length 2 that has the lowest largest element. In the latter case, we furthermore have the invariant that no increasing subsequence of length 3 ends earlier than value1; therefore the last element of any such subsequence, added to value2, would form an increasing subsequence of length 3: as we also have the invariant that value2 is not part of an increasing subsequence of length 3 contained in the already-traversed part of the array, there is no such subsequence in the whole array.

Proving the aforementioned invariants is left as an exercise for the reader.

Complexity

We use $O(1)$ additional memory, and traverse the array as a stream from left to right. We perform $O(1)$ processing for each element, leading to a run time of $O(n)$.

Formal proof

Left as an exercise to the reader.

There is an $O(n)$ time algorithm, which uses $O(n)$ space.

First, traverse the array left to right maintaining a stack and an auxiliary array which tells you for each element, the index of an element greater than it and to the right of it.

Initially the stack is empty and the aux array contains all $-1$s.

Each time you consider a new element in the array, if that element is greater than the top element of the stack, you pop it off the stack, and set the aux array element corresponding to the top to have the index of the new element under consideration.

Continue popping elements off the stack and setting the corresponding index, while the current element is greater. Once the top has an element which is not lesser (or becomes empty), push the current element onto the stack, and proceed to the next element of the array, repeating the above step.

Make another pass (and another aux array), but going right to left.

Make a pass through the aux arrays and pick a position where both the corresponding entries in the array are not $-1$.

Since you consider each element of the array only a constant number of times, this is $O(n)$ time.

I believe you can also get rid of the stacks (by considering "suffix maximums" and "prefix minimums"), but the stacks give you the closest elements. So if you wanted to minimize say $k-i$, you could use the stacks.

Pseudo code for the first pass might look like this:

Stack <Pair<Elem, Index>> greats;
Elem auxArr[inputArr.Length];

for (Index i = 0; i < inputArr.Length; i++) {

    while (!greats.IsEmpty() && inputArr[i] > greats.PeekTop().Elem) {
        Pair top = greats.Pop();
        auxArr[top.Index] = i;
    }

    Pair p;
    p.Elem = inputArr[i];
    p.Index = i;

    greats.Push(p);
}

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى cs.stackexchange