Pergunta

We receive a stream of $n-1$ pairwise different numbers from the set $\left\{1,\dots,n\right\}$.

How can I determine the missing number with an algorithm that reads the stream once and uses a memory of only $O(\log_2 n)$ bits?

Foi útil?

Solução

You know $\sum_{i=1}^n i = \frac{n(n+1)}{2}$, and because $S = \frac{n(n+1)}{2}$ could be coded in $O(\log(n))$ bits this can be done in $O(\log n)$ memory and in one path (just find $S - \mathrm{currentSum}$, this is missing number).

But this problem could be solved in general case (for constant $k$): we have $k$ missing numbers, find out all of them. In this case instead of calculating just sum of $y_i$, calculate sum of j'st power of $x_i$ for all $1\le j \le k$ (I assumed $x_i$ is missing numbers and $y_i$ is input numbers):

$\qquad \displaystyle \begin{align} \sum_{i=1}^k x_i &= S_1,\\ \sum_{i=1}^k x_i^2 &= S_2,\\ &\vdots \\ \sum_{i=1}^k x_i^k &= S_k \end{align}$ $\qquad (1)$

Remember that you can calculate $S_1,...S_k$ simply, because $S_1 = S - \sum y_i$, $S_2 = \sum i^2 - \sum y_i^2$, ...

Now for finding missing numbers you should solve $(1)$ to find all $x_i$.

You can compute:

$P_1 = \sum x_i$, $P_2 = \sum x_i\cdot x_j$, ... , $P_k = \prod x_i$ $(2)$.

For this remember that $P_1 = S_1$, $P_2 = \frac{S_1^2 - S_2}{2}$, ...

But $P_i$ is coefficients of $P=(x-x_1)\cdot (x-x_2) \cdots (x-x_k)$ but $P$ could be factored uniquely, so you can find missing numbers.

These are not my thoughts; read this.

Outras dicas

From the comment above:

Before processing the stream, allocate $\lceil \log_2 n \rceil$ bits, in which you write $x:= \bigoplus_{i=1}^n \mathrm{bin}(i)$ ($\mathrm{bin}(i)$ is the binary representation of $i$ and $\oplus$ is pointwise exclusive-or). Naively, this takes $\mathcal{O}(n)$ time.

Upon processing the stream, whenever one reads a number $j$, compute $x := x \oplus \mathrm{bin}(j)$. Let $k$ be the single number from $\{ 1, ... n\}$ that is not included in the stream. After having read the whole stream, we have $$ x = \left(\bigoplus_{i=1}^n \mathrm{bin}(i)\right) \oplus \left(\bigoplus_{i \neq k } \mathrm{bin}(i)\right) = \mathrm{bin}(k) \oplus \bigoplus_{i \neq k } (\mathrm{bin}(i) \oplus \mathrm{bin}(i)) = \mathrm{bin}(k), $$ yielding the desired result.

Hence, we used $\mathcal{O}(\log n)$ space, and have an overall runtime of $\mathcal{O}(n)$.

HdM's solution works. I coded it in C++ to test it. I can't limit the value to $O(\log_2 n)$ bits, but I'm sure you can easily show how only that number of bits is actually set.

For those that want pseudo code, using a simple $\text{fold}$ operation with exclusive or ($\oplus$):

$$\text{Missing} = \text{fold}(\oplus, \{1,\ldots,N\} \cup \text{InputStream})$$

Hand-wavey proof: A $\oplus$ never requires more bits than its input, so it follows that no intermediate result in the above requires more than the maximum bits of the input (so $O(\log_2 n)$ bits). $\oplus$ is commutative, and $x \oplus x = 0$, thus if you expand the above and pair off all data present in the stream you'll be left only with a single un-matched value, the missing number.

#include <iostream>
#include <vector>
#include <cstdlib>
#include <algorithm>

using namespace std;

void find_missing( int const * stream, int len );

int main( int argc, char ** argv )
{
    if( argc < 2 )
    {
        cerr << "Syntax: " << argv[0] << " N" << endl;
        return 1;
    }
    int n = atoi( argv[1] );

    //construct sequence
    vector<int> seq;
    for( int i=1; i <= n; ++i )
        seq.push_back( i );

    //remove a number and remember it
    srand( unsigned(time(0)) );
    int remove = (rand() % n) + 1;
    seq.erase( seq.begin() + (remove - 1) );
    cout << "Removed: " << remove << endl;

    //give the stream a random order
    std::random_shuffle( seq.begin(), seq.end() );

    find_missing( &seq[0], int(seq.size()) );
}

//HdM's solution
void find_missing( int const * stream, int len )
{
    //create initial value of n sequence xor'ed (n == len+1)
    int value = 0;
    for( int i=0; i < (len+1); ++i )
        value = value ^ (i+1);

    //xor all items in stream
    for( int i=0; i < len; ++i, ++stream )
        value = value ^ *stream;

    //what's left is the missing number
    cout << "Found: " << value << endl;
}
Licenciado em: CC-BY-SA com atribuição
Não afiliado a cs.stackexchange
scroll top