Question

Here is a well-known problem.

Given an array $A[1\dots n]$ of positive integers, output the smallest positive integer not in the array.

The problem can be solved in $O(n)$ time and space: read the array, record in an $O(n)$-size table which of $1, 2, \dots, n+1$ occurred, then scan the table for the smallest value that did not.
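As a minimal sketch of this linear-time approach (function name is my own; only values in $[1, n+1]$ can matter, so everything else is ignored):

```python
def smallest_missing(A):
    """Smallest positive integer not in A, in O(n) time and O(n) space."""
    n = len(A)
    seen = [False] * (n + 2)  # seen[v] records whether v occurred
    for v in A:
        if 1 <= v <= n + 1:   # values outside [1, n+1] cannot affect the answer
            seen[v] = True
    for v in range(1, n + 2):
        if not seen[v]:
            return v
```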

I noticed you can trade space for time. If you have only $O(\frac{n}{k})$ memory, you can do it in $k$ rounds and get time $O(kn)$. In the special case $k = n$, there is obviously a constant-space, quadratic-time algorithm.
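The $k$-round trade can be sketched as follows (a hypothetical `smallest_missing_rounds`, assuming each round checks one block of roughly $(n+1)/k$ candidates in a single pass):

```python
def smallest_missing_rounds(A, k):
    """k passes over A, O(n/k) extra words: test candidates block by block."""
    n = len(A)
    m = -(-(n + 1) // k)  # block size: ceil((n+1)/k)
    for lo in range(1, n + 2, m):
        hi = min(lo + m, n + 2)      # candidate range [lo, hi)
        seen = [False] * (hi - lo)   # O(n/k) words of working memory
        for v in A:                  # one full pass per block
            if lo <= v < hi:
                seen[v - lo] = True
        for i, marked in enumerate(seen):
            if not marked:
                return lo + i        # smallest unseen candidate overall
```

Blocks are scanned in increasing order, so the first unmarked candidate is the global minimum missing value.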

My question is:

Is this the optimal tradeoff, i.e. does $\operatorname{time} \cdot \operatorname{space} = \Omega(n^2)$? In general, how does one prove such type of bounds?

Assume the RAM model, with bounded arithmetic and $O(1)$ random access to arrays.

Inspiration for this problem: the time-space tradeoff for palindromes in the one-tape model (see, for example, here).


Solution

This can be done in $O(n \log n)$ word operations and $O(1)$ words of memory (respectively, $O(n \log^2 n)$ time and $O(\log n)$ memory in the bit-level RAM model). The solution is based on the following observation.

Say there are $n_0$ even and $n_1$ odd numbers in the range $[1, n + 1]$ (so $n_0 \approx n_1$ and $n_0 + n_1 = n + 1$). Then there is $b \in \{0, 1\}$ such that there are at most $n_b - 1$ values with parity $b$ in the input. Indeed, otherwise there would be at least $n_0$ even and at least $n_1$ odd values in the input, hence at least $n_0 + n_1 = n + 1$ values in total, a contradiction (there are only $n$ of them). It means that we can continue searching for the missing number among only the even numbers or only the odd numbers. The same argument applies to the higher bits of the binary representation as well.

So the algorithm looks like this:

  1. Suppose we already know that there are only $x$ values in the input with remainder modulo $2^b$ equal to $r \in [0, 2^b)$, but there are at least $x + 1$ numbers in the range $[1, n + 1]$ that have remainder $r$ modulo $2^b$ (at the start, this holds for sure with $b = 0, r = 0$).

  2. Say there are $x_0$ values in the input with remainder $r$ modulo $2^{b + 1}$ and $x_1$ values in the input with remainder $r + 2^b$ modulo $2^{b + 1}$ (we can find these counts in a single pass through the input). Clearly, $x_0 + x_1 = x$. Moreover, because there are at least $x + 1$ numbers in the range with remainder $r$ modulo $2^b$, at least one of the pairs $(r, b + 1), (r + 2^b, b + 1)$ satisfies the requirements of step 1.

  3. We have found the missing number once $2^b \geqslant n + 1$: there is at most one number in the range $[1, n + 1]$ that can have remainder $r$ modulo $2^b$ ($r$ itself, if it is in range), so the invariant forces $x = 0$, i.e. no value in the input has this remainder. So $r$ is indeed missing.

Clearly, the algorithm halts after $O(\log n)$ steps, each of which needs $O(n)$ time (a single pass over the input array). Moreover, only $O(1)$ words of memory are required.
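A minimal Python sketch of the three steps above (function names are my own; it returns a value in $[1, n+1]$ absent from the input, per the invariant of step 1):

```python
def find_missing_bits(A):
    """O(1) words of memory, O(log n) passes: refine remainder r bit by bit."""
    n = len(A)
    r, b = 0, 0  # invariant: fewer inputs than range values are ≡ r (mod 2^b)
    while (1 << b) < n + 1:
        step, mod = 1 << b, 1 << (b + 1)
        # One pass: count inputs with each candidate remainder mod 2^(b+1).
        x0 = sum(1 for v in A if 1 <= v <= n + 1 and v % mod == r)
        x1 = sum(1 for v in A if 1 <= v <= n + 1 and v % mod == r + step)
        c0 = count_congruent(n + 1, mod, r)         # range values ≡ r
        c1 = count_congruent(n + 1, mod, r + step)  # range values ≡ r + 2^b
        # x0 + x1 < c0 + c1 by the invariant, so one branch has a deficit.
        if x0 < c0:
            b += 1
        else:
            r, b = r + step, b + 1
    return r if r else (1 << b)  # r = 0 only when n + 1 = 2^b is the answer

def count_congruent(n, mod, r):
    """How many of 1..n are congruent to r mod `mod`."""
    if r == 0:
        return n // mod
    return (n - r) // mod + 1 if r <= n else 0
```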

Other tips

If I understand your definitions, this can be done in linear time with constant space. That is obviously optimal, since we need to at least read the entire input.

The answer given in this question achieves this bound.

It's impossible to run this with less time or space, and adding extra time or space is useless, so there's no space-time tradeoff here. (Observe that $n=O(n/k)$, so the tradeoff you observed doesn't hold asymptotically, in any case.)

In terms of your general question, I don't know of any nice theorems offhand which will help you prove space-time tradeoffs. This question seems to indicate that there isn't a (known) easy answer. Basically:

Suppose some language $L$ is decidable in $t$ time (using some amount of space) and in $s$ space (using some amount of time). Can we find $f, g$ such that $L$ is decidable by a machine $M$ which runs in $f(t,s)$ time and $g(t,s)$ space?

is unknown, and a strong answer would solve a lot of open problems (most notably about SC), implying that no easy solution exists.


EDIT: Ok, with repetition (but I'm still assuming that with an input of size $n$ the maximum possible number is $n+1$).

Observe that any algorithm needs to be able to differentiate between at least $n$ possible answers. Suppose that on each pass through the data we can extract at most $k$ pieces of data. Then we need $n/k$ passes to differentiate all answers. Taking $k=n/s$, we run in $\frac{n}{n/s}\cdot n=sn$ time. So I think this proves what you want.

The difficulty is in showing that each pass extracts only $k$ pieces of data. If you assume that our only legal operation is equality testing, then we're fine. However, if you allow more complex operations, then you will be able to extract more information.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange