Determining the particular number in $O(n)$ time and space (worst case)

https://cs.stackexchange.com/questions/2863

16-10-2019
|

Question

$\newcommand\ldotd{\mathinner{..}}$Given that $A[1\ldotd n]$ are integers such that $0\le A[k]\le m$ for all $1\le k\le n$, and the occurrence of each number except a particular number in $A[1\ldotd n]$ is an odd number. Try to find the number whose occurrence is an even number.

There is an $\Theta(n\log n)$ algorithm: we sort $A[1\ldotd n]$ into $B[1\ldotd n]$, and break $B[1\ldotd n]$ into many pieces, whose elements' value are the same, therefore we can count the occurrence of each element.

I want to find a worst-case-$O(n)$-time-and-$O(n)$-space algorithm.

Supposing that $m=\Omega(n^{1+\epsilon})$ and $\epsilon>0$, therefore radix sort is not acceptable. $\DeclareMathOperator{\xor}{xor}$ Binary bitwise operations are acceptable, for example, $A[1]\xor A[2]$.

Solution

Here is an idea for a simple algorithm; just count all occurrences!

Find $m = \max A$. -- time $\Theta(n)$
"Allocate" array $C[0..m]$. -- time $O(1)$¹
Iterate over $A$ and increase $C[x]$ by one whenever you find $A[\_]=x$. If $C[x]$ was $0$, add $x$ to a linear list $L$. -- time $\Theta(n)$
Iterate over $L$ and find the element $x_e$ with $C[x_e]$ even. -- time $O(n)$.
Return $x_e$.

All in all, this gives you a linear-time algorithm which may use (in the sense of allocating) lots of memory. Note that being able to random-access $C$ in constant time independently of $m$ is crucial here.

An additional $O(n)$ bound on space is more difficult with this approach; I don't know of any dictionary data-structure that offers $O(1)$ time lookup. You can use hash-tables for which here are implementations with $O(1 + k/n)$ expected lookup time ($n$ the table's size, $k$ the number of stored elements) so you can get arbitrarily good with linear space -- in expectation. If all values in $A$ map to the same hash value, you are screwed.

On a RAM, this is implicitly done; all we need is the start position and maybe the end position.

OTHER TIPS

An almost trivial solution - which uses however $\Theta(n)$ space - is to use a hash map. Recall that a hash map has amortized runtime $\mathcal{O}(1)$ for adding and looking up elements.

Hence, we can use the following algorithm:

Allocate a hash map $H$. Iterate over $A$. For each element $i \in A$, increase the number of occurences seen, i.e. $H(i)++$.
Iterate through the key set of the hash map, and check which of the keys has an even count of occurences.

Now this is a simple algorithm which doesn't really use any large trick, but sometimes even this suffices. If not, you might want to specifiy what space restrictions you impose.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange