Intelligent use of XOR operator to find missing number

https://cs.stackexchange.com/questions/127360

29-09-2020
|

Question

I've come across the following problem on leetcode & tried to solve it with the following code however there seems to be an even better solution that takes advantage of XOR. Leetcode has a description for XOR solution however I can't grasp it in its entirety.

I just can't wrap my head around why we need to initialize missing variable with length of array, why not initialize it with Zero & when we do initialize it with Zero, why can't this algo find the missing number? Can someone please explain it?

Following was my solution before I learned Leetcode suggested XOR solution is even better & faster

class Solution {
    public int missingNumber(int[] nums) {
        if(nums.length == 1)
            return nums[0] == 0 ? 1 : 0;

        int missing = 0;
        boolean tempNums[] = new boolean[nums.length+1]; //cuz one is missing

        for(int i:nums) {
            tempNums[i] = true;
        }

        for(int i=0; i<tempNums.length; i++)
            if(!tempNums[i]) {
                missing = i;
                break;
            }
        return missing;
    }
}

Solution

Let us use a simpler example to verify the fact.

Suppose $n=1$, i.e., we are given an array containing one (distinct) number taken from 0, 1.

If the given number is 0, then the missing number must be $1=1\land0$.
If the given number is 1, then the missing number must be $0=1\land1$.

To summarize, if the given number is $a$, then the missing number is $1\land a=n\land a\not=0\land a$.

What are the fundamental reasons behind this magic in general situations?

It turns out the bitwise XOR operator, commonly denoted by "$\land$" in programming languages, behaves like the ordinary plus operator.

It is commutative and associative. So the order of operations does not matter at all.
$0$ functions as a zero, i.e., $a\land 0=a$. So $0$'s can be removed.
$a\land a = 0$ for every number $a$. So if the same number appear twice, they cancel each other.

Outside of programming languages, the bitwise XOR is more commonly written as "$\oplus$", where the plus symbol, "$+$" in the center reminds us it behaves like ordinary addition.

Now consider the expression, $$(0\oplus1\oplus2\oplus3\cdots\oplus n)\oplus( a_0\oplus a_1\oplus a_2\cdots\oplus a_{n-1}). $$ Whenever there is a pair of the same number, we can remove them. So all numbers are cancelled out between each other, except that missing number, since that missing number is the only number that appears in $$0\oplus1\oplus2\oplus3\cdots\oplus n$$ but not in $$a_0\oplus a_1\oplus a_2\cdots\oplus a_{n-1}.$$ So,

$$(0\oplus1\oplus2\oplus3\cdots\oplus n)\oplus( a_0\oplus a_1\oplus a_2\cdots\oplus a_{n-1})=\text{the missing number} $$

Let us change the order of "summation" on the left-hand-side. We can pair $0$ with $a_0$, $1$ with $a_1$, $2$ with $a_2$, $\cdots$, $n-1$ with $a_{n-1}$, leaving only $n$ unpaired. We see that the left-hand-side is $$(0\oplus a_0)\oplus(1\oplus a_1)\oplus(2\oplus a_2)\oplus\cdots(n-1\oplus a_{n-1})\oplus n. $$ Now, just move $n$ to the front of the expression.

"I just can't wrap my head around why we need to initialize the missing variable with length of array. Why not initialize it with Zero & when we do initialize it with Zero, why can't this algo find the missing number?"

There is nothing special about initializing the missing variable with $n$, the length of array. You can certainly initialize it with any number, including zero. Then you should pair the remaining numbers with $a_0, a_2,\cdots, a_{n-1}$ (arbitrarily, each number exactly once). For example, we have

$$0\oplus(1\oplus a_0)\oplus(2\oplus a_1)\oplus\cdots(n\oplus a_{n-1})=\text{the missing number} $$

In fact, we do not have to do the pairing at all. We can even precompute directly $$f(n) = 0\oplus1\oplus2\oplus3\cdots\oplus n =\begin{cases} n&\text{ if }n=0\pmod4\\ 1&\text{ if }n=1\pmod4\\ n+1&\text{ if }n=2\pmod4\\ 0&\text{ if }n=3\pmod4\\ \end{cases}$$ and then compute $$f(n)\oplus a_0\oplus a_1\oplus a_2\cdots\oplus a_{n-1},$$ which will yield the missing number as well. Hence we have the following faster program to compute the missing number.

public int missingNumber(int[] nums) {
    final int length = nums.length;
    // missing = length, 1, length + 1, 0 when length = 0, 1, 2, 3 mod 4.
    int missing = length % 2 == 0 ? length + (length / 2 % 2) : (length + 1) / 2 % 2;
    for (int i : nums) missing ^= i;

    return missing;
}

The above approach is very similar to Approach #4 Gauss' Formula on Leetcode.

OTHER TIPS

Start with a list of any n integers. Duplicate the integers and calculate the XOR if these 2n integers. What’s the result? Rearrange the 2n integers in any way and calculate their XOR. What’s the result. Remove a number x from the list and calculate the XOR. What’s the result? Remove another number y and calculate the XOR. What’s the result?

Now go back to your problem, and the answers you gave before tell you immediately what your result will be.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange