array validation - without using an auxiliary array

https://stackoverflow.com/questions/8711397

13-04-2021
|

Question

This question is for real brainiacs, cause it should be done without an auxiliary array and has to be most efficient!

C program - needs to recieve an array with X numbers (suppose X=4 array : 5,4,3,2) and check if the array has all the numbers from 0 to X-1 (If X is 44 it needs to check if all numbers between 0 to 43 inside the array).

It has to be super efficient - I mean, running on the array 43 times is not an option!

Do you have any idea how to do this?? I'm trying to figure this one for hours without any success!!

It has to be O(n).

Solution

I don't understand what I'm missing in this question but AFAIK I don't see a reason why any (reasonable) solution should be anything more-/worse- than O(n) time (and space) complexity.

From the above comments and answers, I understand the following:

Negative numbers : I'm not sure whether negative numbers are allowed or not. The OP says check if all the array has all the numbers from 0 to X-1. So anything less than 0 is not expected in the array. So I assume negative numbers are not allowed
Duplicate numbers : referring to the same quote from the OP - check if all the array has all the numbers from 0 to X-1 I guess if X is the size of the array and all numbers from 0 to X-1 should be present, the I guess no duplicate numbers are allowed.

So making the above assumptions, we can use one bit to check whether i (0 <= i <= X-1) is present or not. If i is duplicate then it will fail mentioning that there is a duplicate number.

It scans the whole array one time - O(n) and just uses one bit per element, so X is 10 the X bits are needed - for example consider we need to handle 1000000 elements and sizeof(int) is 4 then we need 3.82M of memory to hold the array elements and only 0.48M is used for storing the presence of an element.

#include <stdio.h>

#define X 10
#define MIN 0
#define MAX X-1

int main(void)
{
    //int arr[X] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
    //int arr[X] = {6, 7, 8, 9, 0, 5, 3, 2, 11, 12};
    //int arr[X] = {1, 1, 2, 4, 5, 6, 7, 8, 2, 2};
    int arr[X] = {1, 3, 2, 4, 5, 6, -7, 8, 2, 2}; 

    /* if X is more than sizeof(int) then change 
       the flag type to a wider one! */
    int flag = 0;

    int i;

    for(i=0 ;i<X ; i++) {
        if (arr[i] >= MIN && arr[i] <= MAX) {
            if (flag & 1<<arr[i]) {
                printf("Duplicate found : %d \n", arr[i]);
                return -1; 
            }   
            flag |= 1<<arr[i];
        } else {
            printf("Failure at %d : %d \n", i, arr[i]);
            return -1; 
        }   
    }   

    printf("Success \n");

    return 0;
}

OTHER TIPS

If changing order of the array is allowed you can use in-place sort algorithm, then check if for some i:

array[i] == array[i+1]

Time complexity can be O(n*lg n) then.

You can simplify the problem to finding duplicates.

Proof:

If the length of the array is not X => There are numbers missing. You can easily check that in O(1) or O(n).
Else => you have either all the correct numbers or there are duplicates.

Having that, you can use this implementation: Finding duplicates in O(n) time and O(1) space. Also make sure you check the bounds of the array. If the numbers are not inside the bounds, the array contains incorrect numbers.

This leads to an O(n) solution.

Sort both arrays (is in O(n log n)). Then treat both arrays as queues:

If the head elements of both queues are equal, print one of them and pop both.
If the head elements are not equal, pop the smaller.
Repeat.

You could sort the array, and then scan through it once. That should give you O(N log N) performance, better than the O(N^2) you'd need for the naive approach.

foreach(int i in array)
{
    int count = 0;
    foreach(int i2 in array)
    {
        if(i == i2)
        {
            count++;
        }
    }
    if(count > 1)
    {
        return false;
    }
}

return true;

See some brilliant answers here, a C++ implementation of the @caf's answer could be bool stop=true; // first place the element i at location A[i], i.e 4 at A[4]

for(int  i = 0; i<n; i++) {
    while (A[A[i]] != A[i]){ 
        swap(A[i], A[A[i]])
    }
}
// than u can have the second loop which does the decision 
for(int  i = 0; i<n && !stop; i++) {
    if (A[i] != i){ 
        stop = true;
    }
}

if (stop)
    printf("duplicate");
else
   printf("not duplicate)

an O(n) solution: The algorithm tries to put each element in the array onto its correct position, e.g. 1 onto a[0] and 2 onto a[1], by swapping with the element occupies the origin position.

at first, i = 1, a[i - 1] = 1, it's ok and nothing will be touched

i = 1
a = 1 6 3 4 5 7 1

then, i = 2, a[i - 1] = 6 != 2, then swap a[i - 1] and a[6 - 1]

i = 2 
a = 1 7 3 4 5 6 1

then, i is still 2, but a[i - 1] == 7 != 2, then swap a[i - 1] and a[7 - 1]

i = 2
a = 1 1 3 4 5 6 7

now i = 2 but we see that a[i - 1] == a[1 - 1], thus we find the duplicate

full source:

#include <stdio.h>

int main() {
    int a[7] = {1, 6, 3, 4, 5, 7, 1};
    int i, t;
    for (i = 0; i < 7; ++i) {
        while (a[i] != i + 1) {
            if (a[i] == a[a[i] - 1]) {
                printf("duplicate: %d\n", a[i]);
                goto out;
            } 
            t = a[i];
            a[i] = a[a[i] - 1];
            a[t - 1] = t;
        }
    }
    printf("no duplicate\n");
out:
    return 0;
}

You can use a modified merge operation (like the one used in mergesort) if the arrays are both sorted.

You can do [on average case] better then the suggested O(nlogn) solution.
There is an O(n) armotorized average case solution using hash tables:

hash <- new hash set
for each e in A:
  hash.add(e)
for each e in B:
  if hash.contains(e): print e

You can overcome printing twice elements if they appear twice in B by storing an additional hash set of 'printed' elements.

If latency or worst case performacne is an issue, use one of the sort and iterate solutions suggested.

Sort the smaller and use binary search to search it for each element in the larger. That way, you can do it in O((n1+n2)*log(n1)) where n1, n2 are the sizes of the arrays (n1 is the smaller).

Read this for an answer - array validation - without using an auxiliary array

an array of n elements which contains elements from 0 to n-1, with any of these numbers appearing any number of times.

For example, let n be 7 and array be {1, 2, 3, 4, 5, 4, 6}, the answer should FALSE

Isn't the above statements contradict themselves?!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow