Extension of Binary search algo to find the first and last index of the key value to be searched in an array

https://stackoverflow.com/questions/2218931

19-09-2019
|

Question

The problem is to extend the binary search algorithm to find all occurrences of a target value in a sorted array in the most efficient way. Concretely speaking, the input of the algorithm is (1) a sorted array of integers, where some numbers may appear more than once, and (2) a target integer to be searched. The output of the algorithm should be a pair of index values, indicating the first and last occurrence of the integer in the array, if it does occur. The source code could be in c#, c, c++.

Also, what is the max and min number of comparisons that we might need to find the indexes?

Solution

If you are a little clever you can define two different binary search functions. One will return the index of the first appearance of the searched for value and the other will return the last appearance of the searched for value. From your knowledge of binary search, you should be able to determine the maximum and minimum number of comparisons.

Using two binary searches should be the fastest method on average in my opinion. For instance, if you use just one binary search to find the first item and search linearly afterwards the worst case would be if the entire function is the same value. For an array of length 10000, this would give 10013 comparisons in the worst case while using two binary searches would give 28 comparisons in the worst case for the same array. Of course, using the same size of array, the best case for the binary/linear search method would be 14 comparisons while the best case for two binary searches method is 26 comparisons.

** Update

Okay, here is a binary search to find the first appearance of an element in an array. I'll give you a recursive function (you can of course make it iterative and optimize this in other ways). This searches for the int val in the array a of ints. Also, I haven't been careful about finding the midpoint (if the array is really large there could be problems).

int bs1(int a[], int val, int left, int right)
{
    if(right == left) return left;
    int mid = (right+left)/2;

    if(val > a[mid]) return bs1(a, val, mid+1, right);
    else return bs1(a, val, left, mid);
}

However, you should check after you are returned an index that it actually refers to the correct value because if val is not in the array, the returned index will to correspond to the next element larger than val.

A few minor changes to this will make a function that finds the last element. The keys to doing this are using the comparators correctly and remembering that integer division always truncates.

OTHER TIPS

For C++, you could look up std::equal_range() and its complexity requirements. As long as you're interested in the basic algorithm, the same general rules should apply regardless of the language use for the implementation.

This is fairly easy to do without writing your own binary search algorithm, by repeatedly calling a standard algorithm.

// some curly-bracket language:

// int BinarySearch(sortedList, searchIndex, searchLength, valueToFind)
// returns the zero-based index of the item in the list, or a negative value
// if the item is not found

int inner = BinarySearch(list, 0, listSize, value);
if(inner < 0){
    // handle case where value is not found in list
}

int bottom = inner, top = inner;
while(true){
    int i = BinarySearch(list, 0, bottom, value);
    if(i < 0)
        break;
    bottom = i;
}
while(true){
    int i = BinarySearch(list, top + 1, listSize - top - 1, value);
    if(i < 0)
        break;
    top = i;
}

// bottom and top now hold the bounds of all instances of value in list

This is pretty close to the same efficiency you'd get with a custom algorithm, except that you have more function call overhead.

As for the number of comparisons, I'd have to think a little harder to be sure, but I think it's just 2*log₂N, where N is the number of items in the list.

Edit

Bah! It's not 2*log₂N, because unlike what you could do with a custom algorithm, it doesn't incrementally exclude portions of the list. It appears¹ that the maximum comparison count is (log₂N - 0.5) * log₂N. This is still only 885 comparisons for a list with 2³⁰ elements (390 comparisons for 2²⁰ N, and 95 for 2¹⁰ N), but we can do better than that.

// int Compare(a, b)
// returns 0 if a and b are equal,
//         a negative value if a < b, or
//         a positive value if a > b

int start = 0, end = listSize, inner;

while(true){
    if(end == start){
        // handle case where value is not found in list
    }
    inner = (start + end) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        break;
    if(cmp < 0)
        start = inner + 1;
    else end = inner;
}

int top = inner, bottom = inner;

while(true){
    if(start >= bottom)
        break;
    inner = (start + bottom) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        bottom = inner;
    else start = inner + 1;
}

while(true){
    if(end - 1 <= top)
        break;
    inner = (top + 1 + end) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        top = inner;
    else end = inner;
}

This will do at most 2*log₂N comparisons. 2³⁰ items will require at most 60 comparisons, 2²⁰ items will require at most 40 comparisons, etc.

_{¹ I determined this experimentally. I'm not quite smart enough to figure it out mathematically.}

You can find the discussion on this in Bentley Programming Pearls and Knuth's Vol.3 : Sorting and Searching.

Here is one implementation in C++ : http://the-algo-blog.blogspot.com/2011/06/binary-search-to-find-last-and-first.html

There's no clean answer to the most efficient part of the question. That would depend on how many entries with the same value is to be expected. If it's a few the a linear search in both directtions of the array after finding one element will be you're fastest option but if you're expecting a lot of entries with the same value you could do kind of a binary search to find the start end indices.

Disclaimer: Not tested; it's meant to show the idea and not to be used directly as production code

int org = binarySearch(array,value) //do the binary search and find on element
int min = org-delta; //delta is some constant based on how many elemts are to be expected
int max = org;
min = min < 0 ? 0 : min;
int search= min;
bool latestWasHit = false;
while(search > 0)
{
  if(search+1 == max)
     return max;
  if(array[search] != value)
  {
     min = search;
     search = search + (max-search)/2
  }
  else
  {
     max = search;
     search = (search-min)/2;
  } 
}

and then the reverse for the upper bound. However it will require quite a lot of elements before this is faster than a simple linear search.

I imagine that the normal algorithm would have something like this in it:

if(value == test) return;
if(value < test) min = i;
if(value > test) max = i;

Once you have used this to find one of the values, perform two more slightly moded binary searches using the min and max you currently have to find the tips.

To find the top most replace the above with:

if(value <= test) min = i;
if(value > test) max = i;

for the bottom most replace with:

if(value >= test) max = i;
if(value < test) min = i;

Note there is no early return using this method, you just keep going until min and max are like one or something apart, I suppose you could add one with another check

if(value == test and arr[i-1] != test) return;

etc.

I have created two binary search methods for returning first and last occurrences respectively.

public static void main(String[] args) {
    int a[] ={1,2,2,2,2,2,5,5,6,8,9,10};

    System.out.println(5+" first = "+first(a, 5, 0, a.length-1));
    System.out.println(5+" last = "+right(a, 5, 0, a.length-1));

    System.out.println(1+" first = "+first(a, 1, 0, a.length-1));
    System.out.println(1+" last = "+right(a, 1, 0, a.length-1));

    System.out.println(2+" first = "+first(a, 2, 0, a.length-1));
    System.out.println(2+" last = "+right(a, 2, 0, a.length-1));

    System.out.println(10+" first = "+first(a, 10, 0, a.length-1));
    System.out.println(10+" last = "+right(a, 10, 0, a.length-1));

    System.out.println(8+" first = "+first(a, 8, 0, a.length-1));
    System.out.println(8+" last = "+right(a, 8, 0, a.length-1));

    System.out.println(11+" first = "+first(a, 11, 0, a.length-1));
    System.out.println(11+" last = "+right(a, 11, 0, a.length-1));


}

private static int first(int [] a, int x, int l, int h){
    if(l>h){
        return -1;
    }
    int mid = (h-l)/2+l;
    if(a[mid] == x && (mid==0 || a[mid-1] != x) ){
        return mid;
    }else if(a[mid] == x){
        return first(a, x, l, mid-1);
    }else if(a[mid]>x){
        return first(a, x, l, mid-1);
    }else{
        return first(a, x, mid+1, h);
    }
}


private static int right(int [] a, int x, int l, int h){
    if(l>h){
        return -1;
    }
    int mid = (h-l)/2+l;
    if(a[mid] == x && (mid==a.length-1 || a[mid+1] != x) ){
        return mid;
    }else if(a[mid] == x){
        return right(a, x, mid+1, h);
    }else if(a[mid]>x){
        return right(a, x, l, mid-1);
    }else{
        return right(a, x, mid+1, h);
    }
}

Output:
    1 first = 0
    1 last = 0
    2 first = 1
    2 last = 5
    10 first = 11
    10 last = 11
    8 first = 9
    8 last = 9
    11 first = -1
    11 last = -1

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow