How to find the insertion point in an array bsearch() works on?

https://stackoverflow.com/questions/11245112

c
bsearch

17-06-2021
|

Question

using bsearch() in C (standard library) one can quickly find an entry in a sorted array.

However, how do I calculate where to insert a new entry (using the standard library)?

bsearch() specifically checks whether the found item's key is equal to the passed key and if it isn't, it returns NULL - so can't use that.

Solution

Its not clear from the question but possibly this is what you want:

You can do something like this to find the index in the array where bsearch () found the match.

if (bsearch_returned_address != NULL)
  index = (bsearch_returned_address - array_base_address)

EDIT

To know the location which the bsort last visited, check the below stuff out.

The good thing is that the manual says:

The compar routine is expected to have two arguments which point to the key object and to an array member, in that order, and should return an integer less than, equal to, or greater than zero if the key object is found, respectively, to be less than, to match, or be greater than the array member.

Therefore you can store the second argument inside the comparison function in a global variable, and in a case of the fail use the address in this variable, which points to the last location the bsearch function visited to find for a match.

For example:

A list with address and value:

[0x8d6010: 0][0x8d6014: 4][0x8d6018: 8][0x8d601c: 12][0x8d6020: 16][0x8d6024: 20][0x8d6028: 24][0x8d602c: 28][0x8d6030: 32][0x8d6034: 36]

Value to search

output

fmem: (nil)  //this is the memory location where it was found
last_mem1: 0x7fff8c8e6c54 //last val of 1st param of compare
last_mem2: 0x8d601c //last val of 2nd param of compare
*last_mem1: 13,        *last_mem2: 12

The sample compare function code is

static const int *last_mem1, *last_mem2;

static int
compmi(const void *a, const void *b)
{
    last_mem1 = a; last_mem2 = b;
    return *(int *)a - *(int *)b;
}

So you can insert after the address in last_mem2. Although there are terminal cases, if you find a key which is less than the first element, then last_mem2 will also have the address of the first element.

But how ever you have to shift the array elements to make place for the insertion, which will make the insertion complexity to O(n). You might want to improve performance by introducing some kind of lazy insertion, like make a separate unordered list, which is much smaller than your original list, and dump new elements there. When searching, perform bsearch in original list, and linear search in the dump. When the dump list grows past a certain threshold, you can merge the list by performing an insertion sort. But still, you can't be O(lg n).

OTHER TIPS

Here's an improvement upon @phoxis's answer that will make the code thread-safe and reentrant by avoiding any global variables. The trick is to use the key itself to store the last visited address.

bsearch_insertion.h

#include <stdlib.h>

#ifndef BSEARCH_INSERTION_H
#define BSEARCH_INSERTION_H

/* Just like bsearch(3), but return a pointer to the element after which
 * the key would need to be inserted in order to maintain sorted order. */
void *bsearch_insertion(
    const void *key, const void *base, size_t nel,
    size_t width, int (*compar)(const void *, const void *));

#endif /* BSEARCH_INSERTION_H */

bsearch_insertion.c

#include "bsearch_insertion.h"

typedef struct
{
    const void *key;
    int (*const compar)(const void *, const void *);
    void *last_visited;
} bsearch_insertion_state;

static int bsearch_insertion_compare(const void *a, const void *b)
{
    bsearch_insertion_state *state = (bsearch_insertion_state *) a;
    state->last_visited = (void *) b;
    return state->compar(state->key, b);
}

void *bsearch_insertion(
    const void *key, const void *base, size_t nel,
    size_t width, int (*compar)(const void *, const void *))
{
    bsearch_insertion_state state = {key, compar, NULL};
    bsearch(&state, base, nel, width, bsearch_insertion_compare);
    return state.last_visited;
}

Example: test.c

#include <stdio.h>
#include "bsearch_insertion.h"

static int cmp(const void *a, const void *b)
{
    int aint = *(const int *)a;
    int bint = *(const int *)b;
    return aint - bint;
}

int main(int argc, char **argv)
{
    int data[] = {0, 1, 2, 3, 5};
    int key = 4;
    void *result = bsearch_insertion(
        &key, data, sizeof(data) / sizeof(data[0]), sizeof(data[0]), cmp);
    /* Should print "Insertion point: 3" */
    printf("Insertion point: %d\n", (int *)result - data);
    return 0;
}

Not sure what you mean by "calculate insertion place"; you build an array, then sort it using qsort(), then do (many) searches using bsearch().

In other words: for typical usage, you don't need to implement the array-sorting, since the standard library contains functionality for that, too.

Not sure about the connection to bisecting, here.

UPDATE: From the comment, it seems you're concerned about doing inserts into an array that you're also doing searches from. I would recommend looking at some other data structure that is more friendly towards inserts, such as a hash table for instance. By not relying on sorting to keep searches fast, a hash table might perform better. Inserting into an array involves moving all the subsequent elements, which is also quite costly and which is not needed for e.g. a hash table.

UPDATE 2: To actually try to answer your question, assuming you have a bsearch()-comparible comparator() function for your array of n entries, the index for the new item ni should be given by something like this:

size_t i;

for( i = 0; i < n && comparator(&ni, array + i) >= 0; ++i )
  ;
/* Grow array, copy i..n to (i+1)..(n+1), insert ni at i. */

Because insert causes copying of tail of array, time is O(n). So simple linear search won't slow down your code dramatically. You can even copy items during searching, if you start searching from the end of array.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow