qsort function in C unexpected behaviour?

https://stackoverflow.com/questions/20058894

31-07-2022
|

Domanda

I am experiencing weird and unexpected behaviour of the qsort() function. I have my list of nodes, and every of them contains two values based on which I want to sort my list (technically, an array).

For example:

If I have an original array that looks like this (1st element is p, 2nd element is t):

[0,10|1], [0.05|0], [0,10|0], [0,05|2], [0,10|2], [0,15|1], [0,05|1]

After it is sorted, it should look like this:

[0,05|0], [0,05|1], [0,05|2], [0,10|0], [0,10|1], [0,10|2], [0,15|1].

Struct node:

typedef struct node{
    int t;
    double p;
}node;

Compare function, called like qsort(nodes, num_of_node, sizeof(node), compare_pairs);

static int compare_pairs(const void *n1, const void *n2){

const node *na1= n1;
const node *na2= n2;

if(na1->p < na2->p) return -1;
if(na1->p > na2->p) return 1;

// At this point, value p is equal, so I am sorting based on t

if(na1->t < na2->t) return -1;
if(na1->t > na2->t) return 1;

return 0;

Problem

The unwanted behaviour starts at 3. STEP which looks like this:

List: [0.10 | 2] [0.10 | 999] [0.10 | 999] [0.15 | 999] [0.15 | 999] [0.15 | 1] [0.25 | 999]

And should look like this:

List: [0.10 | 2] [0.10 | 999] [0.10 | 999] [0.15 | 1] [0.15 | 999] [0.15 | 999] [0.25 | 999]

Initial list: [0.25 | 999] [0.15 | 999] [0.15 | 999] [0.10 | 999] [0.10 | 999] [0.05 | 999] [0.05| 999] [0.05 | 999] [0.05 | 999] [0.05 | 999]

STEP:

Erasing (min) node 0.050000...

Creating (new) node 0.100000...

List: [0.05 | 999] [0.05 | 999] [0.05 | 999] [0.10 | 1] [0.10 | 999] [0.10 | 999] [0.15 | 999] [0.15 | 999] [0.25 | 999]

STEP:

Erasing (min) node 0.050000...

Creating (new) node 0.100000...

List: [0.05 | 999] [0.10 | 1] [0.10 | 2] [0.10 | 999] [0.10 | 999] [0.15 | 999] [0.15 | 999] [0.25 | 999]

STEP:

Erasing (min) node 0.050000...

Erasing (min) node 0.100000...

Creating (new) node 0.150000...

List: [0.10 | 2] [0.10 | 999] [0.10 | 999] [0.15 | 999] [0.15 | 999] [0.15 | 1] [0.25 | 999]

STEP:

Erasing (min) node 0.100000...

Erasing (new) node 0.200000...

List: [0.10 | 999] [0.15 | 999] [0.15 | 999] [0.15 | 1] [0.20 | 1] [0.25 | 999]

STEP:

Erasing (min) node 0.100000...

Erasing (min) node 0.150000...

Creating (new) node 0.250000...

List: [0.15 | 999] [0.15 | 1] [0.20 | 1] [0.25 | 1] [0.25 | 999]

STEP:

Erasing (min) node 0.150000...

Erasing (new) node 0.300000...

List: [0.20 | 1] [0.25 | 1] [0.25 | 999] [0.30 | 1]

STEP:

Erasing (min) node 0.200000...

Erasing (min) node 0.250000...

Erasing (new) node 0.450000...

List: [0.25 | 999] [0.30 | 1] [0.45 | 1]

STEP:

Erasing (min) node 0.250000...

Erasing (min) node 0.300000...

Creating (new) node 0.550000...

List: [0.45 | 1] [0.55 | 1]

STEP:

Erasing (min) node 0.450000...

Erasing (min) node 0.550000...

Creating (new) node 1.000000...

List: [1.00 | 1]

General idea*

In every step two minimal nodes get deleted from list, and one new node gets inserted into the list. The node that gets inserted has the value of t for 1 bigger then the greatest in the list, except that it does not compare itself to the t value of 999. If the greatest in the list has t = 999, then the one inserted will have 1.

Find greatest t:

int max_t(node *nodes, int num, double p){
int max_t= 0;
int i;
for(i=0; i<num; i+=1){
    if(nodes[i].p== p && nodes[i].t != 999){
        if(nodes[i].t > max_t){
            max_t = nodes[i].t;
        }
    }
}
return max_t;

Main code:

node *nodes = malloc(num_of_nodes*sizeof(node));
int i;
for(i=0; i<num_of_nodes; i+=1){
    node n;
    n.t = 999;
    n.p = *(probabs+ i);
    *(nodes+i) = n;
}

qsort(nodes, num_of_nodes, sizeof(node), compare_pairs);

while(num_of_nodes> 1){

    printf("\n%d. STEP:\n", z);
    z += 1;

    // 2) Find two min nodes
    node *min_n1 = malloc(sizeof(node));
    node *min_n2 = malloc(sizeof(node));

    *min_n1 = nodes[0];
    printf("Erasing (min) node %lf...\n", min_n1->p);
    nodes= erase_node(nodes, min_n1, num_of_nodes);
    num_of_nodes -= 1;

    *min_n2 = nodes[0];
    printf("Erasing (min) node %lf...\n", min_n2->p);
    nodes= erase_node(nodes, min_n2, num_of_nodes);
    num_of_nodes-= 1;

    // 3) Create new node, add it to the list
    node *new_node = malloc(sizeof(node));
    new_node->p= min_n1->p + min_n2->p;
    double p = new_node->p;
    int max_t = max_t(nodes, num_of_nodes, p);
    new_node->t = max_t + 1;

    printf("Creating (new) node %lf...\n", new_node->p);
    nodes = add_node(nodes, new_node, num_of_nodes);
    num_of_nodes += 1;

    qsort(nodes, num_of_nodes, sizeof(node), compare_pairs);

    printf("List: ");
    int k;
    for(k=0; k<num_of_nodes; k+=1){
        printf("[%.2lf | %d]  ", nodes[k].p, nodes[k].t);
    }
    printf("\n");

Add / Remove node...

node *add_node(node *nodes, node *n, int num){
nodes = realloc(nodes, (num+1)*sizeof(node));
nodes[num] = *n;
return nodes;

node *erase_node(node *nodes, node *n, int num){
int i;
int index = 0;
for(i=0; i<num; i+=1){
    if(nodes_equal(&nodes[i], n)){
        index = i;
        break;
    }
}

for(i=index; i<num-1; i+=1){
    nodes[i] = nodes[i+1];
}

nodes= realloc(nodes, (num-1)*sizeof(node));

return nodes;

}

   int nodes_equal(node *n1, node *n2){

    return !memcmp(n1, n2, sizeof(node));
 }

Soluzione

The problem you are having is with floating point imprecision. None of the exact decimal numbers 0.1, 0.05 and 0.15 have exact representations in binary floating point.

Using the IEEE 64-bit double representation, the closest representable value to 0.15 is slightly less than 0.15, and the closest representable values to 0.05 and 0.10 are slightly greater than 0.05 and 0.10 respectively. With an implementation using round-to-nearest, this means that if you add up 0.05 and 0.10 you will end up with a number slightly greater than 0.15, and if you set a double to 0.15 directly you will end up with a number slightly less than 0.15. These will not compare equal.

This is what is apparently happening in your step 3. You remove two nodes with the values 0.05 and 0.10 (actually, as discussed their real values are slightly greater than those numbers) and add them together, ending up with a number slightly greater than 0.15. This compares greater than the existing nodes whose real values are slightly less than 0.15, so it sorts after them.

It's not clear if this really matters for your algorithm, though? Doesn't it end up at the same end state anyway? If it does matter, since you are apparently storing probabilities which range from 0.0 to 1.0 inclusive, you could use a decimal fixed point representation instead (eg store the probability multiplied by 10000 in a long int rather than a double, then just divide by 10000 for display).

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow