Question

I'm a little confused here: would comparison of doubles still work correctly when they're stored as opaque (binary) fields? My concern is that a double has a leading bit for the sign (i.e. positive or negative), and when doubles are stored as binary data I'm not sure they will be compared correctly.

I want to ensure that the comparison will work correctly, because I'm using a double as a part of a key tuple (e.g. ) in LevelDB and I want to preserve the data locality for positive and negative numbers. LevelDB only uses opaque fields as keys, but it does allow the user to specify his/her own comparator. However, I just want to make sure that I don't specify a comparator unless I absolutely need to:

// Three-way comparison function:
//   if a < b: negative result
//   if a > b: positive result
//   else: zero result
inline int Compare(const unsigned char* a, const unsigned char* b) const 
{
    if (*(double*)a < *(double*)b) return -1;
    if (*(double*)a > *(double*)b) return +1;
    return 0;
}

Solution

Making my comments an answer.

There are two things that could go wrong:

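  1. If either parameter (or both) is NAN, every ordered comparison (<, >, ==) returns false. So even if the binary representation is the same, NAN == NAN is false. In the comparator from the question, both tests then fail and it returns 0, so NAN appears equal to every other value. That violates transitivity: 1.0 "equals" NAN and NAN "equals" 2.0, yet 1.0 < 2.0.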

  2. If either parameter isn't properly aligned (they are char pointers, so nothing guarantees the alignment a double needs), you could run into problems on machines that don't support misaligned memory access. And on those that do, you may encounter a performance hit. Copying the 8 bytes into a local double with memcpy avoids this.

So to get around the NAN problem, you'll need to add a trap case that will be invoked if either parameter turns out to be NAN. (INF needs no special handling: infinities order normally under < and >.)

Because of the need for this trap case, you will need to define your own comparator.
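A minimal sketch of such a comparator, under a few assumptions: each key is exactly one 8-byte double in native byte order, and NAN is (arbitrarily) sorted after every other value, with all NANs treated as equal to each other. The names LoadDouble and CompareDoubles are made up for illustration and are not part of LevelDB.

```cpp
#include <cmath>
#include <cstring>

// Hypothetical helper: copy 8 bytes into a double without assuming
// that the pointer is aligned for double access.
static double LoadDouble(const unsigned char* p) {
    double d;
    std::memcpy(&d, p, sizeof d);
    return d;
}

// Three-way comparison:
//   if a < b: negative result
//   if a > b: positive result
//   else: zero result
// Assumption: NAN sorts after everything else, and all NANs are equal.
int CompareDoubles(const unsigned char* a, const unsigned char* b) {
    const double x = LoadDouble(a);
    const double y = LoadDouble(b);
    const bool x_nan = std::isnan(x);
    const bool y_nan = std::isnan(y);
    if (x_nan || y_nan) {
        // Trap case: decide by "is NAN" alone, so the order stays transitive.
        return static_cast<int>(x_nan) - static_cast<int>(y_nan);
    }
    if (x < y) return -1;
    if (x > y) return +1;
    return 0;  // also reached for +0.0 versus -0.0
}
```

Where NAN ends up is a design choice; the only requirement is that it is consistent. Note that +0.0 and -0.0 still compare as equal here even though their bytes differ.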

OTHER TIPS

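Yes, you have to specify your own comparison function. This is because doubles are not necessarily stored as 'big-endian' values. On a little-endian machine (x86, for example) the low-order mantissa bytes come first in memory and the sign and exponent come last, even though logically the exponent appears before the mantissa when the value is written out in big-endian format. A bytewise comparison would therefore start with the least significant bytes.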

Of course, if you're sharing stuff between different CPU architectures in the same database, you may end up with weird endian problems anyway just because you stored stuff as binary blobs.

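Lastly, even if you could control for endianness I would still not trust a raw byte comparison. In IEEE 754, +0.0 and -0.0 have different bit patterns but are numerically equal, and NAN has many possible bit patterns. Other floating-point formats may also allow several non-normalized encodings of the same value, and those would not compare correctly as binary data.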

Of course, everything the other person said about alignment and odd values like NAN and INF is important to pay attention to when writing a comparison function. But as for whether you should write one at all, I would say it is a really good idea.
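The byte-order point can be checked with a small experiment. This is only a sketch: it assumes 64-bit IEEE 754 doubles, and the helper names (Bits, StoreLittleEndian, StoreBigEndian, BytewiseCompare) are invented for illustration. It writes the bytes in an explicit order rather than relying on the host machine, then compares them the way a bytewise comparator on opaque keys would.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern (assumes IEEE 754 binary64).
static std::uint64_t Bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Least significant byte first, as a little-endian machine lays it out.
void StoreLittleEndian(double d, unsigned char out[8]) {
    const std::uint64_t u = Bits(d);
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(u >> (8 * i));
}

// Most significant byte (sign and exponent) first.
void StoreBigEndian(double d, unsigned char out[8]) {
    const std::uint64_t u = Bits(d);
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(u >> (8 * (7 - i)));
}

// Lexicographic comparison of the raw key bytes.
int BytewiseCompare(const unsigned char a[8], const unsigned char b[8]) {
    return std::memcmp(a, b, 8);
}
```

With little-endian bytes, 1.0 sorts after 2.0, because the first differing byte is 0xF0 against 0x00. With big-endian bytes, 1.0 and 2.0 sort correctly, but -1.0 still sorts after 1.0 because the sign bit makes its first byte larger.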

I assume that your number format conforms to the IEEE 754 standard. If that's the case, then a simple signed-integer comparison of the bit patterns won't work: the format is sign-magnitude, so if both numbers are negative, the result of the comparison is reversed. So you do have to provide your own comparator.
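For completeness, there is a commonly used alternative to a comparator that none of the answers here propose: transform the bits before storing the key so that plain bytewise comparison matches numeric order. The sketch below assumes 64-bit IEEE 754 doubles; EncodeOrderedDouble and DecodeOrderedDouble are hypothetical names. For negative values it flips every bit, which undoes the reversal described above. For non-negative values it sets the top bit so that they sort after all negatives. The result is written most significant byte first.

```cpp
#include <cstdint>
#include <cstring>

// Encode d into 8 bytes whose memcmp order follows numeric order.
void EncodeOrderedDouble(double d, unsigned char out[8]) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    const std::uint64_t sign = std::uint64_t{1} << 63;
    // Negative: invert all bits. Non-negative: set the top bit.
    u = (u & sign) ? ~u : (u | sign);
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(u >> (8 * (7 - i)));
}

// Inverse of EncodeOrderedDouble.
double DecodeOrderedDouble(const unsigned char in[8]) {
    std::uint64_t u = 0;
    for (int i = 0; i < 8; ++i) u = (u << 8) | in[i];
    const std::uint64_t sign = std::uint64_t{1} << 63;
    // Top bit set means the original was non-negative.
    u = (u & sign) ? (u & ~sign) : ~u;
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}
```

Caveats with this encoding: -0.0 sorts just before +0.0 instead of comparing equal, and NANs land at the extremes (after +INF or before -INF, depending on their sign bit), so it is worth rejecting or canonicalizing NAN before encoding.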

Licensed under: CC-BY-SA with attribution