How to efficiently serialize 64-bit floats so that the byte arrays preserve natural numeric order?
-
07-07-2021 - |
Question
The project I'm working on requires me to store javascript numbers(which are doubles) as BLOB primary keys in a database table(can't use the database native float data type). So basically I need to serialize numbers to a byte array in such a way that:
1 - The length of the byte array is 8(which is what is normally needed to serialize doubles)
2 - the byte arrays must preserve natural order so the database will transparently sort rows in the index b-tree.
A simple function that takes a number and returns an array of numbers representing the bytes is what I'm seeking. I prefer the function to be written in javascript but answers in java, C, C#, C++ or python will also be accepted.
Solution
To meet the sorting requirement, you need to:
- Use big-endian representation
- If the sign bit is 0, flip it to 1. Check the actual bit -- comparing the original number to 0 gives a different result in special cases such as the negative zero.
- If the sign bit is 1, flip all bits
Python 3 code:
import struct
def invert(x):
return bytes(c ^ 255 for c in x)
def tobin(x):
binx = struct.pack('>d', x)
if binx > b'\x80': #negative
return invert(binx)
else:
return struct.pack('>d', -x)
data = [float('-inf'), -100.0, -2.0, -.9, -.1, -0.0,
0.0, .1, .9, 2.0, 100.0, float('inf'), float('nan')]
print(sorted(data, key=tobin))
#[-inf, -100.0, -2.0, -0.9, -0.1, -0.0, 0.0, 0.1, 0.9, 2.0, 100.0, inf, nan]
On Python 2 change invert
to:
def invert(x):
return "".join(chr(ord(c) ^ 255) for c in x)
For reference, here is the equivalent node.js which already implements a Big Endian serialization function through the 'Buffer' class:
function serialize(n) {
var buffer = new Buffer(8);
var l = buffer.length;
buffer.writeDoubleBE(n, 0);
if (buffer[0] < 0x80) {
buffer[0] ^= 0x80;
} else {
for (var i = 0; i < l; i++)
buffer[i] ^= 0xff;
}
return buffer
}
function deserialize(buffer) {
var l = buffer.length;
// 0x80 is the most significant byte of the representation of
// the first positive number(Number.MIN_VALUE)
if (buffer[0] >= 0x80) {
buffer[0] ^= 0x80;
} else {
for (var i = 0; i < l; i++)
buffer[i] ^= 0xff;
}
return buffer.readDoubleBE(0);
}
OTHER TIPS
DataView provides an API to write doubles in 8 byte containers.
with methods getFloat64()
and setFloat64()
The link mentioned apparently doesn't implement setFloat64(), but provides some perspective at least what's to be involved in writing such a function.
https://github.com/vjeux/jDataView
For browsers, the definitely best and fastest approach would be using Float64Array from typed arrays (and fixing possibly the byte order).
The obvious answer is to remove the restriction that you can't use the native database type. I can't see any point in it. It's still 8 bytes and it does the ordering for you without any further investigation, work, experiment, testing etc being necessary.