Question

I'm trying to store a very large number that is bigger than the 8 bytes that the INTEGER and REAL field types can hold. I need to be able to return the rows that contain a number in this field that's less than or greater than another large number that I specify. I'm not sure how to do this. It seems my only option is to store it as a TEXT, but then I run into problems when I try to compare using > and < in a query, since comparison of TEXT is not the same as a numeric comparison (there are issues when the numbers don't have the same number of digits). I've tried looking into using BLOB or storing my large number as a byte array, but to no avail. Padding the numbers with zeros so that they have the same number of digits doesn't really work either because I don't know how large the number might get. Any help appreciated. Thanks!

Was it helpful?

Solution

For storage, your only choices are TEXT or BLOB, so you must encode your numbers in a way so that lexicographical and numeric ordering are the same.

For unsigned numbers, you could use a mechanism similar to SQLite4's varint encoding:

Let the bytes of the encoding be called A0, A1, A2, ..., A8.
If A0 is between 0 and 240 inclusive, then the result is the value of A0.
If A0 is between 241 and 248 inclusive, then the result is 240+256*(A0-241)+A1.
If A0 is 249 then the result is 2287+256*A1+A2.
If A0 is 250 then the result is A1..A3 as a 3-byte big-endian integer.
If A0 is 251 then the result is A1..A4 as a 4-byte big-endian integer.
If A0 is 252 then the result is A1..A5 as a 5-byte big-endian integer.
If A0 is 253 then the result is A1..A6 as a 6-byte big-endian integer.
If A0 is 254 then the result is A1..A7 as a 7-byte big-endian integer.
If A0 is 255 then the result is A1..A8 as a 8-byte big-endian integer.

The above has been designed for at most 64-bit numbers. Extending the mechanism for larger numbers is trivial, as long as you do have an upper bound.

If the numbers can be signed, you would have to split the A0 range in half, and use the first half for negative numbers.

If you do not need to do computations, you could just the same principle to store ASCII digits instead of binary values. That is, use a prefix with a fixed length that specifies the length of the number, then the number. Assuming your numbers are no longer than 9999 digits, you could use a prefix length of four, e.g.:

0001|0  ...
0001|9
0002|10  ...
0002|99
0003|100  ...
0060|321741185926535897932384626433832795281828459045235360287471

If you need negative values here, you must choose an additional prefix for negative/positive numbers that sorts correctly (the ASCII order of -/+ is wrong, so you'd better use something like n/p), and you must use a prefix like 9999 – length for negative numbers so that smaller negative numbers have a smaller prefix.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top