How does value decompression work for Facebook's Gorilla in the case where count of leading zeroes is not stored
-
29-09-2020 - |
Question
I am referring to this paper: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
My question is regarding section 4.1.2 where it says:
When XOR is non-zero, calculate the number of leading and trailing zeros in the XOR, store bit ‘1’ followed by either a) or b):
(a) (Control bit ‘0’) If the block of meaningful bits falls within the block of previous meaningful bits, i.e., there are at least as many leading zeros and as many trailing zeros as with the previous value, use that information for the block position and just store the meaningful XORed value.
When decompressing, how do we know if we should add leading zeroes or tailing zeroes? For example lets say we have following 48 bits numbers:
A = 0xfff ffff fffff
B = 0xfff 0000 fffff
C = 0xfff 0ff0 fffff
and
A xor B = 0x000 ffff 00000
B xor C = 0x000 0ff0 00000
If we follow the compression algorithm for A xor B
we will store the number of leading zeroes , which is 12
, as well as the meaningful bits ffff
.
Now, for B xor C
because "there are at least as many leading zeros and as many trailing zeros as with the previous value", so we only store the meaningful bits ff
. When I want to decode ff
, base on previous compressed bits, I know that I have at at least 12 leading zeroes and 20 tailing zeroes, but I still need to fill in 8 zeros; should I do ff00
or 00ff
or other combination?
Solution
I found my answer after reading the reference implementation: https://github.com/facebookarchive/beringei/blob/92784ec6e22572f28500c76b669276007635c875/beringei/lib/TimeSeriesStream.cpp
The use of the term "meaningful bits" in the paper is ambiguous. I thought the sentence "use that information for the block position and just store the meaningful XORed value." means to store the bits with all the trailing and leading zeroes stripped because "meaningful XORed value" is a value with no leading and trailing zero. But, according to reference implementation you just strip off same amount of leading and trailing zeroes as previous value delta; the meaningful bits in this case may still contain some leading and trailing zeroes.
So, for the example in the question, we should store 0ff0
as meaningful bits instead of just ff
.