How does value decompression work for Facebook's Gorilla in the case where count of leading zeroes is not stored

cs.stackexchange https://cs.stackexchange.com/questions/126862

  •  29-09-2020
  •  | 
  •  

Question

I am referring to this paper: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf

My question is regarding section 4.1.2 where it says:

When XOR is non-zero, calculate the number of leading and trailing zeros in the XOR, store bit ‘1’ followed by either a) or b):

(a) (Control bit ‘0’) If the block of meaningful bits falls within the block of previous meaningful bits, i.e., there are at least as many leading zeros and as many trailing zeros as with the previous value, use that information for the block position and just store the meaningful XORed value.

When decompressing, how do we know if we should add leading zeroes or tailing zeroes? For example lets say we have following 48 bits numbers:

A = 0xfff ffff fffff
B = 0xfff 0000 fffff
C = 0xfff 0ff0 fffff

and

A xor B = 0x000 ffff 00000

B xor C = 0x000 0ff0 00000

If we follow the compression algorithm for A xor B we will store the number of leading zeroes , which is 12, as well as the meaningful bits ffff.

Now, for B xor C because "there are at least as many leading zeros and as many trailing zeros as with the previous value", so we only store the meaningful bits ff. When I want to decode ff, base on previous compressed bits, I know that I have at at least 12 leading zeroes and 20 tailing zeroes, but I still need to fill in 8 zeros; should I do ff00 or 00ff or other combination?

Was it helpful?

Solution

I found my answer after reading the reference implementation: https://github.com/facebookarchive/beringei/blob/92784ec6e22572f28500c76b669276007635c875/beringei/lib/TimeSeriesStream.cpp

The use of the term "meaningful bits" in the paper is ambiguous. I thought the sentence "use that information for the block position and just store the meaningful XORed value." means to store the bits with all the trailing and leading zeroes stripped because "meaningful XORed value" is a value with no leading and trailing zero. But, according to reference implementation you just strip off same amount of leading and trailing zeroes as previous value delta; the meaningful bits in this case may still contain some leading and trailing zeroes.

So, for the example in the question, we should store 0ff0 as meaningful bits instead of just ff.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top