Redis Hyperloglog - PFCOUNT side effect

https://stackoverflow.com/questions/23164374

06-07-2023
|

Question

Redis recently released their new data structure called the HyperLogLog. It allows us to keep a count of unique objects and only takes up a size of 12k bytes. What I don't understand is that Redis's PFCOUNT command is said to be technically a write command. Why is this the case?

Note: as a side effect of calling this function, it is possible that the HyperLogLog is modified, since the last 8 bytes encode the latest computed cardinality for caching purposes. So PFCOUNT is technically a write command.

Solution

The header of the HyperLogLog object is as follows:

struct hllhdr {
    char magic[4];      /* "HYLL" */
    uint8_t encoding;   /* HLL_DENSE or HLL_SPARSE. */
    uint8_t notused[3]; /* Reserved for future use, must be zero. */
    uint8_t card[8];    /* Cached cardinality, little endian. */
    uint8_t registers[]; /* Data bytes. */
};

Note the card field: it contains the last cardinality evaluated by the algorithm. Calculating an estimation of the cardinality is an expensive operation, so Redis caches the value and keeps it in this field.

When PFADD is called, the HyperLogLog object may be updated or not (there is a good probability it is not). If it is not updated, calling PFCOUNT will reuse the cached value (card field). If it is updated, the card field is invalidated, so the next PFCOUNT will execute the counting algorithm, and write the new value in the card field.

That's why the PFCOUNT can alter the HyperLogLog object.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow