Hash function and custom C type for PostgreSQL

https://dba.stackexchange.com/questions/189862

10-10-2020

Question

I am creating a custom type for PostgreSQL. The type is as follows:

typedef struct {
    unsigned long prefix; 
    unsigned long long id; 
} CustomType;

I have built all the in and out functions, the comparison functions and so on, but I don't understand how to build the hash function, which I will need for hash joins. I had a look at the hash_numeric function in https://doxygen.postgresql.org/backend_2utils_2adt_2numeric_8c.html#a1358689e8be944cad3a3fad87eb237f1 and don't quite understand it.

How does a hash function work, and what is its purpose?

Solution

The hash function in this context is used to transform the set of all possible values of the type into a substantially smaller set of hash values. A hash join works by separating the values it needs to compare for equality into buckets based on their hash values. Values that produce the same hash, and therefore fall into the same bucket, may or may not be equal, while values in different buckets cannot be equal and can be removed from consideration.
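To make the bucket idea concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code: toy_hash, NBUCKETS, MAX_PER_BUCKET and count_matches are names made up for illustration, and the real hash join is considerably more elaborate. The sketch places each value of one input into a bucket chosen by its hash, then compares each value of the other input only against the contents of the bucket with the same hash.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS       8    /* hypothetical bucket count */
#define MAX_PER_BUCKET 16   /* hypothetical fixed bucket capacity */

/* Toy hash: the only hard requirement is that equal inputs give equal hashes. */
uint32_t toy_hash(uint64_t v)
{
    return (uint32_t) (v ^ (v >> 32));
}

/*
 * Count the pairs (i, j) with inner[i] == outer[j], comparing only values
 * that land in the same bucket. Returns -1 if a bucket overflows, which a
 * real implementation would handle instead of giving up.
 */
long count_matches(const uint64_t *inner, size_t n_inner,
                   const uint64_t *outer, size_t n_outer)
{
    uint64_t buckets[NBUCKETS][MAX_PER_BUCKET];
    size_t   fill[NBUCKETS] = {0};
    long     matches = 0;

    /* Build phase: distribute the inner values over the buckets. */
    for (size_t i = 0; i < n_inner; i++)
    {
        uint32_t b = toy_hash(inner[i]) % NBUCKETS;

        if (fill[b] == MAX_PER_BUCKET)
            return -1;
        buckets[b][fill[b]++] = inner[i];
    }

    /* Probe phase: each outer value is checked against one bucket only. */
    for (size_t j = 0; j < n_outer; j++)
    {
        uint32_t b = toy_hash(outer[j]) % NBUCKETS;

        for (size_t k = 0; k < fill[b]; k++)
            if (buckets[b][k] == outer[j])   /* equality decides, not the hash */
                matches++;
    }
    return matches;
}
```

Note that the equality check inside the bucket is still required: 2 and 10 share a bucket here (both are 2 modulo 8) without being equal.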

One trivial example of a 10-bucket hash for integer numbers would be a function returning the least significant digit of its parameter. You'll probably be able to use the built-in hash_any function in your case, but read the caveats about HASHES in the manual.
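As a minimal sketch, the digit example and a hash for the two-field struct from the question could look like this in standalone C. The names last_digit_hash, fnv1a and custom_type_hash are hypothetical, and the FNV-1a mixing is only a stand-in so that the example compiles outside PostgreSQL; inside an extension, the built-in hash_any would take the place of the hand-written mixing. The struct is hashed field by field rather than as one raw block of memory, so any padding bytes the compiler inserts around the members cannot influence the result.

```c
#include <stddef.h>
#include <stdint.h>

/* The type from the question. */
typedef struct {
    unsigned long prefix;
    unsigned long long id;
} CustomType;

/* The 10-bucket example: the least significant decimal digit. */
unsigned last_digit_hash(unsigned long long v)
{
    return (unsigned) (v % 10);
}

/* FNV-1a over a byte range; an assumption standing in for hash_any. */
uint32_t fnv1a(uint32_t h, const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;     /* 32-bit FNV prime */
    }
    return h;
}

/* Hash both fields, and nothing but the fields. */
uint32_t custom_type_hash(const CustomType *v)
{
    /* Widen to fixed-width copies so the byte count per field is always 8. */
    uint64_t prefix = v->prefix;
    uint64_t id = v->id;
    uint32_t h = 2166136261u;   /* 32-bit FNV offset basis */

    h = fnv1a(h, (const unsigned char *) &prefix, sizeof prefix);
    h = fnv1a(h, (const unsigned char *) &id, sizeof id);
    return h;
}
```

Equal values always produce equal hashes here, which is the property the hash join depends on. Unequal values can still collide, which is why the equality operator is applied within a bucket afterwards.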

Other suggestions

As a side note, you can see this implemented in PostBIS.

Here is the .sql:

CREATE FUNCTION hash_dna(dna_sequence)
  RETURNS integer AS
  '$libdir/postbis', 'hash_dna'
  LANGUAGE c IMMUTABLE STRICT;

CREATE OPERATOR CLASS dna_sequence_hash_ops
  DEFAULT FOR TYPE dna_sequence USING hash AS
    OPERATOR 1 = (dna_sequence, dna_sequence),
    FUNCTION 1 hash_dna(dna_sequence);

Here is the implementation of hash_dna:

/**
 * hash_dna()
 *      Returns a CRC32 for a DNA sequence.
 *
 *  PB_CompressedSequence* seq1 : input sequence
 */
PG_FUNCTION_INFO_V1 (hash_dna);
Datum hash_dna(PG_FUNCTION_ARGS)
{
    PB_CompressedSequence* seq1 = (PB_CompressedSequence*) PG_GETARG_VARLENA_P(0);
    uint32 result;

    PB_TRACE(errmsg("->hash_dna()"));

    result = sequence_crc32(seq1, fixed_dna_codes);

    PB_TRACE(errmsg("<-hash_dna() exits with %u", result));

    PG_RETURN_UINT32(result);
}

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange