Question

I'm trying to implement a GiST index whose storage type is the same as the column type (bytea). The keys even have the same length, because they are bitarrays and a union is just the disjunction of all arrays in the set (in other words, the bitwise OR of all arrays).
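
For concreteness, the union described above could look roughly like this outside PostgreSQL. This is only a sketch: `FP_LEN` and `fp_union` are names made up for illustration, and plain `uint8_t` buffers stand in for the bytea payload (no varlena header handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed key length, matching the 256-byte bitarrays. */
#define FP_LEN 256

/*
 * Hypothetical helper: OR every key in `keys` into `out`.
 * Every buffer is assumed to be exactly FP_LEN bytes long,
 * so the result has the same length as each input.
 */
void
fp_union(const uint8_t *const *keys, size_t nkeys, uint8_t *out)
{
    assert(keys != NULL && out != NULL);

    /* Start from the empty set (all bits cleared). */
    for (size_t i = 0; i < FP_LEN; i++)
        out[i] = 0;

    /* Bitwise OR of all arrays in the set. */
    for (size_t k = 0; k < nkeys; k++)
        for (size_t i = 0; i < FP_LEN; i++)
            out[i] |= keys[k][i];
}
```

Since every input has the same length, a key of any other length reaching the union or penalty code indicates that something upstream handed over a different representation of the value.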

Because of that, I don't need to convert or compress the data. In that case, is this a correct implementation of the compress function?

Datum
sub_fp_compress(PG_FUNCTION_ARGS)
{
    GISTENTRY  *entry = (GISTENTRY *)PG_GETARG_POINTER(0);
    // Column and Storage data types are the same (bytea)
    // so no compression is required.
    GISTENTRY  *retval = entry;

    PG_RETURN_POINTER(retval);
}

The reason I'm asking is that I get some unexpected values when creating the index, and they first appear in the compress function. To be more precise, all the bitarrays stored in the DB are 256 bytes long (checked), and the GiST nodes I create have the same length. But all of a sudden the penalty function receives, as its second (newentry) parameter, a bitarray that is 9 bytes long. I have traced this back to the compress function, where it first appears, so that is the first place I need to check.


Solution

I think this would actually work, but for consistency I would follow the format the docs use for a decompress function that doesn't need to decompress anything:

PG_FUNCTION_INFO_V1(my_decompress);

Datum
my_decompress(PG_FUNCTION_ARGS)
{
    PG_RETURN_POINTER(PG_GETARG_POINTER(0));
}

My assumption here is that, because you don't actually need to initialize a new GiST entry for Pg to write to disk, and because you're just passing binary data through, you can use one function for both:

PG_FUNCTION_INFO_V1(id);

Datum
id(PG_FUNCTION_ARGS)
{
    PG_RETURN_POINTER(PG_GETARG_POINTER(0));
}

But you may have to deTOAST too, which the docs eerily don't mention.

Other suggestions

As a follow-up to Evan Carroll's answer and comments, I'd like to share what I think is the correct way to deal with TOASTed values. Normally this isn't necessary if all rows in the indexed table are under 2 kB. However, some of my rows were over 2 kB, and that was the reason for the behavior described in the question (i.e. suddenly getting an array with a length of 9 bytes): the values were not being deTOASTed. I thought I had solved it when I fixed a bug in my print function, but that was only because I was using a different data set that didn't contain those large rows.

The following code is based on the ltree GiST source code. ltree does use compression, however, so I've tried to remove that part and add comments. The index seems to work without issue after replacing the old compress/decompress functions with these new ones; there are no more arrays with the wrong length.

Datum
index_compress(PG_FUNCTION_ARGS)
{
    // Same problems with TOASTing as in decompress method.
    // If rows can be over 2kb in size, we can't just
    // return original entry pointer. See the comments in
    // decompress method for more details. Here we'll only
    // explain details related specifically to compress method.

    GISTENTRY  *entry = (GISTENTRY *)PG_GETARG_POINTER(0);
    GISTENTRY  *retval = entry;

    // We check if entry is leafkey as only those entries
    // may be TOASTed (internal nodes are spared of this).
    // Also, only leaf nodes are usually compressed.
    if (entry->leafkey)
    {
        STORAGE_TYPE *key = (STORAGE_TYPE*)PG_DETOAST_DATUM(entry->key);

        if (PointerGetDatum(key) != entry->key)
        {
            // Value TOASTed, construct new entry to return.
            retval = (GISTENTRY *)palloc(sizeof(GISTENTRY));
            gistentryinit(*retval, PointerGetDatum(key),
                entry->rel, entry->page,
                entry->offset, false);
        }
    }

    // Return the original entry, or the new one built above
    // if the key had to be deTOASTed.
    PG_RETURN_POINTER(retval);
}

Datum
index_decompress(PG_FUNCTION_ARGS)
{
    // In case row sizes wouldn't exceed 2kb, following
    // line would be enough:
    // PG_RETURN_POINTER(PG_GETARG_POINTER(0));
    // However, if there are rows over this size, they 
    // get TOASTed and we need to deTOAST them so they can
    // be used in the rest of index_* functions.

    // Retrieve first argument - entry
    GISTENTRY *entry = (GISTENTRY*)PG_GETARG_POINTER(0);
    // Attempt to deTOAST entry key. If value is TOASTed, this will
    // produce a pointer to deTOASTed value. Otherwise, original
    // pointer is returned.
    STORAGE_TYPE *key = (STORAGE_TYPE*)PG_DETOAST_DATUM(entry->key);

    // Now we are trying to find out if the value was TOASTed. According
    // to above statement, this should be the case if obtained key is
    // different from the original (i.e. entry->key). PG_DETOAST_DATUM
    // performs DatumGetPointer, so we actually need to get back to 
    // datum with PointerGetDatum.
    if (PointerGetDatum(key) != entry->key)
    {
        // We are sure that value was TOASTed. The solution is to create
        // new entry and fill it with the deTOASTed value. This entry
        // will be then passed to the rest of the functions so that 
        // they don't have to deal with TOAST.
        GISTENTRY  *retval = (GISTENTRY *)palloc(sizeof(GISTENTRY));

        // gistentryinit is just a macro to fill our entry pointer
        // with arguments in one "call". For key, we use the datum
        // of the deTOASTed value. The rest of parameters is passed
        // from original entry with exception of leafkey flag, which
        // is set to false.
        gistentryinit(*retval, PointerGetDatum(key),
            entry->rel, entry->page,
            entry->offset, false);

        // We return the newly assembled entry.
        PG_RETURN_POINTER(retval);
    }

    // Value was not TOASTed and we can return original entry.
    PG_RETURN_POINTER(entry);
}
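
The pointer comparison used in both functions can be modelled outside PostgreSQL to see why it works. The sketch below is a toy: `ToyValue`, `ToyEntry`, `toy_detoast` and `toy_decompress` are made-up stand-ins, not PostgreSQL types or functions. It only assumes what the comments above state, namely that deTOASTing hands back the original pointer for a plain value and a fresh copy otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a stored value; NOT a PostgreSQL type. */
typedef struct ToyValue
{
    bool          stored_out_of_line;  /* models a TOASTed value */
    size_t        len;
    unsigned char data[8];
} ToyValue;

/* Toy stand-in for an index entry; NOT the real GISTENTRY. */
typedef struct ToyEntry
{
    ToyValue *key;
    bool      leafkey;
} ToyEntry;

/*
 * Toy stand-in for deTOASTing: a plain value comes back as the
 * same pointer, an out-of-line value comes back as a new copy.
 */
ToyValue *
toy_detoast(ToyValue *v)
{
    ToyValue *copy;

    if (!v->stored_out_of_line)
        return v;

    copy = malloc(sizeof(ToyValue));
    assert(copy != NULL);
    memcpy(copy, v, sizeof(ToyValue));
    copy->stored_out_of_line = false;
    return copy;
}

/* Same shape as index_decompress: wrap the copy in a new entry. */
ToyEntry *
toy_decompress(ToyEntry *entry)
{
    ToyValue *key = toy_detoast(entry->key);

    if (key != entry->key)
    {
        /* A different pointer means a copy was made. */
        ToyEntry *retval = malloc(sizeof(ToyEntry));

        assert(retval != NULL);
        retval->key = key;
        retval->leafkey = false;
        return retval;
    }

    /* Same pointer: nothing changed, reuse the original entry. */
    return entry;
}
```

In this model, an entry whose value is plain is returned unchanged, while an entry whose value is stored out of line comes back as a new entry pointing at the expanded copy. That is the same distinction the `PointerGetDatum(key) != entry->key` test draws.
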
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange