How to write a median and mode calculation function per group in MariaDB, so that I can use it in the query itself?

StackOverflow https://stackoverflow.com/questions/22706989

  •  23-06-2023

Question

How can I write a median and mode calculation function that works per group in MariaDB, so that I can use it in the query itself? My MariaDB version is 5.5. When I use a PARTITION BY clause in the query, I get an error. Can anybody suggest a solution?

Solution

I recently encountered the same (or rather a similar) problem. I have not solved it yet, but I am advancing towards a solution, which I will draft here.

User-defined functions (UDFs) are written in C or C++ and have to be compiled to a shared library (*.so on *nix systems, *.dll on Windows). They follow a certain calling convention, which is described in the MySQL manual. I've chosen a general quantile approach, because it can easily return the median (the 0.5 quantile).

my_bool quantile_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
    //called once per query: allocate the buffer and reset the state
    INIT_BUFFER();
    quantile_value = NAN;
    return 0; //0 = success; return 1 and fill message to report an error
}

//reset the state at the start of *each group*; no row data is passed here
void quantile_clear(UDF_INIT *initid, char *is_null, char *error)
{
    CLEAR_BUFFER();
    num_elements=0;
}

//called for every row of the group, including the first one
void quantile_add(UDF_INIT *initid, UDF_ARGS *args,
         char *is_null, char *error)
{
    if(ISNAN(quantile_value))
        quantile_value = GET_QUANTILE_VALUE(args); //read only once, because it is assumed to be constant

    ADD_TO_BUFFER(args);
    num_elements++;
}

//very simple quantile calculation, may be flawed - just for presentation
double quantile(UDF_INIT *initid, UDF_ARGS *args,
          char *is_null, char *error)
{
    if(num_elements == 0)
    {
        *is_null = 1; //empty group: return SQL NULL
        return 0.0;
    }

    SORT_BUFFER();
    //num_elements - 1 keeps the index inside the buffer for a quantile of 1.0
    return GET_BUFFER_AT(ROUNDDOWN(quantile_value * (num_elements - 1)));
}

void quantile_deinit(UDF_INIT *initid)
{
    CLEANUP();
}

To clarify the logic: MariaDB or MySQL first calls quantile_init, in which all of your initialization takes place (allocating buffers and so on). At the start of each group to be aggregated, quantile_clear is called; it receives no row data, so it only resets the internal summary variables. Then quantile_add is called for every row of the group, including the first one, and adds the respective value. Once the group is complete, quantile is called to calculate and return the result, and quantile_deinit finally frees whatever quantile_init allocated.

After compiling the code as a shared library, you can copy the file (libmylib.so/mylib.dll) to the plugin directory of your RDBMS (the directory named by the plugin_dir system variable) and load the function by calling

CREATE AGGREGATE FUNCTION quantile RETURNS REAL SONAME 'libmylib.so';

Now you should be able to use the function like

SELECT quantile(value, .5) FROM data GROUP BY category;

I have never gone through the whole procedure myself, so all of this is theoretical, but according to the MySQL manual it should work like this.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow