Alternative to Self Join

https://dba.stackexchange.com/questions/172860

07-10-2020

Question

I have asked a question here: https://stackoverflow.com/questions/43807566/how-to-divide-two-values-from-the-same-column-but-at-different-rows

about dividing values from the same table, in the same column but on different rows. Now I have a problem with more numerators and denominators (with different uns values). Is a self join still a good way to solve this problem in Postgres, or are there better solutions?

Example:

| postcode | value | uns |
|----------|-------|-----|
|       AA |    40 |  53 |
|       BB |    20 |  53 |
|       AA |    10 |  54 |
|       AA |    20 |  55 |
|       AA |    10 |  56 |
|       AA |    30 |  57 |
|       AA |    50 |  58 |
|       BB |    10 |  54 |
|       BB |    10 |  55 |
|       BB |    70 |  56 |
|       BB |    80 |  57 |
|       BB |    10 |  58 |

Result should be:

| postcode | formula    |
|----------|------------|
|       AA | 18.888...  |
|       BB | 14.375     |

where the values are grouped by postcode and the formula is (V53 stands for the value with uns 53, and so on):

(V53 * V56 + V54 * V57 + V55 * V58) / (V56 + V57 + V58)

Care must be taken to avoid a possible division by zero. The formula can be even more complex, but this is a good example.
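As a quick check of the expected result table, the following minimal Python sketch applies the formula to the sample rows outside the database. The names `ROWS` and `formula` are illustrative, and returning `None` for a zero denominator is an assumption that mimics SQL's `NULLIF`.

```python
from collections import defaultdict
from fractions import Fraction

# Sample rows from the question: (postcode, value, uns)
ROWS = [
    ("AA", 40, 53), ("BB", 20, 53), ("AA", 10, 54), ("AA", 20, 55),
    ("AA", 10, 56), ("AA", 30, 57), ("AA", 50, 58), ("BB", 10, 54),
    ("BB", 10, 55), ("BB", 70, 56), ("BB", 80, 57), ("BB", 10, 58),
]


def formula(v):
    """(V53*V56 + V54*V57 + V55*V58) / (V56 + V57 + V58) for one postcode.

    Returns None when the denominator is zero, mirroring NULLIF in SQL.
    """
    numerator = v[53] * v[56] + v[54] * v[57] + v[55] * v[58]
    denominator = v[56] + v[57] + v[58]
    if denominator == 0:
        return None
    return Fraction(numerator, denominator)  # exact rational result


# Group the values by postcode: {postcode: {uns: value}}
by_postcode = defaultdict(dict)
for postcode, value, uns in ROWS:
    by_postcode[postcode][uns] = value

results = {pc: formula(v) for pc, v in sorted(by_postcode.items())}
for pc, r in results.items():
    print(pc, r, float(r))
```

Using `Fraction` keeps the division exact: AA comes out as 170/9 (18.888...) and BB as 115/8 (14.375), which matches the table above.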

Solution

This is a pivot / crosstab problem at its core, as Michael already diagnosed accurately.

If you are not familiar with the tablefunc module in Postgres, read the basic instructions here:

PostgreSQL Crosstab Query

The query becomes simple and very fast (faster than other solutions presented here):

-- crosstab() is provided by the tablefunc module: CREATE EXTENSION tablefunc;
SELECT postcode
     , (v53 * v56 + v54 * v57 + v55 * v58) / NULLIF(v56 + v57 + v58, 0) AS formula
FROM   crosstab(
   'SELECT postcode, uns, value FROM tbl ORDER BY 1'
 , 'SELECT generate_series(53,58)'
   ) AS ct (postcode text
          , v53 numeric, v54 numeric, v55 numeric
          , v56 numeric, v57 numeric, v58 numeric);

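NULLIF prevents division by zero: when v56 + v57 + v58 is 0, it returns NULL, so the whole expression evaluates to NULL instead of raising an error.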

dbfiddle here
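For readers unfamiliar with `crosstab()`, here is a rough Python emulation of what the two-parameter form does in this query. It is not the tablefunc implementation, only a sketch under stated assumptions: the category query fixes the output columns 53 to 58, each postcode becomes one output row, and a category with no source row stays NULL (`None`). The helpers `crosstab_like` and `formula` and the extra postcode `CC` are hypothetical.

```python
from decimal import Decimal

# Source rows shaped like the crosstab source query: (postcode, uns, value).
# "CC" is a hypothetical extra postcode that only has uns 53.
ROWS = [
    ("AA", 53, 40), ("BB", 53, 20), ("AA", 54, 10), ("AA", 55, 20),
    ("AA", 56, 10), ("AA", 57, 30), ("AA", 58, 50), ("BB", 54, 10),
    ("BB", 55, 10), ("BB", 56, 70), ("BB", 57, 80), ("BB", 58, 10),
    ("CC", 53, 5),
]
CATEGORIES = list(range(53, 59))  # like 'SELECT generate_series(53,58)'


def crosstab_like(rows, categories):
    """Rough emulation of crosstab(source_sql, category_sql).

    One output row per row name, one column per category; a category
    without a source row stays None (NULL). Values become Decimal, since
    the column definition list declares them numeric.
    """
    out = {}
    # crosstab() expects the source grouped by row name (ORDER BY 1)
    for row_name, category, value in sorted(rows, key=lambda r: r[0]):
        cols = out.setdefault(row_name, dict.fromkeys(categories))
        if category in cols:  # in this sketch, unknown categories are dropped
            cols[category] = Decimal(value)
    return out


def formula(c):
    """The question's formula; any NULL operand makes the result NULL."""
    v = [c[k] for k in CATEGORIES]
    if any(x is None for x in v):
        return None
    v53, v54, v55, v56, v57, v58 = v
    denominator = v56 + v57 + v58
    if denominator == 0:  # NULLIF(..., 0) turns this case into NULL
        return None
    return (v53 * v56 + v54 * v57 + v55 * v58) / denominator


pivot = crosstab_like(ROWS, CATEGORIES)
for postcode, cols in pivot.items():
    print(postcode, formula(cols))
```

Because `CC` has no rows for uns 54 to 58, its formula is `None`, in the same way that a NULL column returned by `crosstab()` makes the SQL expression NULL.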

Other answers

You can aggregate all uns/value pairs into a JSON object and then use it to access the values by their uns key. This requires some casting, because the ->> operator extracts values from a JSON object only as text, but the formula then looks very similar to your description:

with vals (postcode, v) as (
  -- one JSON object per postcode, keyed by uns
  select postcode, json_object_agg(uns, value)
  from x
  group by postcode
), factors (postcode, numerator, denominator) as (
  select postcode,
         (v->>'53')::decimal * (v->>'56')::decimal + (v->>'54')::decimal * (v->>'57')::decimal + (v->>'55')::decimal * (v->>'58')::decimal,
         (v->>'56')::decimal + (v->>'57')::decimal + (v->>'58')::decimal
  from vals
)
select postcode,
       numerator / nullif(denominator, 0) as formula
from factors;

I have split the aggregation, the calculation of the numerator and the denominator, and the final division into three steps to make the query more readable.

Online example: http://rextester.com/IZYT54566

You can simplify the formula by creating a function:

create function val(p_vals json, p_uns text)
  returns decimal
as $$
  select (p_vals ->> p_uns)::decimal;
$$
language sql;

with vals (postcode, v) as (
  select postcode, json_object_agg(uns, value)
  from x
  group by postcode
), factors (postcode, numerator, denominator) as (
  select postcode,
         val(v, '53') * val(v, '56') + val(v, '54') * val(v, '57') + val(v, '55') * val(v, '58'),
         val(v, '56') + val(v, '57') + val(v, '58')
  from vals
)
select postcode,
       numerator / nullif(denominator, 0) as formula
from factors;
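The JSON steps can be imitated in Python to show why the casts are needed: JSON object keys are always strings, so the values are looked up as `'53'`, `'56'` and so on, and each one is converted to a decimal, much like `(v->>'53')::decimal`. This is a minimal sketch, not PostgreSQL's exact behaviour; `object_agg`, `val` and `factors` are hypothetical stand-ins for `json_object_agg`, the SQL helper function and the `factors` CTE.

```python
import json
from decimal import Decimal

# Sample rows as in table x: (postcode, value, uns)
ROWS = [
    ("AA", 40, 53), ("BB", 20, 53), ("AA", 10, 54), ("AA", 20, 55),
    ("AA", 10, 56), ("AA", 30, 57), ("AA", 50, 58), ("BB", 10, 54),
    ("BB", 10, 55), ("BB", 70, 56), ("BB", 80, 57), ("BB", 10, 58),
]


def object_agg(rows):
    """Step 1, like json_object_agg(uns, value) ... GROUP BY postcode.

    json.dumps turns the integer uns keys into strings, e.g. '{"53": 40, ...}'.
    """
    grouped = {}
    for postcode, value, uns in rows:
        grouped.setdefault(postcode, {})[uns] = value
    return {pc: json.dumps(obj) for pc, obj in grouped.items()}


def val(v, uns):
    """Like the SQL helper val(p_vals, p_uns): look up a text key, then cast.

    This sketch assumes the key exists; a missing key raises KeyError here,
    whereas ->> would return NULL.
    """
    return Decimal(str(json.loads(v)[uns]))


def factors(v):
    """Step 2: numerator and denominator of the formula."""
    numerator = (val(v, "53") * val(v, "56")
                 + val(v, "54") * val(v, "57")
                 + val(v, "55") * val(v, "58"))
    denominator = val(v, "56") + val(v, "57") + val(v, "58")
    return numerator, denominator


results = {}
for postcode, v in sorted(object_agg(ROWS).items()):
    numerator, denominator = factors(v)
    # Step 3: numerator / NULLIF(denominator, 0)
    results[postcode] = None if denominator == 0 else numerator / denominator
    print(postcode, results[postcode])
```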

The PIVOT pattern would work for this. It turns the values of several rows that share a common key into columns of a single row. There are a few ways to implement it, and some require only a single table scan.

After the PIVOT you would have a table with one row per postcode and a column per value. The remainder of the query would be written as though it referenced a single table.

Assuming that (postcode, uns) is UNIQUE (probably the PK), the PIVOT pattern, as already suggested by @michael-green, can be implemented portably with the following query:

SELECT
     postcode, 
     CAST(V53 * V56 + V54 * V57 + V55 * V58 AS numeric) 
         / nullif(V56 + V57 + V58, 0) AS formula
FROM
    (SELECT
         postcode,
         sum(case when uns=53 then value end) AS v53,     
         sum(case when uns=54 then value end) AS v54,     
         sum(case when uns=55 then value end) AS v55,     
         sum(case when uns=56 then value end) AS v56,
         sum(case when uns=57 then value end) AS v57,
         sum(case when uns=58 then value end) AS v58
    FROM
         t
    GROUP BY
         postcode
    ) AS s
ORDER BY
    postcode ;

Check it at SQLFiddle.
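To illustrate the portability claim, this minimal sketch runs the same conditional-aggregation query on an in-memory SQLite database through Python's standard `sqlite3` module, loaded with the sample data from the question. One adjustment is needed: SQLite leaves an integer unchanged under `CAST(... AS numeric)`, so the division would become integer division (18 for AA); the sketch casts to `REAL` instead.

```python
import sqlite3

# Sample data from the question in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (postcode TEXT, value INTEGER, uns INTEGER,"
    " PRIMARY KEY (postcode, uns))"
)
conn.executemany(
    "INSERT INTO t (postcode, value, uns) VALUES (?, ?, ?)",
    [
        ("AA", 40, 53), ("BB", 20, 53), ("AA", 10, 54), ("AA", 20, 55),
        ("AA", 10, 56), ("AA", 30, 57), ("AA", 50, 58), ("BB", 10, 54),
        ("BB", 10, 55), ("BB", 70, 56), ("BB", 80, 57), ("BB", 10, 58),
    ],
)

# Conditional-aggregation pivot; CAST(... AS REAL) forces float division.
PIVOT_QUERY = """
SELECT postcode,
       CAST(v53 * v56 + v54 * v57 + v55 * v58 AS REAL)
           / nullif(v56 + v57 + v58, 0) AS formula
FROM (SELECT postcode,
             sum(CASE WHEN uns = 53 THEN value END) AS v53,
             sum(CASE WHEN uns = 54 THEN value END) AS v54,
             sum(CASE WHEN uns = 55 THEN value END) AS v55,
             sum(CASE WHEN uns = 56 THEN value END) AS v56,
             sum(CASE WHEN uns = 57 THEN value END) AS v57,
             sum(CASE WHEN uns = 58 THEN value END) AS v58
      FROM t
      GROUP BY postcode) AS s
ORDER BY postcode
"""

rows = conn.execute(PIVOT_QUERY).fetchall()
for postcode, formula in rows:
    print(postcode, formula)
```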

Assuming that (postcode, uns) is UNIQUE (probably the PK), probably the simplest and most portable approach, although probably not the most efficient one, is to use as many subselects as needed:

SELECT
    postcode,
    CAST(
     (SELECT value FROM t WHERE t.uns = 53 AND t.postcode = p.postcode) *
     (SELECT value FROM t WHERE t.uns = 56 AND t.postcode = p.postcode) +
     (SELECT value FROM t WHERE t.uns = 54 AND t.postcode = p.postcode) *
     (SELECT value FROM t WHERE t.uns = 57 AND t.postcode = p.postcode) +
     (SELECT value FROM t WHERE t.uns = 55 AND t.postcode = p.postcode) *
     (SELECT value FROM t WHERE t.uns = 58 AND t.postcode = p.postcode)
    AS double precision) / 
     nullif( (SELECT sum(value) FROM t 
              WHERE t.uns IN (56, 57, 58) AND t.postcode = p.postcode), 0)
    AS formula
FROM
    (SELECT DISTINCT postcode FROM t) AS p
ORDER BY
    postcode ;

Check it at SQLFiddle.
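The correlated-subselect version can be tried the same way. This minimal sketch, again on an in-memory SQLite database via Python's `sqlite3`, uses the sample data from the question and writes the cast as `CAST(... AS REAL)`, SQLite's floating-point type; the results match the pivot query.

```python
import sqlite3

# Sample data from the question in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (postcode TEXT, value INTEGER, uns INTEGER,"
    " PRIMARY KEY (postcode, uns))"
)
conn.executemany(
    "INSERT INTO t (postcode, value, uns) VALUES (?, ?, ?)",
    [
        ("AA", 40, 53), ("BB", 20, 53), ("AA", 10, 54), ("AA", 20, 55),
        ("AA", 10, 56), ("AA", 30, 57), ("AA", 50, 58), ("BB", 10, 54),
        ("BB", 10, 55), ("BB", 70, 56), ("BB", 80, 57), ("BB", 10, 58),
    ],
)

# One scalar subselect per operand, correlated on p.postcode.
SUBSELECT_QUERY = """
SELECT postcode,
       CAST((SELECT value FROM t WHERE t.uns = 53 AND t.postcode = p.postcode) *
            (SELECT value FROM t WHERE t.uns = 56 AND t.postcode = p.postcode) +
            (SELECT value FROM t WHERE t.uns = 54 AND t.postcode = p.postcode) *
            (SELECT value FROM t WHERE t.uns = 57 AND t.postcode = p.postcode) +
            (SELECT value FROM t WHERE t.uns = 55 AND t.postcode = p.postcode) *
            (SELECT value FROM t WHERE t.uns = 58 AND t.postcode = p.postcode)
            AS REAL)
       / nullif((SELECT sum(value) FROM t
                 WHERE t.uns IN (56, 57, 58) AND t.postcode = p.postcode), 0)
       AS formula
FROM (SELECT DISTINCT postcode FROM t) AS p
ORDER BY postcode
"""

rows = conn.execute(SUBSELECT_QUERY).fetchall()
for postcode, formula in rows:
    print(postcode, formula)
```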

License: CC-BY-SA with attribution
