Alternative to Self Join
07-10-2020
Question
I asked a question here: https://stackoverflow.com/questions/43807566/how-to-divide-two-values-from-the-same-column-but-at-different-rows
about dividing values from the same table, in the same column but on different rows. Now I have more numerators and denominators (with different uns). Is a self join still a good way to solve this problem in Postgres, or are there better solutions?
Example:
| postcode | value | uns |
|----------|-------|-----|
| AA | 40 | 53 |
| BB | 20 | 53 |
| AA | 10 | 54 |
| AA | 20 | 55 |
| AA | 10 | 56 |
| AA | 30 | 57 |
| AA | 50 | 58 |
| BB | 10 | 54 |
| BB | 10 | 55 |
| BB | 70 | 56 |
| BB | 80 | 57 |
| BB | 10 | 58 |
Result should be:
| postcode | formula |
|----------|------------|
| AA | 18.888... |
| BB | 14.375 |
The values are grouped by postcode, and the formula is (where Vn is the value with uns = n):
(V53 * V56 + V54 * V57 + V55 * V58) / (V56 + V57 + V58)
Care must be taken to avoid a possible division by zero. The formula can be even more complex, but this is a good example.
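As a sanity check on the expected results, the formula can be evaluated directly from the example rows. The sketch below does this in Python rather than SQL. The `formula` helper and the `by_postcode` layout are illustrative names, not part of the question. It returns None when the denominator is zero, which is one possible way to handle that case.

```python
from fractions import Fraction

# Example rows from the question: (postcode, value, uns).
rows = [
    ("AA", 40, 53), ("BB", 20, 53), ("AA", 10, 54), ("AA", 20, 55),
    ("AA", 10, 56), ("AA", 30, 57), ("AA", 50, 58), ("BB", 10, 54),
    ("BB", 10, 55), ("BB", 70, 56), ("BB", 80, 57), ("BB", 10, 58),
]

def formula(v):
    """Evaluate (V53*V56 + V54*V57 + V55*V58) / (V56 + V57 + V58).

    `v` maps uns -> value for one postcode. Returns None when the
    denominator is zero instead of raising ZeroDivisionError.
    """
    numerator = v[53] * v[56] + v[54] * v[57] + v[55] * v[58]
    denominator = v[56] + v[57] + v[58]
    if denominator == 0:
        return None
    return Fraction(numerator, denominator)

# Group the values by postcode, keyed by uns.
by_postcode = {}
for postcode, value, uns in rows:
    by_postcode.setdefault(postcode, {})[uns] = value

results = {pc: formula(v) for pc, v in sorted(by_postcode.items())}
for pc, result in results.items():
    print(pc, float(result))
```

For AA this is 1700 / 90 and for BB it is 2300 / 160, matching the 18.888... and 14.375 shown in the result table.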
Solution
This is a pivot / crosstab problem at its core, as Michael already diagnosed accurately.
If you are not familiar with the tablefunc module in Postgres, read basic instructions here:
The query becomes simple and very fast (faster than other solutions presented here):
-- Requires the tablefunc extension: CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT postcode
     , (v53 * v56 + v54 * v57 + v55 * v58) / NULLIF(v56 + v57 + v58, 0) AS formula
FROM   crosstab(
   'SELECT postcode, uns, value FROM tbl ORDER BY 1'
 , 'SELECT generate_series(53,58)'
   ) AS ct (postcode text
          , v53 numeric, v54 numeric, v55 numeric
          , v56 numeric, v57 numeric, v58 numeric);
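NULLIF prevents division by zero: if the denominator is 0 it yields NULL, and dividing by NULL returns NULL rather than raising an error.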
dbfiddle here
Other tips
You can aggregate all uns/value pairs into a JSON object, then use that to access the UNS values by name. This requires some casting as the values can only be extracted as text from the JSON object, but the formula looks very similar to your description then:
with vals (postcode, v) as (
  select postcode, json_object_agg(uns, value)
  from x
  group by postcode
), factors (postcode, numerator, denominator) as (
  select postcode,
         (v->>'53')::decimal * (v->>'56')::decimal + (v->>'54')::decimal * (v->>'57')::decimal + (v->>'55')::decimal * (v->>'58')::decimal,
         (v->>'56')::decimal + (v->>'57')::decimal + (v->>'58')::decimal
  from vals
)
select postcode,
       numerator / nullif(denominator, 0)
from factors;
I have split the aggregation, the evaluation of the two parts of the fraction, and the final division into three steps to make it more readable.
Online example: http://rextester.com/IZYT54566
You can simplify the formula by creating a function:
create function val(p_vals json, p_uns text)
  returns decimal
as $$
  -- Extract one value by its uns key and cast it from text to decimal.
  select (p_vals ->> p_uns)::decimal;
$$
language sql;

with vals (postcode, v) as (
  select postcode, json_object_agg(uns, value)
  from x
  group by postcode
), factors (postcode, numerator, denominator) as (
  select postcode,
         val(v, '53') * val(v, '56') + val(v, '54') * val(v, '57') + val(v, '55') * val(v, '58'),
         val(v, '56') + val(v, '57') + val(v, '58')
  from vals
)
select postcode,
       numerator / nullif(denominator, 0)
from factors;
The PIVOT pattern would work for this. It converts rows' values to columns in a single row, according to their common key. There are a few ways to implement this. Some require only a single table scan.
After the PIVOT you would have a table with one row per postcode and a column per value. The remainder of the query would be written as though it referenced a single table.
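To make the shape of the pivoted table concrete, here is a minimal sketch of the same transformation done in one pass over the rows, written in Python rather than SQL. The `pivot` function and the `v53` to `v58` column names are assumptions chosen for illustration. A (postcode, uns) pair that never occurs is left as None, much as SQL would leave it NULL.

```python
def pivot(rows, uns_values):
    """Turn (postcode, value, uns) rows into one wide row per postcode.

    Each wide row has a column "v<uns>" per requested uns; pairs that
    never occur stay None. The input is scanned exactly once.
    """
    wide = {}
    for postcode, value, uns in rows:
        if uns not in uns_values:
            continue  # ignore uns values the formula does not use
        row = wide.setdefault(
            postcode, {f"v{u}": None for u in uns_values}
        )
        row[f"v{uns}"] = value
    return wide

# Example rows from the question: (postcode, value, uns).
rows = [
    ("AA", 40, 53), ("BB", 20, 53), ("AA", 10, 54), ("AA", 20, 55),
    ("AA", 10, 56), ("AA", 30, 57), ("AA", 50, 58), ("BB", 10, 54),
    ("BB", 10, 55), ("BB", 70, 56), ("BB", 80, 57), ("BB", 10, 58),
]

table = pivot(rows, range(53, 59))
for postcode in sorted(table):
    print(postcode, table[postcode])
```

Once the data has this one-row-per-postcode shape, the formula is a plain expression over the columns of a single row.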
Assuming that (postcode, uns) is UNIQUE (probably a PK), the PIVOT pattern, as already commented by @michael-green, can be implemented portably using the following query:
SELECT
    postcode,
    CAST(V53 * V56 + V54 * V57 + V55 * V58 AS numeric)
        / nullif(V56 + V57 + V58, 0) AS formula
FROM
    (SELECT
        postcode,
        sum(case when uns=53 then value end) AS v53,
        sum(case when uns=54 then value end) AS v54,
        sum(case when uns=55 then value end) AS v55,
        sum(case when uns=56 then value end) AS v56,
        sum(case when uns=57 then value end) AS v57,
        sum(case when uns=58 then value end) AS v58
    FROM
        t
    GROUP BY
        postcode
    ) AS s
ORDER BY
    postcode ;
Check it at SQLFiddle.
Assuming that (postcode, uns) is UNIQUE (probably a PK), the simplest and probably most portable approach, although probably not the fastest, is to use as many subselects as needed:
SELECT
    postcode,
    ((SELECT value FROM t WHERE t.uns = 53 AND t.postcode = p.postcode) *
     (SELECT value FROM t WHERE t.uns = 56 AND t.postcode = p.postcode) +
     (SELECT value FROM t WHERE t.uns = 54 AND t.postcode = p.postcode) *
     (SELECT value FROM t WHERE t.uns = 57 AND t.postcode = p.postcode) +
     (SELECT value FROM t WHERE t.uns = 55 AND t.postcode = p.postcode) *
     (SELECT value FROM t WHERE t.uns = 58 AND t.postcode = p.postcode)
    )::double precision /
    nullif( (SELECT sum(value) FROM t
             WHERE t.uns IN (56, 57, 58) AND t.postcode = p.postcode), 0)
    AS formula
FROM
    (SELECT DISTINCT postcode FROM t) AS p
ORDER BY
    postcode ;
Check at SQLFiddle.