Question

SELECT 
business_period,
SUM(transaction.transaction_value) AS total_transaction_value,
SUM(transaction.loss_value) AS total_loss_value,
(total_transaction_value - total_loss_value) AS net_value
FROM transaction
GROUP BY business_period

The above does not work because total_transaction_value and total_loss_value are output column aliases, not columns of the transaction table, so they cannot be referenced in the same SELECT list. Is there a way to make this query work?

Note: this query involves 500 million rows, so it needs to be efficient.

Question:
Some answers have suggested that SUM(transaction.transaction_value) - SUM(transaction.loss_value) is cached and won't need to be computed again, whereas others suggest that I should use a derived table / subquery to avoid repeated computation. Could someone point to something that could settle the difference in opinion?

I am using Postgres 9.3.

ANSWER:

I want to quote Erwin's comment here:

I ran a quick test with 40k rows and the winner was the plain version without subquery. CTE was slowest. So I think my first assumption was wrong and the query planner understands not to calculate the sums repeatedly (makes sense, too). I have seen different results with more complex expressions in the past. The planner does get smarter with every new version.
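
A test along those lines could be set up roughly as follows. This is only a sketch: the table name transaction_test, the generated data, and the column types are assumptions made for illustration, not the schema from the question. Comparing the execution times that EXPLAIN ANALYZE reports for each variant, over several runs, shows which form is faster on a given installation.

```sql
-- Hypothetical test table: column names follow the question,
-- types and data distribution are assumed.
CREATE TEMP TABLE transaction_test AS
SELECT (g % 12) + 1                      AS business_period,
       (random() * 1000)::numeric(12,2)  AS transaction_value,
       (random() * 100)::numeric(12,2)   AS loss_value
FROM   generate_series(1, 40000) AS g;

-- Refresh planner statistics for the new table.
ANALYZE transaction_test;

-- Variant 1: repeat the aggregate expressions.
EXPLAIN (ANALYZE, VERBOSE)
SELECT business_period,
       SUM(transaction_value) AS total_transaction_value,
       SUM(loss_value)        AS total_loss_value,
       SUM(transaction_value) - SUM(loss_value) AS net_value
FROM   transaction_test
GROUP  BY business_period;

-- Variant 2: aggregate in a subquery, subtract in the outer query.
EXPLAIN (ANALYZE, VERBOSE)
SELECT *, total_transaction_value - total_loss_value AS net_value
FROM  (
   SELECT business_period,
          SUM(transaction_value) AS total_transaction_value,
          SUM(loss_value)        AS total_loss_value
   FROM   transaction_test
   GROUP  BY business_period
   ) sub;
```

Timings on 40k generated rows will not necessarily carry over to 500 million real rows, so the same comparison is worth repeating on a realistic sample of the actual table.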

Solution

Use:

SELECT 
business_period,
SUM(transaction.transaction_value) AS total_transaction_value,
SUM(transaction.loss_value) AS total_loss_value,
(SUM(transaction.transaction_value) - SUM(transaction.loss_value)) AS net_value
FROM transaction
GROUP BY business_period

OTHER TIPS

Just explicitly reiterate the SUMs (I believe they are only calculated once):

SELECT 
  business_period,
  SUM(transaction.transaction_value) AS total_transaction_value,
  SUM(transaction.loss_value) AS total_loss_value,
  SUM(transaction.transaction_value) - SUM(transaction.loss_value) AS net_value
FROM transaction
GROUP BY business_period

Alternatively you can use a derived table subquery, which should force it to calculate only once if the above does not do so implicitly - although there may be some additional overhead depending on what the optimizer sees:

SELECT business_period,
  total_transaction_value,
  total_loss_value,
  (total_transaction_value - total_loss_value) AS net_value
FROM
(
    SELECT 
       business_period,
       SUM(transaction.transaction_value) AS total_transaction_value,
       SUM(transaction.loss_value) AS total_loss_value
    FROM transaction
    GROUP BY business_period
) x

Use a subquery to avoid repeated computation:

SELECT *, total_transaction_value - total_loss_value AS net_value
FROM  (
   SELECT business_period
        , SUM(transaction_value) AS total_transaction_value
        , SUM(loss_value)        AS total_loss_value
   FROM   transaction
   GROUP  BY 1
   ) sub;

Or a CTE (common table expression) to actually force this, since CTEs act as optimization barriers in Postgres 9.3. A subquery is generally faster for simple cases like this: Postgres can collapse a subquery into the outer query when that is faster, but it cannot do so with a CTE.
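
For reference, the CTE form might look like the sketch below. The CTE name sums is made up for illustration; everything else follows the subquery version.

```sql
-- The CTE is evaluated once and its result is materialized in Postgres 9.3;
-- the outer query then only does the subtraction.
WITH sums AS (
   SELECT business_period
        , SUM(transaction_value) AS total_transaction_value
        , SUM(loss_value)        AS total_loss_value
   FROM   transaction
   GROUP  BY business_period
   )
SELECT *, total_transaction_value - total_loss_value AS net_value
FROM   sums;
```

With one result row per business_period, the materialized result is small, so the cost of either form should be dominated by the scan and aggregation of the base table.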

Licensed under: CC-BY-SA with attribution