Question

I have daily time series for companies in different industries and work with PostgreSQL. An example explains my problem best. This is what I have:

+------------+---------+-------------+----+
|    day     | company | industry    | v  |
+------------+---------+-------------+----+
| 2012-01-12 | A       | consumer    | 2  |
| 2012-01-12 | B       | consumer    | 2  |
| 2012-01-12 | C       | health      | 4  |
| 2012-01-12 | D       | health      | 4  |
| 2012-01-13 | A       | consumer    | 5  |
| 2012-01-13 | B       | consumer    | 5  |
| 2012-01-13 | C       | health      | 7  |
| 2012-01-13 | D       | health      | 7  |
| 2012-01-16 | A       | consumer    | 8  |
| 2012-01-16 | B       | consumer    | 8  |
| 2012-01-16 | C       | health      | 3  |
| 2012-01-16 | D       | health      | 3  |
+------------+---------+-------------+----+

Each row is one company in one industry, and v is the daily average for that company's industry, so all companies in the same industry share the same v on a given day. What I need is this:

+------------+---------+----------+---+------------+
|    day     | company | industry | v | delta_v    |
+------------+---------+----------+---+------------+
| 2012-01-12 | A       | consumer | 2 | NULL       |
| 2012-01-12 | B       | consumer | 2 | NULL       |
| 2012-01-12 | C       | health   | 4 | NULL       |
| 2012-01-12 | D       | health   | 4 | NULL       |
| 2012-01-13 | A       | consumer | 5 | 1.5        |
| 2012-01-13 | B       | consumer | 5 | 1.5        |
| 2012-01-13 | C       | health   | 7 | 0.75       |
| 2012-01-13 | D       | health   | 7 | 0.75       |
| 2012-01-16 | A       | consumer | 8 | 0.6        |
| 2012-01-16 | B       | consumer | 8 | 0.6        |
| 2012-01-16 | C       | health   | 3 | -0.571428  |
| 2012-01-16 | D       | health   | 3 | -0.571428  |
+------------+---------+----------+---+------------+

I need the relative daily change of v. For example, the average value of v for the industry "consumer" is 2 on 2012-01-12 and 5 on 2012-01-13, so the growth is (5-2)/2 = 1.5.

I tried this:

    SELECT * 
           , (v - LAG(v) OVER (PARTITION BY industry ORDER BY day) )
           / LAG (v) OVER (PARTITION BY industry ORDER BY day) AS delta_v
    FROM mytable
    ORDER BY day, industry

The problem is that it also computes the change in v within a single day ("intra-day") whenever there is more than one company from the same industry on that day.

I hope it just needs a small correction in the PARTITION BY clause, but I really can't figure out how to do it. Do you have any ideas that could help me?


Solution

I think you want the company in the PARTITION BY too, so that each company's series is lagged separately:

    SELECT t.*,
           ((v - LAG(v) OVER (PARTITION BY industry, company ORDER BY day))
            / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)
           ) AS delta_v
    FROM mytable t
    ORDER BY day, industry;
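
To check this query against the sample data from the question, here is a minimal sketch. It runs the query through Python's built-in sqlite3 module with an in-memory database, which is only an assumed stand-in for PostgreSQL so that the example is self-contained (window functions need SQLite 3.25 or later). The column v is declared as REAL so the division is not truncated, and company is added to the ORDER BY only to make the row order deterministic.

```python
import sqlite3

# Sample rows copied from the question: (day, company, industry, v).
ROWS = [
    ("2012-01-12", "A", "consumer", 2),
    ("2012-01-12", "B", "consumer", 2),
    ("2012-01-12", "C", "health", 4),
    ("2012-01-12", "D", "health", 4),
    ("2012-01-13", "A", "consumer", 5),
    ("2012-01-13", "B", "consumer", 5),
    ("2012-01-13", "C", "health", 7),
    ("2012-01-13", "D", "health", 7),
    ("2012-01-16", "A", "consumer", 8),
    ("2012-01-16", "B", "consumer", 8),
    ("2012-01-16", "C", "health", 3),
    ("2012-01-16", "D", "health", 3),
]

# Same query as above; partitioning by company as well as industry means
# LAG() only ever looks at the previous day of the same company.
QUERY = """
SELECT t.*,
       ((v - LAG(v) OVER (PARTITION BY industry, company ORDER BY day))
        / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)
       ) AS delta_v
FROM mytable t
ORDER BY day, industry, company
"""


def compute_deltas():
    """Load the sample data into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (day TEXT, company TEXT, industry TEXT, v REAL)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in compute_deltas():
        print(row)
```

The first day of each company has no previous row, so its delta_v is NULL (None in Python), which matches the desired output in the question.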

I'm not sure if Postgres actually calculates the lag() twice, but since (v - lag) / lag equals v / lag - 1, this version needs only one LAG() call and is easier to maintain:

    SELECT t.*,
           (v / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)) - 1 AS delta_v
    FROM mytable t
    ORDER BY day, industry;
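
One caveat with either form: if v is an integer column, v / LAG(v) is an integer division in PostgreSQL and the fractional part is lost. The sketch below illustrates this with Python's built-in sqlite3 module, used here as an assumed stand-in for PostgreSQL (SQLite 3.25 or later; it truncates integer division in the same way). The column names truncated and the named window w are illustrative choices, not part of the original query. Multiplying by 1.0 first forces a non-integer division.

```python
import sqlite3

# A subset of the question's data, with v stored as INTEGER on purpose.
ROWS = [
    ("2012-01-12", "A", "consumer", 2),
    ("2012-01-13", "A", "consumer", 5),
    ("2012-01-16", "A", "consumer", 8),
    ("2012-01-12", "C", "health", 4),
    ("2012-01-13", "C", "health", 7),
    ("2012-01-16", "C", "health", 3),
]

# "truncated" uses plain integer division; "delta_v" multiplies by 1.0 first.
# The named window w avoids repeating the PARTITION BY / ORDER BY clause.
QUERY = """
SELECT day, company,
       (v / LAG(v) OVER w) - 1       AS truncated,
       (1.0 * v / LAG(v) OVER w) - 1 AS delta_v
FROM mytable
WINDOW w AS (PARTITION BY industry, company ORDER BY day)
ORDER BY day, company
"""


def compare_divisions():
    """Return (day, company, truncated, delta_v) for the integer-valued table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (day TEXT, company TEXT, industry TEXT, v INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in compare_divisions():
        print(row)
```

If v is already numeric or a floating-point type, the extra factor is unnecessary. A previous value of 0 would also need separate handling, for example by wrapping the divisor in NULLIF(..., 0), because PostgreSQL raises an error on division by zero.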
Licensed under: CC-BY-SA with attribution