last_value ignore null in left join with additional dimension?
-
04-03-2021 - |
سؤال
This is just a silly example that I started to think about, but can't seem to come up with an elegant solution for. Assuming a Calendar table (I'll use Db2 syntax):
create table calendar
( dt date not null primary key
);
insert into calendar (dt)
with t (dt) as ( values cast('2020-01-01' as date)
union all
select dt + 1 day from t where dt < '2020-01-11')
select dt from t
;
and a table that holds the balance:
create table balance
( dt date not null
, amount int not null
, primary key (dt)
);
insert into balance (dt, amount)
values ('2020-01-03',100) ,('2020-01-05', -50);
If we want to copy the last known balance, we can use LAST_VALUE and 'IGNORE NULLS' like:
select c.dt, last_value(b.amount, 'IGNORE NULLS') over (order by c.dt)
from calendar c
left join balance b
on c.dt = b.dt;
However, if we add a dimension, say cid (customer_id), it is no longer obvious what last_value means. Normally we would partition by cid, but cid is lost due to the left join:
create table balance1
( cid int not null
, dt date not null
, amount int not null
, primary key (cid, dt)
);
insert into balance1 (cid, dt, amount)
values (1, '2020-01-03',100) ,(1, '2020-01-05', -50)
, (2, '2020-01-04',75), (2, '2020-01-08',55), (2, '2020-01-10', -35);
select c.dt, last_value(b.amount, 'IGNORE NULLS') over (partition by ?
order by c.dt)
from calendar c
left join balance b
on c.dt = b.dt;
The best I could come up with was to use a cartesian product between calendar and distinct customers from balances:
select cid, dt, last_value(amount, 'IGNORE NULLS') over (partition by cid order by dt)
from (
select cid, dt, amount from balance1
union
select distinct b.cid, c.dt, null
from balance1 b
cross join calendar c
where not exists (
select 1 from balance1
where dt = c.dt and cid = b.cid
)
) t
order by dt, cid
;
Not that pretty, and I'm looking for a more elegant solution. I used Db2 syntax above and in this Fiddle but it's the principle I'm after so any vendor syntax will do.
المحلول
Why not this?
SELECT def.cid, c.dt, LAST_VALUE(b.amount, 'IGNORE NULLS') OVER(PARTITION BY def.cid ORDER BY c.dt)
FROM calendar c
INNER JOIN (SELECT DISTINCT cid FROM balance1) AS def(cid) ON 1=1
LEFT JOIN balance1 b ON (c.dt, def.cid) = (b.dt, b.cid)
ORDER BY c.dt, def.cid
;
Comparison in dbfiddle