SQL cumulative % Total
28-06-2021
Question
My dataset looks like this:
COLA | COLB
Name1 | 218
Name2 | 157
Name3 | 134
Name4 | 121
I need this output:
COLA | COLB | COLC
Name1 | 218 | 0.34
Name2 | 157 | 0.60
Name3 | 134 | 0.71
Name4 | 121 | 1
My SQL looks like this so far:
SELECT COLA, COLB, COLB/SUM(COLB) FROM #MyTempTable
There are two problems with this SQL. One, COLC is 0 every time and I don't understand why. Two, even if it did produce the percentage, it would not be a cumulative percentage.
I've seen some similar threads on StackOverflow, but I wasn't able to make the answers from those threads work in my exact scenario.
Thanks in advance for any suggestions!
Solution
I think you're looking for something like this, though your example calculations may be off a little:
SELECT
    COLA,
    COLB,
    ROUND(
        -- Divide the running total...
        (SELECT CAST(SUM(COLB) AS FLOAT) FROM #MyTempTable WHERE COLA <= a.COLA) /
        -- ...by the full total
        (SELECT CAST(SUM(COLB) AS FLOAT) FROM #MyTempTable),
        2
    ) AS COLC
FROM #MyTempTable AS a
ORDER BY COLA
EDIT: I've added rounding.
This gives us the following output:
COLA | COLB | COLC
Name1 | 218 | 0.35
Name2 | 157 | 0.6
Name3 | 134 | 0.81
Name4 | 121 | 1
Your results are 0 (or 1) because you are dividing ints by ints, which gives you an int (see Data type precedence).
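To see the integer-division effect in isolation, here is a minimal sketch, assuming SQL Server (T-SQL). The literals 218 and 630 are just Name1's COLB and the total of the four COLB values from the question:

```sql
-- Both operands are ints, so the fractional part is discarded: this returns 0
SELECT 218 / 630 AS int_division;

-- Casting one operand to FLOAT makes the whole division floating point (about 0.346)
SELECT CAST(218 AS FLOAT) / 630 AS float_division;

-- Multiplying by 1.0 first promotes the operand to a decimal type, with a similar effect
SELECT 218 * 1.0 / 630 AS decimal_division;
```

Either form works; the CAST just states the intent more explicitly.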
UPDATE:
I should add that this uses a "triangular join" to get the running total (WHERE COLA <= a.COLA). Depending upon your SQL Server version, you may want to compare this to other options if performance becomes a concern.
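One such option is a windowed SUM, which avoids the triangular join. This is a sketch only, assuming SQL Server 2012 or later (where aggregate window functions accept ORDER BY and a ROWS frame) and assuming COLA values are unique. It should give the same COLC values as the query above:

```sql
SELECT
    COLA,
    COLB,
    ROUND(
        -- Running total of COLB in COLA order...
        CAST(SUM(COLB) OVER (ORDER BY COLA
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS FLOAT) /
        -- ...divided by the grand total over all rows
        SUM(COLB) OVER (),
        2
    ) AS COLC
FROM #MyTempTable
ORDER BY COLA;
```

The table is read once instead of once per row, so this tends to scale better on larger tables, though it is worth checking the execution plan on your own data.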
Other tips
If you don't use OLAP functions, then you have to do a weird self-join on the table:
SELECT a.ColA, a.ColB, SUM(b.ColB) AS ColX
  FROM #MyTempTable AS a
  JOIN #MyTempTable AS b
    ON a.ColA >= b.ColA  -- b covers every row up to and including a
 GROUP BY a.ColA, a.ColB
This gives you the raw cumulative SUM. You can definitely use that as a sub-query to get the answer, noting that to get the percentage, you need to divide the cumulative sum by the gross sum:
SELECT ColA, ColB, ColX / (SELECT SUM(ColB) FROM #MyTempTable) AS ColC
  FROM (SELECT a.ColA, a.ColB, SUM(b.ColB) AS ColX
          FROM #MyTempTable AS a
          JOIN #MyTempTable AS b
            ON a.ColA >= b.ColA  -- b covers every row up to and including a
         GROUP BY a.ColA, a.ColB
       ) AS X
 ORDER BY ColA
You may be able to write just:
SELECT a.ColA, a.ColB, SUM(b.ColB) / (SELECT SUM(ColB) FROM #MyTempTable) AS ColC
  FROM #MyTempTable AS a
  JOIN #MyTempTable AS b
    ON a.ColA >= b.ColA  -- b covers every row up to and including a
 GROUP BY a.ColA, a.ColB
 ORDER BY a.ColA
Multiply the ColC expression by 100 to get a percentage instead of a fraction.
Tested against IBM Informix 11.70.FC2 on Mac OS X 10.7.3, both the queries with division work, producing the same answer (and I note that I get 0.81 instead of 0.71 as required in the question):
Name1 218 0.34603174603174603174603174603175
Name2 157 0.5952380952380952380952380952381
Name3 134 0.80793650793650793650793650793651
Name4 121 1.0
You might have to use a CAST to ensure the division is done using floating point instead of integer arithmetic. As you can see, that wasn't necessary with Informix (the SUM is a floating point decimal anyway, just in case the table has billions of rows in it, not just 4 of them). I could improve the presentation using ROUND(xxxx, 2) to get just 2 decimal places; a cast to DECIMAL(6,2) would achieve the same result, but the client should be responsible for the presentation, not the DBMS.
In MS SQL Server, this does it:
create table #MyTempTable (cola varchar(10), colb int)
insert into #MyTempTable(cola,colb)
select 'Name1',218
union all
select 'Name2',157
union all
select 'Name3',134
union all
select 'Name4',121
-- running total up to and including the current row, divided by the grand total
SELECT otab.COLA, otab.COLB,
    (select SUM(cast(itab.colb as float))
     from #MyTempTable itab where itab.cola <= otab.cola)
    / (select SUM(cast(colb as float)) from #MyTempTable) as COLC
from #MyTempTable otab
order by otab.COLA
drop table #MyTempTable