Pergunta

My dataset looks like this:

COLA  | COLB
Name1 | 218
Name2 | 157
Name3 | 134
Name4 | 121

I need this output:

COLA  | COLB| COLC
Name1 | 218 | 0.34
Name2 | 157 | 0.60
Name3 | 134 | 0.71
Name4 | 121 | 1

My SQL looks like this so far:

SELECT COLA, COLB, COLB/SUM(COLB) FROM #MyTempTable

Two problems with this SQL. One, COLC is 0 everytime and I don't understand that. Two, even if it did result in the % it's not a cumulative %.

I've seen some similar threads on StackOverflow, but I wasn't able to make the answers from those threads work in my exact scenario.

Thanks in advance for any suggestions!

Foi útil?

Solução

I think you're looking for something like this, though your example calculations may be off a little:

SELECT
    COLA,
    COLB,
    ROUND(
        -- Divide the running total...
        (SELECT CAST(SUM(COLB) AS FLOAT) FROM #MyTempTable WHERE COLA <= a.COLA) /
        -- ...by the full total
        (SELECT CAST(SUM(COLB) AS FLOAT) FROM #MyTempTable),
        2
    ) AS COLC
FROM #MyTempTable AS a
ORDER BY COLA

EDIT: I've added rounding.

This gives us the following output:

COLA    COLB    COLC
Name1   218     0.35
Name2   157     0.6
Name3   134     0.81
Name4   121     1

The reason that your results are 0 (or 1) is because you are dividing ints by ints, thus giving you an int (see Datatype precedence).

UPDATE:

I should add that this uses a "triangular join" to get the running total (WHERE COLA <= a.COLA). Depending upon your SQL Server version, you may compare this to other options if performance becomes a concern.

Outras dicas

If you don't use OLAP functions, then you have to do a weird self-join on the table:

SELECT a.ColA, a.ColB, SUM(b.ColB) AS ColX
  FROM #MyTempTable AS a
  JOIN #MyTempTable AS b
    ON a.ColA <= b.ColA
 GROUP BY a.ColA, a.ColB

This gives you the raw cumulative SUM. You can definitely use that as a sub-query to get the answer, noting that to get the percentage, you need to divide the cumulative sum by the gross sum:

SELECT ColA, ColB, ColX / (SELECT SUM(ColB) FROM MyTempTable) AS ColC
  FROM (SELECT a.ColA, a.ColB, SUM(b.ColB) AS ColX
          FROM #MyTempTable AS a
          JOIN #MyTempTable AS b
            ON a.ColA <= b.ColA
         GROUP BY a.ColA, a.ColB
       ) AS X
 ORDER BY ColA

You may be able to write just:

SELECT a.ColA, a.ColB, SUM(b.ColB) / (SELECT SUM(ColB) FROM MyTempTable) AS ColC
  FROM #MyTempTable AS a
  JOIN #MyTempTable AS b
    ON a.ColA <= b.ColA
 GROUP BY a.ColA, a.ColB
 ORDER BY a.ColA

Multiply the ColC expression by 100 to get a percentage instead of a fraction.

Tested against IBM Informix 11.70.FC2 on Mac OS X 10.7.3, both the queries with division work, producing the same answer (and I note that I get 0.81 instead of 0.71 as required in the question):

Name1    218    0.34603174603174603174603174603175
Name2    157    0.5952380952380952380952380952381
Name3    134    0.80793650793650793650793650793651
Name4    121    1.0 

You might have to use a CAST to ensure the division is done using floating point instead of integer arithmetic — as you can see, that wasn't necessary with Informix (the SUM is a floating point decimal anyway, just in case the table has billions of rows in it, not just 4 of them). I could improve the presentation using ROUND(xxxx, 2) to get just 2 decimal places; a cast to DECIMAL(6,2) would achieve the same result, but the client should be responsible for the presentation, not the DBMS.

In MS SQL Server, this does it (ups, wrong subaggregation -> wrong result):

create table #MyTempTable (cola varchar(10), colb int)

insert into #MyTempTable(cola,colb)
select 'Name1',218
union all
select 'Name2',157
union all
select 'Name3',134
union all
select 'Name4',121

SELECT otab.COLA, otab.COLB,
       cast(otab.COLB as float)/(select SUM(cast(itab.colb as float))
                                 from #MyTempTable itab where itab.cola >= otab.cola) 
  from #MyTempTable otab

drop table #MyTempTable
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top