Oracle SQL Query to Summarize Statistics, using GROUP BY

https://stackoverflow.com/questions/1029032

06-07-2019
|

Question

I have an Oracle table with data that looks like this:

ID   BATCH   STATUS
1    1       0
2    1       0
3    1       1
4    2       0

That is, ID is the primary key, there will be multiple rows for each "batch," and each row will have a status code in the STATUS column. There are a bunch of other columns, but these are the important ones.

I need to write a query which will summarize the status codes for each batch; there are three possible values that can go in the STATUS column, 0, 1, and 2, and I would like output that looks something like this:

BATCH  STATUS0  STATUS1  STATUS2
1      2        1        0
2      1        0        0

Those numbers would be counts; for batch 1, there are

2 records where STATUS is set to 0
1 record where STATUS is set to 1, and
no records where STATUS is set to 0.

For batch 2, there is

1 record where STATUS is set to 0, and
no records where STATUS is set to 1 or 2.

Is there a way that I can do this in one query, without having to rewrite the query for each status code? i.e. I can easily write a query like this, and run it three times:

SELECT batch, COUNT(status)
FROM table
WHERE status = 0
GROUP BY batch

I could run that, then run it again where status = 1, and again where status = 2, but I'm hoping to do it in one query.

If it makes a difference, aside from the STATUS column there is another column that I might want to summarize the same way--another reason that I don't want to have to execute SELECT statement after SELECT statement and amalgamate all of the results.

Solution

select batch 
,      count(case when status=1 then 1 end) status1
,      count(case when status=2 then 1 end) status2
,      count(case when status=3 then 1 end) status3
from   table
group by batch;

This is often called a "pivot" query, and I have written an article about how to generate these queries dynamically on my blog.

Version using DECODE (Oracle-specific but less verbose):

select batch 
,      count(decode(status,1,1)) status1
,      count(decode(status,2,1)) status2
,      count(decode(status,3,1)) status3
from   table
group by batch;

OTHER TIPS

select batch,
sum(select case when status = 0 then 1 else 0 end) status0,
sum(select case when status = 1 then 1 else 0 end) status1,
sum(select case when status = 2 then 1 else 0 end) status2
from table
group by batch

select batch,
sum((decode(status,0,1,0)) status0,
sum((decode(status,1,1,0)) status1,
sum((decode(status,2,1,0)) status2,
from table
group by batch

OP asks if there's any performance benefit of one approach (SUM) over the other (COUNT). Running a simpleminded test on a table with 26K rows shows that the COUNT approach is significantly faster. YMMV.

DECLARE
  CURSOR B IS
     select batch_id
       FROM batch
      WHERE ROWNUM < 2000;

  v_t1  NUMBER;
  v_t2  NUMBER;
  v_c1  NUMBER;
  v_c2  NUMBER;
  v_opn INTEGER;
  v_cls INTEGER;
  v_btc VARCHAR2(100);
BEGIN
-- Loop using SUM
  v_t1 := dbms_utility.get_time;
  v_c1 := dbms_utility.get_cpu_time;
  FOR R IN B LOOP
     FOR R2 IN (SELECT batch_type_code
                     , SUM(decode(batch_status_code, 'CLOSED', 1, 0)) closed
                     , SUM(decode(batch_status_code, 'OPEN', 1, 0)) OPEN
                     , SUM(decode(batch_status_code, 'REWORK', 1, 0)) rework
                  FROM batch
                 GROUP BY batch_type_code) LOOP 
        v_opn := R2.open;
        v_cls := R2.closed;
     END LOOP;
  END LOOP;
  v_t2 := dbms_utility.get_time;
  v_c2 := dbms_utility.get_cpu_time;
  dbms_output.put_line('For loop using SUM:');
  dbms_output.put_line('CPU seconds used: '||(v_c2 - v_c1)/100);
  dbms_output.put_line('Elapsed time: '||(v_t2 - v_t1)/100);

-- Loop using COUNT
  v_t1 := dbms_utility.get_time;
  v_c1 := dbms_utility.get_cpu_time;
  FOR R IN B LOOP
     FOR R2 IN (SELECT batch_type_code
                     , COUNT(CASE WHEN batch_status_code = 'CLOSED' THEN 1 END) closed
                     , COUNT(CASE WHEN batch_status_code = 'OPEN' THEN 1 END) OPEN
                     , COUNT(CASE WHEN batch_status_code = 'REWORK' THEN 1 END) rework
                  FROM batch
                 GROUP BY batch_type_code) LOOP 
        v_opn := R2.open;
        v_cls := R2.closed;
     END LOOP;
  END LOOP;
  v_t2 := dbms_utility.get_time;
  v_c2 := dbms_utility.get_cpu_time;
  dbms_output.put_line('For loop using COUNT:');
  dbms_output.put_line('CPU seconds used: '||(v_c2 - v_c1)/100);
  dbms_output.put_line('Elapsed time: '||(v_t2 - v_t1)/100);
END;
/

This yielded the following output:

For loop using SUM:
CPU seconds used: 40
Elapsed time: 40.09
For loop using COUNT:
CPU seconds used: 33.26
Elapsed time: 33.34

I repeated the test a couple of times to eliminate any effects of caching. I also swapped the select statements. Results were similar across the board.

EDIT: this is the same test harness I used to answer a similar question with.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow