Question

I have a table in the following format:

ID   | col1  | date_ts
-----------------------------------------
1    | type1 | 2011-10-01 23:59:59.163-08
2    | type1 | 2011-10-02 21:42:20.152-19
3    | type2 | 2011-10-03 23:21:49.175-21
4    | type3 | 2011-10-03 23:19:39.169-12
5    | type2 | 2011-10-05 23:34:30.129-01

I am trying to group by date and get the count of each type in col1.

Below is the output I am trying to achieve:

date       | type1 | type2 | type3
-----------------------------------
2011-10-01 |     1 |     0 |     0
2011-10-02 |     1 |     0 |     0
2011-10-03 |     0 |     1 |     1
2011-10-05 |     0 |     1 |     0

This is the query I have now, but it fails with runtime errors:

set hive.cli.print.header=true;
select
  sum(if(col1 = 'type1', 1, 0)) as type_1,
  sum(if(col1 = 'type2', 1, 0)) as type_2,
  sum(if(col1 = 'type3', 1, 0)) as type_3
from table1
WHERE unix_timestamp(date_ts) >= unix_timestamp('2011-10-01 00:00:00.178-01')
  AND unix_timestamp(date_ts) <= unix_timestamp('2011-10-05 23:59:59.168-08')
GROUP BY col1, TO_DATE(date_ts)
ORDER BY date_ts;

Any ideas on how to do this? Thanks.


Solution

You need to expose date_ts in the projected columns so that ORDER BY has a selected column to sort on. Also group by the date alone: keeping col1 in the GROUP BY returns one row per type per date instead of the one row per date shown in your expected output.

select
  to_date(date_ts) date_ts,
  sum(if(col1 = 'type1', 1, 0)) as type_1,
  sum(if(col1 = 'type2', 1, 0)) as type_2,
  sum(if(col1 = 'type3', 1, 0)) as type_3
from table1
WHERE unix_timestamp(date_ts) >= unix_timestamp('2011-10-01 00:00:00.178-01')
  AND unix_timestamp(date_ts) <= unix_timestamp('2011-10-05 23:59:59.168-08')
GROUP BY TO_DATE(date_ts)
ORDER BY date_ts;

OTHER TIPS

I removed the WHERE condition that filtered on the dates, used a substring to extract just the date part of the column, and grouped by that date expression only.

select substr(ltrim(date_ts),0,10) date_ts,
  sum(if(col1 = 'type1', 1, 0)) as type_1,
  sum(if(col1 = 'type2', 1, 0)) as type_2,
  sum(if(col1 = 'type3', 1, 0)) as type_3
from table1
GROUP BY substr(ltrim(date_ts),0,10)
ORDER BY date_ts;

My output:

date       | type1 | type2 | type3
-----------------------------------
2011-10-01 |     1 |     0 |     0
2011-10-02 |     1 |     0 |     0
2011-10-03 |     0 |     1 |     1
2011-10-05 |     0 |     1 |     0

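
If you want to check the grouping logic without a Hive cluster, here is a rough sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here, not Hive: it is an assumption of this sketch that CASE WHEN can replace Hive's if() and that a 1-based substr() can replace to_date(). The helper name pivot and the alias day are made up for illustration. It runs the same pivot twice, once grouped by date only and once grouped by col1 plus date, so you can see why the second form returns an extra row for 2011-10-03.

```python
import sqlite3

# Sample rows copied from the question. SQLite stands in for Hive here.
ROWS = [
    (1, "type1", "2011-10-01 23:59:59.163-08"),
    (2, "type1", "2011-10-02 21:42:20.152-19"),
    (3, "type2", "2011-10-03 23:21:49.175-21"),
    (4, "type3", "2011-10-03 23:19:39.169-12"),
    (5, "type2", "2011-10-05 23:34:30.129-01"),
]

# Conditional aggregation: one SUM per type, counting 1 for matching rows.
# The GROUP BY clause is filled in by the caller (hypothetical helper below).
PIVOT_SQL = """
SELECT substr(date_ts, 1, 10) AS day,
       SUM(CASE WHEN col1 = 'type1' THEN 1 ELSE 0 END) AS type_1,
       SUM(CASE WHEN col1 = 'type2' THEN 1 ELSE 0 END) AS type_2,
       SUM(CASE WHEN col1 = 'type3' THEN 1 ELSE 0 END) AS type_3
FROM table1
GROUP BY {group_by}
ORDER BY day
"""


def pivot(group_by):
    """Run the pivot query with the given GROUP BY expression list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, col1 TEXT, date_ts TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(PIVOT_SQL.format(group_by=group_by)).fetchall()
    conn.close()
    return rows


# Grouping by the date alone gives one row per date.
by_day = pivot("substr(date_ts, 1, 10)")

# Grouping by col1 as well splits 2011-10-03 into two rows.
by_type_and_day = pivot("col1, substr(date_ts, 1, 10)")

for row in by_day:
    print(row)
print(len(by_day), "rows by date;", len(by_type_and_day), "rows by type and date")
```

The date-only grouping reproduces the four-row table above, while adding col1 to the GROUP BY yields five rows for the same data.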
Licensed under: CC-BY-SA with attribution