Pregunta

i have a table which looks like this : coumn 1 = timestamp : string , column 2 = numOfentites : int please note i am using hiveql

Fri, 10 Aug 2001    274
Fri, 10 Dec 1999    39
Fri, 10 Mar 2000    107
Fri, 10 May 2002    26
Fri, 10 Nov 2000    351
Fri, 10 Sep 1999    22
Fri, 11 Aug 2000    189
Fri, 11 Dec 1998    1
Fri, 11 Feb 2000    84
Fri, 11 Jan 2002    580
Fri, 11 Jun 1999    12
Fri, 11 May 2001    571
Fri, 12 Apr 2002    41

Now, I retrieved the frequency per year from this table and found out some year XXXX had the most number of entities.

My aim now is to go one level deep and extract the frequency per month for the year XXXX.

I tired using the group by clause on the substring indicating month but it doesn’t work.

can you guys please give me a direction on how to proceed..

Just need a hint not the answer :P trying to learn hiveql here

EDIT here is the query that i used to extract the frequency of entities on yearly basis. note that timestamp is the first column of the input.

select  dates , count(dates) as numEmails
from (select split(timestamp," ")[3] as dates , count(timestamp)
      from dataset
      group by timestamp
     ) mailfreq
group by dates
order by numEmails desc;
¿Fue útil?

Solución

I know that hivesql has strange limitations, but won't this work?

select split(timestamp," ")[3] as yr, split(timestamp," ")[2] as mon, count(timestamp)
from dataset
group by split(timestamp," ")[3], split(timestamp," ")[2];
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top