Question

I am using Hive's built-in collect_set function. The table looks like this:

 cookie  events  keywords  pages
 1234    1       'dress'   10
 1234    1       'dress'   10
 1235    2       'shoes'   14
 1234    5       'socks'   22

Using collect_set, I can get the following structure:

   select cookie, collect_set(events) as ev, collect_set(keywords) as kwords, 
   collect_set(pages)
    from table1 
    group by cookie

What I need to do is search the collected arrays multiple times. An example would be something like:

 select cookie
 ,array_contains(collect_set(events),2) as has_2
 ,array_contains(collect_set(events),4) as has_4
  from table1
  group by cookie

As I understand it, I cannot project a field more than once, so I end up having to do something like:

 select A.cookie, A.has_2, B.has_4 from (
 select cookie
 ,array_contains(collect_set(events),2) as has_2
 from table1 group by cookie) A
 inner join (
 select cookie
 ,array_contains(collect_set(events),4) as has_4
 from table1 group by cookie) B
 on A.cookie = B.cookie

The final result looks like:

 cookie, has_2, has_4 
 1234     F      F 
 1235     T      T 

Is there any way to do this without the self-join? Currently I would have to self-join something like 30 times to get the format I need.

Thanks


Solution

select S.cookie, array_contains(S.events_set,2), array_contains(S.events_set,4) 
from
(select cookie, collect_set(events) as events_set
 from table1 group by cookie ) S
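
The same pattern should extend to as many checks as needed: build each set once in the subquery, then call array_contains on it repeatedly in the outer select. A minimal sketch, assuming the table1 columns from the question, with events stored as an integer and keywords as a string. The aliases (events_set, keywords_set, has_2, has_4, has_dress) are illustrative and not part of the original answer:

```sql
-- Collect each set once per cookie in the subquery,
-- then test the already-built arrays as many times as needed.
select S.cookie,
       array_contains(S.events_set, 2)         as has_2,
       array_contains(S.events_set, 4)         as has_4,
       array_contains(S.keywords_set, 'dress') as has_dress  -- illustrative keyword check
from (
  select cookie,
         collect_set(events)   as events_set,    -- assumes events is an int column
         collect_set(keywords) as keywords_set   -- assumes keywords is a string column
  from table1
  group by cookie
) S;
```

Each additional check is one more array_contains line in the outer select, so 30 checks would not require 30 joins.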

OTHER TIPS

You should introduce a GROUP BY to your SQL.

e.g.

select
    cookie,
    array_contains(collect_set(events),2) as has_2,
    array_contains(collect_set(events),4) as has_4
 from
    table1
 group by
    cookie;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow