Question

I have a table with the following data:

User#       App
1       A
1       B
2       A   
2       B
3       A

I want to find the overlap between apps by distinct users, so my end result will look like this:

App1  App2  DistinctUseroverlapped 
A     A     3
A     B     2
B     B     2

So the result means that 3 users use app A, 2 users use both app A and app B, and 2 users use app B.

Keep in mind that there are a lot of apps and users. How can I do this in SQL?

Solution

My solution starts by generating all possible pairs of applications that are of interest. This is the driver subquery.

It then joins in the original data for each of the apps.

Finally, it uses count(distinct) to count the distinct users that match between the two lists.

select pairs.app1, pairs.app2,
       COUNT(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2
group by pairs.app1, pairs.app2
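
As a quick check, the query can be run against the sample data from the question. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Vertica, and the function name app_overlap is made up for the example. The table t and its columns user and app are taken from the query above; an order by is added so the output order is predictable.

```python
import sqlite3

# Sample rows (user, app) from the question.
ROWS = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A")]

# The query from the answer, plus an ORDER BY for stable output.
QUERY = """
select pairs.app1, pairs.app2,
       count(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app from t) t1 cross join
           (select distinct app from t) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""


def app_overlap(rows):
    """Load rows into an in-memory table t and return the overlap counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (user integer, app text)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in app_overlap(ROWS):
        print(row)
    # ('A', 'A', 3)
    # ('A', 'B', 2)
    # ('B', 'B', 2)
```

Note that the two outer joins build every combination of an app1 user with an app2 user before counting, so on a large table this form can be expensive.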

You could move the conditional comparison in the count to the joins and just use count(distinct):

select pairs.app1, pairs.app2,
       -- count tright.user: it is NULL when the app1 user has no row for app2
       COUNT(distinct tright.user) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2 and
        tright.user = tleft.user
group by pairs.app1, pairs.app2

I prefer the first method because it is more explicit about what is being counted.
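
What is being counted matters in the join-based form. Because the joins are outer joins, a tleft row is kept even when no tright row matches it, so counting tleft.user and counting tright.user give different answers. The sketch below compares the two on the sample data; it assumes Python's built-in sqlite3 module as a stand-in for Vertica, and the helper name overlap_counting is hypothetical.

```python
import sqlite3

# Sample rows (user, app) from the question.
ROWS = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A")]

# Join-based form of the query; {counted} is filled with the column to count.
TEMPLATE = """
select pairs.app1, pairs.app2, count(distinct {counted}) as n
from (select t1.app as app1, t2.app as app2
      from (select distinct app from t) t1 cross join
           (select distinct app from t) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft on tleft.app = pairs.app1 left outer join
     t tright on tright.app = pairs.app2 and tright.user = tleft.user
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""


def overlap_counting(counted, rows=ROWS):
    """Run the join-based query, counting either tleft.user or tright.user."""
    if counted not in ("tleft.user", "tright.user"):
        raise ValueError("counted must be 'tleft.user' or 'tright.user'")
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (user integer, app text)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(TEMPLATE.format(counted=counted)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(overlap_counting("tleft.user"))   # (A, B) comes out as 3: every user of A
    print(overlap_counting("tright.user"))  # (A, B) comes out as 2: users of both apps
```

User 3 has app A but not app B, so for the pair (A, B) the tright side is NULL for that user; count(distinct tright.user) skips the NULL, while count(distinct tleft.user) still counts user 3.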

This is standard SQL, so it should work on Vertica.

OTHER TIPS

A shorter version, written for Vertica 6, joins the table to itself on user so that only users who have both apps are counted:

    with tab as
    ( select 1 as user, 'A' as App
    union select 1 as user, 'B' as App
    union select 2 as user, 'A' as App
    union select 2 as user, 'B' as App
    union select 3 as user, 'A' as App
    )
    select t1.App as APP1, t2.App as APP2, count(distinct t1.user) as DistinctUserOverlapped
    from tab t1 join tab t2
         on t2.user = t1.user and t2.App >= t1.App
    group by 1, 2
    order by 1, 2
Licensed under: CC-BY-SA with attribution