Question

Let's say I have a table like this:

|id|userID|email         |website    |
--------------------------------------
|1 |user1 |user1@test.com|website.com|
|2 |user2 |user2@test.com|website.com|
|3 |user3 |user3@test.com|website.com|
|4 |user1 |user1@test.com|foo.com    |
|5 |user2 |user2@test.com|foo.com    |

And I want to get all of the rows where website='website.com' and have a corresponding row with a matching userID where website='foo.com'

So, in this instance it would return rows 1 and 2.

Any ideas?

Was it helpful?

Solution 2

To get the user you can do

select userID
from your_table
where website in ('website.com', 'foo.com')
group by userID
having count(distinct website) = 2

but if you need the complete row then do

select * from your_table
where userID in
(
   select userID
   from your_table
   where website in ('website.com', 'foo.com')
   group by userID
   having count(distinct website) = 2
)

OTHER TIPS

Here is one way:

select t.*
from t
where t.website = 'website.com' and
      exists (select 1 from t t2 where t2.userId = t.userId and t2.website = 'foo.com');

EDIT:

You can also express this as a join:

select distinct t.*
from t join
     t2
     on t2.userId = t.userId and
        t.website = 'website.com' and
        t2.website = 'foo.com';

If you know there are not duplicates, then you can remove the distinct.

Hive is kinda limited in terms of using subqueries (they are only allowed in the FROM clause), so we'll have to work around that a bit. The good news is that we don't really have to worry (much) about doing massive joins, because, well, Hadoop. :-)

One way to find matching rows in a table is to simply join the table on itself.

SELECT left.*
FROM your_table left
JOIN your_table right
ON (left.userID = right.userID)
WHERE left.website = 'website.com'
AND right.website = 'foo.com';

Note we have two versions of the same table, called left and right respectively, and we are retrieving rows from left that have a matching row in right that has the same userID (JOIN condition) but website is foo.com (and clause).

Hope that helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top