Question

I am trying to retrieve a list of transaction IDS that only contain at least 5 product IDS, the baskets could have many more product IDS but it must contain these 5 also.

My table looks something like this:

| transaction_id | trans_date | product_id | store_id |

The table is all at product_id line level detail, nothing is rolled up.

select * from Transactions
where trans_date in ('2019-03-07','2019-03-06','2019-03-08','2019-03-09','2019-03-10')
and product_id in ('000000000021003503','000000000060280873','000000000060281480','000000000060281642','000000000020059281')

This is recovering all the transactions that contain these ids but I need it to be rolled up to a transaction level, so if the transaction_id contained these ids (at least) then its included.

Hopefully this makes sense

Was it helpful?

Solution

If Impala supports GROUP BY, COUNT and HAVING, then you can use the following.

Here we select the records for matching dates and products, then we count how many product_ids we actually found for each transaction on each day, and we take only those results for which we found all 5 product_ids.

SELECT transaction_id, trans_date, COUNT(product_id) as counter
FROM   Transactions
WHERE  trans_date IN ('2019-03-07','2019-03-06','2019-03-08','2019-03-09','2019-03-10')
AND    product_id IN ('000000000021003503', '000000000060280873', '000000000060281480', '000000000060281642', '000000000020059281')
GROUP BY transaction_id, trans_date
HAVING count(product_id) = 5

If a single transaction is allowed to span multiple dates, then you can just leave out the trans_date from the SELECT and GROUP BY parts.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top