Retrieving orders that only contain certain product IDS
-
24-01-2021 - |
Domanda
I am trying to retrieve a list of transaction IDS that only contain at least 5 product IDS, the baskets could have many more product IDS but it must contain these 5 also.
My table looks something like this:
| transaction_id | trans_date | product_id | store_id |
The table is all at product_id
line level detail, nothing is rolled up.
select * from Transactions
where trans_date in ('2019-03-07','2019-03-06','2019-03-08','2019-03-09','2019-03-10')
and product_id in ('000000000021003503','000000000060280873','000000000060281480','000000000060281642','000000000020059281')
This is recovering all the transactions that contain these ids but I need it to be rolled up to a transaction level, so if the transaction_id contained these ids (at least) then its included.
Hopefully this makes sense
Soluzione
If Impala supports GROUP BY
, COUNT
and HAVING
, then you can use the following.
Here we select the records for matching dates and products, then we count how many product_id
s we actually found for each transaction on each day, and we take only those results for which we found all 5 product_id
s.
SELECT transaction_id, trans_date, COUNT(product_id) as counter
FROM Transactions
WHERE trans_date IN ('2019-03-07','2019-03-06','2019-03-08','2019-03-09','2019-03-10')
AND product_id IN ('000000000021003503', '000000000060280873', '000000000060281480', '000000000060281642', '000000000020059281')
GROUP BY transaction_id, trans_date
HAVING count(product_id) = 5
If a single transaction is allowed to span multiple dates, then you can just leave out the trans_date
from the SELECT
and GROUP BY
parts.