query to get duplicates

https://stackoverflow.com/questions/23091553

04-07-2023

Question

I'm working on a question for my homework assignment where I have to check an orders database to see if any book was ordered more than once in the same order. Here is an example:

+----------+------------+---------+----------+-------------+
| order_id | order_line | book_id | quantity | order_price |
+----------+------------+---------+----------+-------------+
| 33034    | 1          | 1619    | 1        | 29.99       |
| 33034    | 2          | 6789    | 1        | 25.95       |
| 33034    | 3          | 1619    | 5        | 15.95       |
| 33189    | 1          | 1667    | 2        | 25.95       |
| 40564    | 1          | 4739    | 2        | 20.99       |
| 11357    | 1          | 1667    | 2        | 35.95       |
+----------+------------+---------+----------+-------------+

So order 33034 ordered book 1619 twice. I can't figure out how to extract only the proper order IDs. As of now, my query can test for more than one of the same book_id and then test the associated order_id, but I can't get the logic to connect the two. The query is essentially saying "same book? check! did the associated order_id order more than one such book? check!" I need it to make sure the same order contains the SAME book more than once.
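A minimal sketch of the intended rule, in plain Python rather than SQL and using only the six sample rows above: count line items per (order_id, book_id) pair instead of per book_id. The function name `orders_with_repeated_book` is hypothetical. Book 1667 also appears twice in the table, but in two different orders, so it is not flagged.

```python
from collections import Counter

# Sample rows from the question: (order_id, order_line, book_id, quantity, order_price)
ROWS = [
    (33034, 1, 1619, 1, 29.99),
    (33034, 2, 6789, 1, 25.95),
    (33034, 3, 1619, 5, 15.95),
    (33189, 1, 1667, 2, 25.95),
    (40564, 1, 4739, 2, 20.99),
    (11357, 1, 1667, 2, 35.95),
]


def orders_with_repeated_book(rows):
    """Return the order_ids in which the same book_id appears on more than one line."""
    # Key on the (order_id, book_id) pair, not on book_id alone.
    pair_counts = Counter((order_id, book_id) for order_id, _, book_id, _, _ in rows)
    return sorted({order_id for (order_id, _), n in pair_counts.items() if n > 1})


print(orders_with_repeated_book(ROWS))  # [33034]
```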

I can't use joins, only subqueries. I'm having a hard time wrapping my head around where to go next, but this is what I have so far:

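select order_id
from a_bkorders.order_details
where book_id in (
     -- every book that appears on more than one line, in any order
     select book_id
     from a_bkorders.order_details
     group by book_id
     having count(book_id) > 1)
group by order_id
-- orders with more than one such line, not necessarily of the same book
having count(order_id) > 1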

Thanks for any advice or help!
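To see why the query above fails to connect the two checks, here is a sketch that runs it (without the stray closing parenthesis) in SQLite through Python's `sqlite3` module. Assumptions: SQLite stands in for the asker's database, an attached in-memory database named `a_bkorders` makes the schema-qualified table name resolve, `run_query` is a hypothetical helper, and order 99999 is a hypothetical order, not from the question, containing two different books that each also appear elsewhere. On the six sample rows the query happens to return the right answer, but the hypothetical order is flagged even though it repeats no book.

```python
import sqlite3

# Sample rows from the question: (order_id, order_line, book_id, quantity, order_price)
SAMPLE_ROWS = [
    (33034, 1, 1619, 1, 29.99),
    (33034, 2, 6789, 1, 25.95),
    (33034, 3, 1619, 5, 15.95),
    (33189, 1, 1667, 2, 25.95),
    (40564, 1, 4739, 2, 20.99),
    (11357, 1, 1667, 2, 35.95),
]

# Hypothetical order (not from the question): two different books, one line each.
HYPOTHETICAL_ROWS = [
    (99999, 1, 1619, 1, 29.99),
    (99999, 2, 1667, 1, 25.95),
]

ASKER_QUERY = """
select order_id
from a_bkorders.order_details
where book_id in (
     select book_id
     from a_bkorders.order_details
     group by book_id
     having count(book_id) > 1)
group by order_id
having count(order_id) > 1
"""


def run_query(rows, query):
    """Load rows into an in-memory SQLite table and return the sorted first column."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so a_bkorders.order_details resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS a_bkorders")
    conn.execute(
        "CREATE TABLE a_bkorders.order_details ("
        "order_id INTEGER, order_line INTEGER, book_id INTEGER, "
        "quantity INTEGER, order_price REAL)"
    )
    conn.executemany(
        "INSERT INTO a_bkorders.order_details VALUES (?, ?, ?, ?, ?)", rows
    )
    result = sorted(row[0] for row in conn.execute(query))
    conn.close()
    return result


print(run_query(SAMPLE_ROWS, ASKER_QUERY))  # [33034]
print(run_query(SAMPLE_ROWS + HYPOTHETICAL_ROWS, ASKER_QUERY))  # [33034, 99999]
```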

Solution

Why not just use

SELECT a.order_id FROM
-- (order_id, book_id) pairs that appear on more than one line
(SELECT order_id, book_id, COUNT(*) AS line_count
FROM a_bkorders.order_details
GROUP BY order_id, book_id
HAVING COUNT(*) > 1
) AS a

Technically speaking, you should use SELECT DISTINCT (since one order might have two book_ids, each of which was ordered twice). Otherwise, this should do the job. You follow the logic, yes? It uses a subquery to find which order_id and book_id pairs occur together more than once; from there it grabs the order_id from each row of that list of (order_id, book_id, COUNT(*)) where COUNT(*) > 1.
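A sketch of the answer's query with and without DISTINCT, run in SQLite via Python's `sqlite3`. SQLite, the attached in-memory `a_bkorders` database and the `run_query` helper are assumptions standing in for the real server. Order 88888 is hypothetical: it repeats two different books, which is exactly the case the DISTINCT remark is about, so without DISTINCT it is reported once per repeated book.

```python
import sqlite3

# Sample rows from the question: (order_id, order_line, book_id, quantity, order_price)
SAMPLE_ROWS = [
    (33034, 1, 1619, 1, 29.99),
    (33034, 2, 6789, 1, 25.95),
    (33034, 3, 1619, 5, 15.95),
    (33189, 1, 1667, 2, 25.95),
    (40564, 1, 4739, 2, 20.99),
    (11357, 1, 1667, 2, 35.95),
]

# Hypothetical order (not from the question) that repeats two different books.
HYPOTHETICAL_ROWS = [
    (88888, 1, 1619, 1, 29.99),
    (88888, 2, 1619, 1, 29.99),
    (88888, 3, 6789, 1, 25.95),
    (88888, 4, 6789, 2, 25.95),
]

# The answer's query; {distinct} is filled with "" or "DISTINCT".
PAIR_QUERY = """
SELECT {distinct} a.order_id FROM
(SELECT order_id, book_id, COUNT(*) AS line_count
FROM a_bkorders.order_details
GROUP BY order_id, book_id
HAVING COUNT(*) > 1
) AS a
"""


def run_query(rows, query):
    """Load rows into an in-memory SQLite table and return the sorted first column."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so a_bkorders.order_details resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS a_bkorders")
    conn.execute(
        "CREATE TABLE a_bkorders.order_details ("
        "order_id INTEGER, order_line INTEGER, book_id INTEGER, "
        "quantity INTEGER, order_price REAL)"
    )
    conn.executemany(
        "INSERT INTO a_bkorders.order_details VALUES (?, ?, ?, ?, ?)", rows
    )
    result = sorted(row[0] for row in conn.execute(query))
    conn.close()
    return result


all_rows = SAMPLE_ROWS + HYPOTHETICAL_ROWS
print(run_query(all_rows, PAIR_QUERY.format(distinct="")))  # [33034, 88888, 88888]
print(run_query(all_rows, PAIR_QUERY.format(distinct="DISTINCT")))  # [33034, 88888]
```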

Now, this all depends on the assumption that "if any book was ordered more than once in the same order" means any book that shows up as a line item more than once in that order (i.e. COUNT(*) > 1), not any book for which quantity > 1 (i.e. COUNT(*) > 1 or quantity > 1).

If you need to check whether quantity > 1 or COUNT(*) > 1, I'd recommend the following:

SELECT a.order_id FROM
-- (order_id, book_id) pairs whose total quantity is more than 1
(SELECT order_id, book_id, SUM(quantity) AS total_quantity
FROM a_bkorders.order_details
GROUP BY order_id, book_id
HAVING SUM(quantity) > 1
) AS a

If it's not apparent: SUM(quantity) will be > 1 whenever COUNT(*) > 1 (assuming every line has a quantity of at least 1), and it will also be > 1 whenever a single line has quantity > 1.
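Running both versions of the answer's query against the six sample rows makes the difference concrete. This is a sketch only: SQLite via Python's `sqlite3`, the attached in-memory `a_bkorders` database and the `run_query` helper are assumptions. Under the line-item reading only order 33034 qualifies; under the quantity reading every order in the sample qualifies, because each one has some book with a total quantity of at least 2.

```python
import sqlite3

# Sample rows from the question: (order_id, order_line, book_id, quantity, order_price)
SAMPLE_ROWS = [
    (33034, 1, 1619, 1, 29.99),
    (33034, 2, 6789, 1, 25.95),
    (33034, 3, 1619, 5, 15.95),
    (33189, 1, 1667, 2, 25.95),
    (40564, 1, 4739, 2, 20.99),
    (11357, 1, 1667, 2, 35.95),
]

# Line-item reading: the same book on more than one line of an order.
COUNT_QUERY = """
SELECT a.order_id FROM
(SELECT order_id, book_id, COUNT(*) AS line_count
FROM a_bkorders.order_details
GROUP BY order_id, book_id
HAVING COUNT(*) > 1
) AS a
"""

# Quantity reading: more than one copy of the same book in an order.
SUM_QUERY = """
SELECT a.order_id FROM
(SELECT order_id, book_id, SUM(quantity) AS total_quantity
FROM a_bkorders.order_details
GROUP BY order_id, book_id
HAVING SUM(quantity) > 1
) AS a
"""


def run_query(rows, query):
    """Load rows into an in-memory SQLite table and return the sorted first column."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so a_bkorders.order_details resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS a_bkorders")
    conn.execute(
        "CREATE TABLE a_bkorders.order_details ("
        "order_id INTEGER, order_line INTEGER, book_id INTEGER, "
        "quantity INTEGER, order_price REAL)"
    )
    conn.executemany(
        "INSERT INTO a_bkorders.order_details VALUES (?, ?, ?, ?, ?)", rows
    )
    result = sorted(row[0] for row in conn.execute(query))
    conn.close()
    return result


print(run_query(SAMPLE_ROWS, COUNT_QUERY))  # [33034]
print(run_query(SAMPLE_ROWS, SUM_QUERY))  # [11357, 33034, 33189, 40564]
```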

OTHER TIPS

How about something like this:

SELECT order_id FROM    
  (SELECT order_id, COUNT(*) - COUNT(DISTINCT book_id) AS duplicate_count
    FROM a_bkorders.order_details
    GROUP BY order_id
  ) t
WHERE t.duplicate_count > 0

Note that this also counts duplicate lines whose quantity is 0.
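A sketch of this query in SQLite through Python's `sqlite3`, with the attached in-memory `a_bkorders` database and the `run_query` helper as assumptions. COUNT(*) counts an order's lines and COUNT(DISTINCT book_id) counts its distinct books, so any positive difference means some book repeats within that order. Order 77777 is hypothetical: it lists book 4739 twice, once with quantity 0, and is still reported, which illustrates the quantity-0 caveat.

```python
import sqlite3

# Sample rows from the question: (order_id, order_line, book_id, quantity, order_price)
SAMPLE_ROWS = [
    (33034, 1, 1619, 1, 29.99),
    (33034, 2, 6789, 1, 25.95),
    (33034, 3, 1619, 5, 15.95),
    (33189, 1, 1667, 2, 25.95),
    (40564, 1, 4739, 2, 20.99),
    (11357, 1, 1667, 2, 35.95),
]

# Hypothetical order (not from the question): book 4739 twice, one line with quantity 0.
HYPOTHETICAL_ROWS = [
    (77777, 1, 4739, 1, 20.99),
    (77777, 2, 4739, 0, 20.99),
]

# Lines per order minus distinct books per order; > 0 means a repeated book.
DUPLICATE_QUERY = """
SELECT order_id FROM
  (SELECT order_id, COUNT(*) - COUNT(DISTINCT book_id) AS duplicate_count
    FROM a_bkorders.order_details
    GROUP BY order_id
  ) t
WHERE t.duplicate_count > 0
"""


def run_query(rows, query):
    """Load rows into an in-memory SQLite table and return the sorted first column."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so a_bkorders.order_details resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS a_bkorders")
    conn.execute(
        "CREATE TABLE a_bkorders.order_details ("
        "order_id INTEGER, order_line INTEGER, book_id INTEGER, "
        "quantity INTEGER, order_price REAL)"
    )
    conn.executemany(
        "INSERT INTO a_bkorders.order_details VALUES (?, ?, ?, ?, ?)", rows
    )
    result = sorted(row[0] for row in conn.execute(query))
    conn.close()
    return result


print(run_query(SAMPLE_ROWS, DUPLICATE_QUERY))  # [33034]
print(run_query(SAMPLE_ROWS + HYPOTHETICAL_ROWS, DUPLICATE_QUERY))  # [33034, 77777]
```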

Licensed under: CC-BY-SA with attribution
