What's the SQL query to list all rows that have 2 column sub-rows as duplicates?
-
02-07-2019 - |
Question
I have a table that has redundant data and I'm trying to identify all rows that have duplicate sub-rows (for lack of a better word). By sub-rows I mean considering COL1
and COL2
only.
So let's say I have something like this:
COL1 COL2 COL3
---------------------
aa 111 blah_x
aa 111 blah_j
aa 112 blah_m
ab 111 blah_s
bb 112 blah_d
bb 112 blah_d
cc 112 blah_w
cc 113 blah_p
I need a SQL query that returns this:
COL1 COL2 COL3
---------------------
aa 111 blah_x
aa 111 blah_j
bb 112 blah_d
bb 112 blah_d
Solution
Does this work for you?
select t.* from table t
left join ( select col1, col2, count(*) as count from table group by col1, col2 ) c on t.col1=c.col1 and t.col2=c.col2
where c.count > 1
OTHER TIPS
With the data you have listed, your query is not possible. The data on rows 5 & 6 is not distinct within itself.
Assuming that your table is named 'quux', if you start with something like this:
SELECT a.COL1, a.COL2, a.COL3
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.COL3 <> b.COL3
ORDER BY a.COL1, a.COL2
You'll end up with this answer:
COL1 COL2 COL3
---------------------
aa 111 blah_x
aa 111 blah_j
That's because rows 5 & 6 have the same values for COL3. Any query that returns both rows 5 & 6 will also return duplicates of ALL of the rows in this dataset.
On the other hand, if you have a primary key (ID), then you can use this query instead:
SELECT a.COL1, a.COL2, a.COL3
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.ID <> b.ID
ORDER BY a.COL1, a.COL2
[Edited to simplify the WHERE clause]
And you'll get the results you want:
COL1 COL2 COL3
---------------------
aa 111 blah_x
aa 111 blah_j
bb 112 blah_d
bb 112 blah_d
I just tested this on SQL Server 2000, but you should see the same results on any modern SQL database.
blorgbeard proved me wrong -- good for him!
Join on yourself like this:
SELECT a.col3, b.col3, a.col1, a.col2
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
If you're using postgresql, you can use the oid to make it return less duplicated results, like this:
SELECT a.col3, b.col3, a.col1, a.col2
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
AND a.oid < b.oid
Don't have a database handy to test this, but I think it should work...
select
*
from
theTable
where
col1 in
(
select
col1
from
theTable
group by
col1||col2
having
count(col1||col2) > 1
)
My naive attempt would be
select a.*, b.* from table a, table b where a.col1 = b.col1 and a.col2 = b.col2 and a.col3 != b.col3;
but that would return all the rows twice. I'm not sure how you'd restrict it to just returning them once. Maybe if there was a primary key, you could add "and a.pkey < b.pkey".
Like I said, that's not elegant and there is probably a better way to to do this.
Something like this should work:
SELECT a.COL1, a.COL2, a.COL3
FROM YourTable a
JOIN YourTable b ON b.COL1 = a.COL1 AND b.COL2 = a.COL2 AND b.COL3 <> a.COL3
In general, the JOIN clause should include every column that you're considering to be part of a "duplicate" (COL1 and COL2 in this case), and at least one column (or as many as it takes) to eliminate a row joining to itself (COL3, in this case).
This is pretty similar to the self-join, except it will not have the duplicates.
select COL1,COL2,COL3
from theTable a
where exists (select 'x'
from theTable b
where a.col1=b.col1
and a.col2=b.col2
and a.col3<>b.col3)
order by col1,col2,col3
Here is how you find duplicates. Tested in oracle 10g with your data.
select * from tst where (col1, col2) in (select col1, col2 from tst group by col1, col2 having count(*) > 1)
select COL1,COL2,COL3
from table
group by COL1,COL2,COL3
having count(*)>1
Forget joins -- use an analytic function:
select col1, col2, col3
from
(
select col1, col2, col3, count(*) over (partition by col1, col2) rows_per_col1_col2
from table
)
where rows_per_col1_col2 > 1