Question

Hi I have the following table T:

id    1   2    3    4
col   a   b    a    c

I want to do a select that returns the id,col when group by(col) having count(col)>1

One way of doing it is

SELECT id,col FROM T 
   WHERE col IN (SELECT col FROM T GROUP BY(col) HAVING COUNT(col)>1);

The intern select (from the right) returns 'a' and main one (left) will return 1,a and 3,a

The problem is that the where in statement seems to be extremely slow. In my real case, the results from the internal select has many 'col's, something about 70000 and it's taking hours.

Right now it's much faster to do the internal select and the main select getting all ids and upcs and do the intersection locally. MySQL should be able to handle this kind of query efficiently.

Can I substitute the where in for a join or something faster?

Thanks

Was it helpful?

Solution

You could try if using an INNER JOIN speeds things up

  • Make sure you have an index on col
  • A covering index on col, id might gain you even better performance

SQL Statement

SELECT  T.id, T.col
FROM    T
        INNER JOIN (
          SELECT   col
          FROM     T
          GROUP BY col
          HAVING COUNT(*) > 1
        ) tcol ON tcol.col = T.col

OTHER TIPS

SELECT  id, col
FROM    t t1
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    t t2
        WHERE   t2.col = t1.col
                AND t2.id <> t1.id
        )

Make sure you have an index on (col) (in InnoDB) or (col, id) (in MyISAM)

The inner query will stop evaluating as soon as it finds a first matching value. With an index, this will happen after a single index seek with at most two index scans.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top