Question

I'm wondering if someone can explain how the IN calculates? Well, ultimately I'm trying to find out why this query is slow and how to optimize it. I waited over 3 minutes and when I cancelled the query it had only returned 1000 lines which doesn't seem like it should take that long.

SELECT t2.* 
FROM report_tables.roc_test_results as t2 
WHERE t2.job IN (SELECT DISTINCT(t1.job) 
                   FROM report_tables.roc_test_results as t1 
                  WHERE t1.operation = 'TEST' 
                    AND result = 'Passed' 
                    AND STR_TO_DATE(t1.date_created,'%d-%M-%Y') BETWEEN '2009-10-01' 
                                                                    AND '2009-10-31')

I'm not sure what the total query should return, if I had to guess I would say around 2000 records, the subquery returns 332 (336 when not Distinct).

Can anyone give me some pointers on how to optimize this query? Also, I'm wondering, does the subquery calculate every time or just once and store it?

As requested, the results for DESC... (by the way, please don't laugh, I am self taught so I'm sure this table is hideously designed.)

Field                     Type               Null     Key    Default    Extra
------                      -----              -----     ---    -------    -----
operation                 varchar(10)         NO   
tester                 varchar(25)            NO   
result                 varchar(45)            NO   
fail_mode              varchar(45)         NO   
primary_failure        varchar(25)           NO   
ref_des                varchar(45)           NO   
rf_hours               varchar(15)          NO   
ac_hours               varchar(15)          NO   
comments               text              NO   
job                    varchar(15)           NO   
rma                    bigint(20) unsigned    NO   
item                   varchar(45)          NO   
item_description       text                  NO   
serial                 varchar(25)            NO   
created_by             varchar(25)            NO   
collection             bigint(20) unsigned    NO    PRI  
date_created           varchar(15)          NO   
Was it helpful?

Solution

The date_created data type needs to change to be a DATETIME before it's worth defining an index on the column. The reason being, the index will be worthless if you are changing the data type from string to DATETIME as you are currently.

You've mentioned that you're using LOAD DATA INFILE, and that the source file contains dates in DD-MON-YY format. MySQL will implicitly convert strings into DATETIME if the YY-MM-DD format is used, so if you can correct this in your source file before using LOAD DATA INFILE the rest should fall in to place.

After that, a covering index using:

  • job
  • operation
  • result
  • date_created

...would be a good idea.

OTHER TIPS

first of all you don't need the distinct in the subquery since IN eliminates duplicates anyhow Do you need the function call in the WHERE clause and do you have and index on the date_created column?

what happens when you change

WHERE STR_TO_DATE(t1.date_created,'%d-%M-%Y') 
BETWEEN '2009-10-01' AND '2009-10-31')

to

WHERE 1.date_created >= '2009-10-01' 
AND 1.date_created < '2010-01-01'

Sometimes indexes won't be used if you use functions on the column

My advice is to replace the IN with a JOIN, and then consider adding indexes on some of your columns, such as job, and maybe operation and/or result. You should read up on indexes in the MySQL manual, and also on using EXPLAIN to optimize your queries:

http://dev.mysql.com/doc/refman/5.1/en/indexes.html

http://dev.mysql.com/doc/refman/5.1/en/using-explain.html

Here's an example of converting the IN to a JOIN:

SELECT distinct t2.* 
FROM roc_test_results as t2
inner join roc_test_results as t1 on t1.job = t2.job
WHERE t1.operation = 'TEST' 
AND t1.result = 'Passed' 
AND STR_TO_DATE(t1.date_created,'%d-%M-%Y') BETWEEN '2009-10-01' AND '2009-10-31';
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top