Question

The following query hangs: (although subqueries perfomed separately are fine)

I don't know how to make the explain table look ok. If someone tells me, I'll clean it up.

select
sum(grades.points)) as p,  
from assignments 
left join grades using (assignmentID) 
where gradeID IN 

(select grades.gradeID 
from assignments 
left join grades using (assignmentID) 
where ... grades.date <= '1255503600' AND grades.date >= '984902400' 
group by     assignmentID order by grades.date DESC);

I think the problem is with the first grades table... the type ALL with that many rows seems to be the cause.. Everything is indexed.

I uploaded the table as an image. Couldn't get the formatting right: http://imgur.com/AjX34.png

A commenter wanted the full where clause:

explain extended select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t 
from assignments left join grades using (assignmentID) 
where gradeID IN 
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type 
where assignments.classID = '7815' 
and (assignments.type = 30170 ) 
and grades.contactID = 7141 
and grades.points REGEXP '^[-]?[0-9]+[-]?' 
and grades.points != '-' 
and grades.points != '' 
and (grades.pointsposs IS NULL or grades.pointsposs = '') 
and grades.date <= '1255503600' 
AND grades.date >= '984902400' 
group by assignmentID 
order by grades.date DESC);
Was it helpful?

Solution

Suppose you use a Real Database (ie, any database except MySQL, but I'll use Postgres as an example) to do this query :

SELECT * FROM ta WHERE aid IN (SELECT subquery)

a Real Database would look at the subquery and estimate its rowcount :

  • If the rowcount is small (say, less than a few millions)

It would run the subquery, then build an in-memory hash of ids, which also makes them unique, which is a feature of IN().

Then, if the number of rows pulled from ta is a small part of ta, it would use a suitable index to pull the rows. Or, if a major part of the table is selected, it would just scan it entirely, and lookup each id in the hash, which is very fast.

  • If however the subquery rowcount is quite large

The database would probably rewrite it as a merge JOIN, adding a Sort+Unique to the subquery.

However, you are using MySQL. In this case, it will not do any of this (it is gonna re-execute the subquery for each row of your table) so it will take 1000 years. Sorry.

OTHER TIPS

See "The unbearable slowness of IN": http://www.artfulsoftware.com/infotree/queries.php#568

Super messy, but: (thanks for everyone's help)

   SELECT * 
   FROM grades
   LEFT JOIN assignments ON grades.assignmentID = assignments.assignmentID
   RIGHT JOIN (

   SELECT g.gradeID
 FROM assignments a
 LEFT JOIN grades g
 USING ( assignmentID ) 
 WHERE a.classID =  '7815'
 AND (
 a.type =30170
 )
 AND g.contactID =7141
  g.points
 REGEXP  '^[-]?[0-9]+[-]?'
 AND g.points !=  '-'
 AND g.points !=  ''
 AND (
 g.pointsposs IS NULL 
 OR g.pointsposs =  ''
 )
 AND g.date <=  '1255503600'
 AND g.date >=  '984902400'
 GROUP BY assignmentID
 ORDER BY g.date DESC
 ) AS t1 ON t1.gradeID = grades.gradeID

If your subquery performs fine when it is executed separately, then try using a JOIN rather than IN, like this:

select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t 
from assignments left join grades using (assignmentID) 
join
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type 
where assignments.classID = '7815' 
and (assignments.type = 30170 ) 
and grades.contactID = 7141 
and grades.points REGEXP '^[-]?[0-9]+[-]?' 
and grades.points != '-' 
and grades.points != '' 
and (grades.pointsposs IS NULL or grades.pointsposs = '') 
and grades.date <= '1255503600' 
AND grades.date >= '984902400' 
group by assignmentID 
order by grades.date DESC) using (gradeID);

There really isn't enough information to answer your question, and you've put a ... in the middle of the where clause which is weird. How big are the tables involved and what are the indexes?

Having said that, if there are too many terms in an in clause, you can see seriously degraded performance. Replace the use of in with a right join.

For starters, the table as_types in the in clause is not used. Left joining it serves no purpose so get rid of it.

That leaves the in clause having only the assignments and grades table from the outer query. Clearly the wheres the modify assignments belong in the where clause for the outer query. You should move all of the where grades=whatever into the on clause of the left join to grades.

The query is a little tough to follow, but I suspect that the subquery isn't necessary at all. It seems like your query is basically thus:

SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)  
WHERE gradeID IN 
(
SELECT grades.gradeID
FROM assignments LEFT JOIN grades USING (assignmentID)  
WHERE your_conditions = TRUE
);

But, you're not doing anything really fancy in the where clause in the subquery. I suspect something more like

SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)  
GROUP BY groupings
WHERE your_conditions_with_some_tweaks = TRUE;

would work just as well.

If I'm missing some key logic here please comment back and I'll edit/delete this post.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top