Question

Here is the example:

 SELECT <columns>   
 FROM (..........<subquery>..........) AS xxx  
 INNER JOIN(s) with xxx.............  
 LEFT OUTER JOIN(s) with xxx........  
 WHERE <filter conditions>

Please correct me if I'm wrong:

  1. Is that <subquery> a derived table?
  2. Is it a problem if it returns too much data (say, millions of rows) with regard to server memory? I know the WHERE clause is applied to the final result set, which seems to leave the server processing a huge amount of data from the subquery even if the final result has only 10 rows.
  3. What if there were no inner join (to reduce the data) and only a left outer join? Does that make things even slower, since the join has to be made against all the rows from the subquery?
  4. If (2) is a problem, then one solution I can think of would be to limit the data returned by the subquery by adding other joins inside it, which makes things slower (I've tried that). Any other thoughts on this?
  5. What if I can't limit the result from the subquery, since the WHERE clause depends on the joins that come after the subquery?
  6. To clarify things: the reason the subquery returns too much data is that I'm trying to combine data from multiple tables using UNION ALL (with no filtering conditions) and then, for each row returned by the subquery, join to get the info I need to use in the WHERE clause. Another way to do this is to perform all the joins you see outside the subquery inside each branch of the UNION ALL, which does limit the result sets but adds more joins, which, as I said, slows things down. In other words, I have to choose between a subquery that does this:
 (
 SELECT * FROM A UNION ALL
 SELECT * FROM B UNION ALL
 SELECT * FROM C...
 ) AS xxx
 LEFT OUTER JOIN T with xxx

and a query that does this:

 SELECT * FROM A
 LEFT OUTER JOIN T ...
 WHERE....
 UNION ALL
 SELECT * FROM B
 LEFT OUTER JOIN T ...
 WHERE....
 UNION ALL
 SELECT * FROM C
 LEFT OUTER JOIN T ...
 WHERE....

Solution

  1. Yes, it is.
  2. No, the query optimizer treats the whole query as one block. It doesn't run the derived table and then run the outer statement on the result; it 'optimizes through' derived tables (see the sketch after this list).
  3. Again, no. Having a derived table doesn't mean bad performance. You always have to look at your query as a whole.
  4. It's not a problem.
  5. Then that's just fine. Trust the query optimizer. Have you ever met the people that wrote it? They are scary intelligent.
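
To make point 2 concrete, here is a minimal sketch, assuming a hypothetical Orders table with an index on CustomerID. Both forms would normally compile to the same execution plan, because the optimizer pushes the outer WHERE predicate down into the derived table instead of materializing every row first:

 -- Hypothetical schema: Orders(OrderID, CustomerID, OrderDate), index on CustomerID.

 -- With a derived table: the optimizer applies the CustomerID filter
 -- inside the derived table rather than after it.
 SELECT o.OrderID, o.OrderDate
 FROM (
     SELECT OrderID, CustomerID, OrderDate
     FROM Orders
 ) AS o
 WHERE o.CustomerID = 42;

 -- Without a derived table: typically the same plan as the query above.
 SELECT OrderID, OrderDate
 FROM Orders
 WHERE CustomerID = 42;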

In each individual case, it is worth looking at your query execution plan and finding the pain points. Look for things that are doing scans when they could be doing seeks; fixing those will usually give you a significant boost. Things do scans rather than seeks when:

  • There is no index to seek upon
  • The value you are seeking on is the result of a function (e.g. WHERE function(field) = value); see the sketch after this list
  • The optimizer decides that a scan is actually faster.
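
As an illustration of the second bullet, here is a small sketch assuming a hypothetical Orders table with an index on OrderDate. Wrapping the column in a function hides the index and forces a scan, while an equivalent range predicate on the bare column lets the optimizer seek:

 -- Non-sargable: the function applied to the column prevents an index seek,
 -- so the optimizer has to scan.
 SELECT OrderID
 FROM Orders
 WHERE YEAR(OrderDate) = 2023;

 -- Sargable rewrite: a range predicate on the bare column allows an index seek.
 SELECT OrderID
 FROM Orders
 WHERE OrderDate >= '2023-01-01'
   AND OrderDate <  '2024-01-01';

Comparing the execution plans of the two forms is usually enough to watch the scan turn into a seek.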

But the bottom line answer to the question is - no, you should not be worried that derived tables would contain a lot of data if you selected them out in isolation.
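
If you want to verify that on your own query, one quick check (assuming SQL Server, which this answer appears to target) is to compare the I/O the full query actually performs against what the derived table would read if run in isolation:

 -- Assuming SQL Server: report logical reads per table for each statement
 -- executed in this session.
 SET STATISTICS IO ON;

 -- Run the full query (derived table, joins, WHERE) here, then check the
 -- Messages tab: if the optimizer has pushed the filters down, the reads on
 -- the large tables stay small even though the derived table would return
 -- millions of rows if you selected it out on its own.

 SET STATISTICS IO OFF;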

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow