Question

Background

I've got two tables with different types of feedback items in MySQL. I've built a query to combine these tables by FULL OUTER JOIN (which is actually written as two joins and an union in MySQL) and to count some average grades. This query seems to work perfectly:

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

(This is somewhat simplified for readability but it doesn't make a difference here)

Problem

Next I tried adding filtering by date (i.e. only feedback items created after a certain date are taken in account). With my SQL skills and the research I did, I was able to come up with this:

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
  GROUP BY name)
ORDER BY name;

This almost works: the results I get look about right. However, a couple of feedback items are missing. For example, setting the date one month ago, I counted feedback for 21 different people in the database, but this query only returns 19 people. The worst thing is that I can't seem to find any similarities between the missing items.

Am I doing something wrong in this query? I think that the WHERE clause does the date filtering after the JOIN and ideally I would probably be doing it before. Then again, I don't know if this causes my problem and I also have no idea how to write this query differently.

Was it helpful?

Solution

A full outer join is a combination of 3 joins:

1- inner join between A and B
2- left exclusion join between A and B
3- right exclusion join between A and B

Note that the combination of an inner and a left exclusion join is a left outer join, so you normally rewrite the query as a left outer join + right exclusion join.
However for debugging purposes it can be useful to union all 3 joins and to add some marker as to which join does what:

  /*inner join*/
  (SELECT
     'inner' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  INNER JOIN feedback_service s ON (l.name = s.name) 
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT
     'left exclusion' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  LEFT JOIN feedback_service s ON (l.name = s.name) 
  WHERE s.id IS NULL
    /*AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) */
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT 
     'right exclusion' as join_type
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual 
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  RIGHT JOIN feedback_service s ON (s.name = l.name) 
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    /*AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) */
  GROUP BY s.name) 
ORDER BY listname; 

I think that the WHERE clause does the date filtering after the JOIN and ideally I would probably be doing it before.

If you want to do the filtering before, then put it in the join clause.

OTHER TIPS

I accepted Johans answer as he did a good job explaining this stuff to me and the answer is useful even in a more generic sense. However, I thought I'd also post the first solution I arrived to. It was using subqueries:

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  LEFT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  RIGHT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

The results are correct with this query. However, the solution doesn't really look optimal, as subqueries are sometimes slow in my experience. Then again, I haven't done any performance analysis, so maybe using subqueries here is not a bottleneck. In any case it worked fast enough in my application.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top