Question

If I have several left outer joins (LOJs) and several inner joins, is there a correct, standard syntactic structure I should use?

Example scenario

  • 5 tables #A - #E all with a UserId column and each with an additional column for a measure - MeasureA in table #A, MeasureB in table #B etc.
  • Tables #A, #B, #C all have the same set of UserIds
  • Tables #D and #E have different subsets of the set of UserIds in #A-#C.

Is this the correct structure to use:

SELECT 
    #A.UserId,
    #A.MeasureA,
    #B.MeasureB,
    #C.MeasureC,
    D = COALESCE(#D.MeasureD,0.),
    E = COALESCE(#E.MeasureE,0.)
FROM        
    #A
    JOIN #B
        ON #A.UserId = #B.UserId
    JOIN #C
        ON #A.UserId = #C.UserId
    LEFT OUTER JOIN #D
        ON #A.UserId = #D.UserId
    LEFT OUTER JOIN #E
        ON #A.UserId = #E.UserId

Or should the LOJs be applied within a subquery on #A?

SELECT 
    X.UserId,
    X.MeasureA,
    #B.MeasureB,
    #C.MeasureC,
    X.D,
    X.E
FROM        
    (
    SELECT
      #A.UserId,
      #A.MeasureA,
      D = COALESCE(#D.MeasureD,0.),
      E = COALESCE(#E.MeasureE,0.)
    FROM #A 
        LEFT OUTER JOIN #D
            ON #A.UserId = #D.UserId
        LEFT OUTER JOIN #E
            ON #A.UserId = #E.UserId
    ) X
    JOIN #B
        ON X.UserId = #B.UserId
    JOIN #C
        ON X.UserId = #C.UserId

Solution

When you use left outer joins, the intention is that one of the tables keeps all of its rows, regardless of matches in the other tables.

My preferred structure is to put this table first:

select . . .
from <really important table> t left outer join
     . . .

This doesn't work if you have inner joins later in the from clause, because an inner join filters out every row that has no match on its join condition, including rows the left outer joins were meant to preserve.

In terms of your query, I think the first does what you expect. The second happens to do what you want, because you are only joining on the id column. But the structure is very dangerous. If one of your subsequent inner joins were on a column from #E, then it would (inadvertently) change the left joins to inner joins.

So, put the inner joins first, then the left outer joins.
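
To see the danger concretely, here is a minimal T-SQL sketch. The sample rows are invented for illustration, and only #A, #B and #E are used to keep it short. The second query deliberately writes the inner join condition against #E.UserId instead of #A.UserId, which is the kind of slip described above.

```sql
-- Sample data is an assumption made up for this sketch.
CREATE TABLE #A (UserId int PRIMARY KEY, MeasureA decimal(9,2));
CREATE TABLE #B (UserId int PRIMARY KEY, MeasureB decimal(9,2));
CREATE TABLE #E (UserId int PRIMARY KEY, MeasureE decimal(9,2));

INSERT INTO #A VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO #B VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO #E VALUES (1, 100);   -- only user 1 has a row in #E

-- Safe: inner join first, and every ON clause refers back to #A.
SELECT
    #A.UserId,
    #B.MeasureB,
    E = COALESCE(#E.MeasureE, 0.)
FROM
    #A
    JOIN #B
        ON #A.UserId = #B.UserId
    LEFT OUTER JOIN #E
        ON #A.UserId = #E.UserId;

-- Dangerous: the inner join is on a column from the left-joined table.
-- #E.UserId is NULL for users 2 and 3, so the equality fails and
-- those rows are dropped, as if the left join had been an inner join.
SELECT
    #A.UserId,
    #B.MeasureB,
    E = COALESCE(#E.MeasureE, 0.)
FROM
    #A
    LEFT OUTER JOIN #E
        ON #A.UserId = #E.UserId
    JOIN #B
        ON #E.UserId = #B.UserId;

DROP TABLE #A, #B, #E;
```

With this data the first query returns users 1, 2 and 3, while the second returns only user 1.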

OTHER TIPS

One thing to remember is that, unless you're doing something really funky, two equivalent queries that are structured differently will probably be interpreted identically by the optimizer. That's almost certainly the case with the two queries you present.

With that in mind, the only "correct" structure is the one that you find the easiest to read and maintain. Personally, I'd go for the first query, since it spells out what it's doing in a straightforward manner.


To be a little more explicit regarding the actual question asked: the standard that applies here is not a SQL standard, but a coding standard: don't make things more complicated than they need to be.

As app developers, we trust frameworks, so why can't we trust SQL engines to do their work? The first syntax is what SQL is expecting; don't complicate it when it isn't necessary.

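However, if A -> D is one-to-many, A -> E is one-to-many, and there is no relation between D and E, I would GROUP BY the matching D and E rows in independent sub-queries before plugging them back into the main query. Otherwise each user's D rows get paired with every one of that user's E rows, and any aggregate over the result is inflated.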

However, this practice doesn't seem to apply to your use case.
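
For reference, the pre-aggregation approach might look like the sketch below. It assumes, unlike the question, that #D and #E can hold several rows per user, and it uses SUM as the aggregate purely for illustration. The sample rows and the names DAgg, EAgg, TotalD and TotalE are hypothetical.

```sql
-- Sample data is an assumption made up for this sketch.
CREATE TABLE #A (UserId int PRIMARY KEY, MeasureA decimal(9,2));
CREATE TABLE #D (UserId int, MeasureD decimal(9,2));
CREATE TABLE #E (UserId int, MeasureE decimal(9,2));

INSERT INTO #A VALUES (1, 10), (2, 20);
INSERT INTO #D VALUES (1, 5), (1, 7);                 -- two #D rows for user 1
INSERT INTO #E VALUES (1, 100), (1, 200), (1, 300);   -- three #E rows for user 1

-- Each sub-query collapses its table to one row per UserId first,
-- so the outer joins cannot multiply #D rows by #E rows.
SELECT
    #A.UserId,
    #A.MeasureA,
    D = COALESCE(DAgg.TotalD, 0.),
    E = COALESCE(EAgg.TotalE, 0.)
FROM
    #A
    LEFT OUTER JOIN
        (
        SELECT UserId, TotalD = SUM(MeasureD)
        FROM #D
        GROUP BY UserId
        ) DAgg
        ON #A.UserId = DAgg.UserId
    LEFT OUTER JOIN
        (
        SELECT UserId, TotalE = SUM(MeasureE)
        FROM #E
        GROUP BY UserId
        ) EAgg
        ON #A.UserId = EAgg.UserId;

DROP TABLE #A, #D, #E;
```

Here user 1 gets D = 12 and E = 600, and user 2 gets zeros. Joining #D and #E directly to #A and summing afterwards would instead produce six rows for user 1 and count each measure more than once.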

You can do it all in one query; there's really no need to write it using a sub-query. Just remember how LOJs work and you'll see that clearly!

Licensed under: CC-BY-SA with attribution