Explicit vs implicit SQL joins

https://stackoverflow.com/questions/44917

sql
join

09-06-2019
|

Question

Is there any efficiency difference in an explicit vs implicit inner join? For example:

SELECT * FROM
table a INNER JOIN table b
ON a.id = b.id;

vs.

SELECT a.*, b.*
FROM table a, table b
WHERE a.id = b.id;

Solution

Performance wise, they are exactly the same (at least in SQL Server).

PS: Be aware that the IMPLICIT OUTER JOIN syntax is deprecated since SQL Server 2005. (The IMPLICIT INNER JOIN syntax as used in the question is still supported)

Deprecation of "Old Style" JOIN Syntax: Only A Partial Thing

OTHER TIPS

Personally I prefer the join syntax as its makes it clearer that the tables are joined and how they are joined. Try compare larger SQL queries where you selecting from 8 different tables and you have lots of filtering in the where. By using join syntax you separate out the parts where the tables are joined, to the part where you are filtering the rows.

On MySQL 5.1.51, both queries have identical execution plans:

mysql> explain select * from table1 a inner join table2 b on a.pid = b.pid;
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
|  1 | SIMPLE      | b     | ALL  | PRIMARY       | NULL | NULL    | NULL         |  986 |       |
|  1 | SIMPLE      | a     | ref  | pid           | pid  | 4       | schema.b.pid |   70 |       |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
2 rows in set (0.02 sec)

mysql> explain select * from table1 a, table2 b where a.pid = b.pid;
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref          | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
|  1 | SIMPLE      | b     | ALL  | PRIMARY       | NULL | NULL    | NULL         |  986 |       |
|  1 | SIMPLE      | a     | ref  | pid           | pid  | 4       | schema.b.pid |   70 |       |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
2 rows in set (0.00 sec)

table1 has 166208 rows; table2 has about 1000 rows.

This is a very simple case; it doesn't by any means prove that the query optimizer wouldn't get confused and generate different plans in a more complicated case.

The second syntax has the unwanted possibility of a cross join: you can add tables to the FROM part without corresponding WHERE clause. This is considered harmful.

The first answer you gave uses what is known as ANSI join syntax, the other is valid and will work in any relational database.

I agree with grom that you should use ANSI join syntax. As they said, the main reason is for clarity. Rather than having a where clause with lots of predicates, some of which join tables and others restricting the rows returned with the ANSI join syntax you are making it blindingly clear which conditions are being used to join your tables and which are being used to restrict the results.

@lomaxx: Just to clarify, I'm pretty certain that both above syntax are supported by SQL Serv 2005. The syntax below is NOT supported however

select a.*, b.*  
from table a, table b  
where a.id *= b.id;

Specifically, the outer join (*=) is not supported.

Performance wise, they are exactly the same (at least in SQL Server) but be aware that they are deprecating this join syntax and it's not supported by sql server2005 out of the box.

I think you are thinking of the deprecated *= and =* operators vs. "outer join".

I have just now tested the two formats given, and they work properly on a SQL Server 2008 database. In my case they yielded identical execution plans, but I couldn't confidently say that this would always be true.

On some databases (notably Oracle) the order of the joins can make a huge difference to query performance (if there are more than two tables). On one application, we had literally two orders of magnitude difference in some cases. Using the inner join syntax gives you control over this - if you use the right hints syntax.

You didn't specify which database you're using, but probability suggests SQL Server or MySQL where there it makes no real difference.

As Leigh Caldwell has stated, the query optimizer can produce different query plans based on what functionally looks like the same SQL statement. For further reading on this, have a look at the following two blog postings:-

One posting from the Oracle Optimizer Team

Another posting from the "Structured Data" blog

I hope you find this interesting.

Performance wise, it should not make any difference. The explicit join syntax seems cleaner to me as it clearly defines relationships between tables in the from clause and does not clutter up the where clause.

In my experience, using the cross-join-with-a-where-clause syntax often produces a brain damaged execution plan, especially if you are using a Microsoft SQL product. The way that SQL Server attempts to estimate table row counts, for instance, is savagely horrible. Using the inner join syntax gives you some control over how the query is executed. So from a practical point of view, given the atavistic nature of current database technology, you have to go with the inner join.

Basically, the difference between the two is that one is written in the old way, while the other is written in the modern way. Personally, I prefer the modern script using the inner, left, outer, right definitions because they are more explanatory and makes the code more readable.

When dealing with inner joins there is no real difference in readability neither, however, it may get complicated when dealing with left and right joins as in the older method you would get something like this:

SELECT * 
FROM table a, table b
WHERE a.id = b.id (+);

The above is the old way how a left join is written as opposed to the following:

SELECT * 
FROM table a 
LEFT JOIN table b ON a.id = b.id;

As you can visually see, the modern way of how the script is written makes the query more readable. (By the way same goes for right joins and a little more complicated for outer joins).

Going back to the boiler plate, it doesn't make a difference to the SQL compiler how the query is written as it handles them in the same way. I've seen a mix of both in Oracle databases which have had many people writing into it, both elder and younger ones. Again, it boils down to how readable the script is and the team you are developing with.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow